US20120131049A1 - Search Tools and Techniques - Google Patents

Search Tools and Techniques

Info

Publication number
US20120131049A1
Authority
US
United States
Prior art keywords
search
productive
prior
identifying
primary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/362,591
Inventor
John W. Ogilvie
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Resource Consortium Ltd
Original Assignee
Resource Consortium Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Resource Consortium Ltd
Priority to US13/362,591
Assigned to RESOURCE CONSORTIUM LIMITED (assignment of assignors interest; see document for details). Assignors: OGILVIE, JOHN W.
Publication of US20120131049A1
Assigned to RESOURCE CONSORTIUM LIMITED, LLC (re-domestication and entity conversion). Assignors: RESOURCE CONSORTIUM LIMITED
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • Keywords are “lexically linked” when they share a root, e.g., “search”, “searching”, “searcher”, and “searches” are four lexically linked words. Automatic tests for lexical linkage can be programmed, e.g., using look-up tables of suffixes and/or by parsing dictionary entries.
  • Keywords are “semantically linked” when their meaning overlaps or is closely related, e.g., “search”, “seek”, “chercher”, and “finding” are four semantically linked words. Automatic tests for semantic linkage can be implemented using semantic nets and other artificial intelligence techniques, or by using a thesaurus look-up, for example, including the possible use of an on-line thesaurus such as that found at website www dot lexfn dot com or at website www dot m-w dot com.
  • search engines per se; to searches 106 , 116 performed on the Internet or World Wide Web as opposed to other networks, machines, or data repositories 108 ; to monolingual searches 106 , 116 ; or to text-based searches as opposed to searches 106 , 116 of images or other non-textual data.
  • Subweb searching as illustrated in FIG. 2 may be implemented by making appropriate indexes of crawled pages, and then searching 106 only within those indexes. For instance, an index of pages searched in the last hour could be maintained, for each of the past 24 hours. Subweb searching may be implemented by modifying/supplementing existing indexes. For instance, a bit could be added to indicate whether a page contains embedded/linked-directly images. Subweb searching may be implemented as a filter on current search results, for instance, as discussed with use of the Way Back Machine. Other implementations that provide the functionality indicated here may also be apparent to those skilled in the art, particularly if they have experience implementing search engines as well as using them.
  • steps and other characteristics described herein may be combined in various ways to form embodiments of the invention.
  • steps may be omitted, repeated, renamed, supplemented, performed in serial or parallel, and/or grouped differently, except as required by the claims and to provide an operable embodiment.
  • not every step illustrated in a given example need be performed in a given method according to the invention.
  • System embodiments generally use a computer device 104 , 114 (computer, PDA, cell phone, or other device with interoperating processor and memory) which is networked to a search service 122 to perform at least one of the methods described herein.
  • Different system embodiments may omit, repeat, regroup, supplement, or rearrange the system components discussed, provided the system overall is operable and conforms to at least one claim.
  • hardware, software, and firmware implementations are deemed partially or fully interchangeable at the time in question by one of skill in the art, they may be utilized in embodying the invention even though the specific examples discussed here are implemented differently.
  • a reference to an item generally means at least one such item is present, and a reference to a step means at least one instance of the step is performed.

Abstract

The invention provides tools and techniques for assisting searches of large information collections, such as the Internet or databases. An initial search from a given user A is used to help identify related productive searches by other users. Those searches (keywords and/or searched portions) are then used to focus the search effort of user A. Activities by other users in response to search results are tracked to help a search assistant automatically identify the results those other users deemed of greatest interest.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 11/938,338, filed Nov. 12, 2007, entitled Search Tools and Techniques, which is a continuation of U.S. patent application Ser. No. 11/235,457, filed Sep. 26, 2005 (now abandoned), entitled Search Tools and Techniques, which claims the benefit of U.S. Provisional Patent Application No. 60/614,736, filed Sep. 30, 2004, entitled Search Tools and Techniques, the entire disclosures of which are incorporated herein by reference.
  • This application is related to U.S. patent application Ser. No. 12/030,179, filed Feb. 12, 2008, entitled Search Tools and Techniques, now U.S. Pat. No. 7,882,097.
  • FIELD OF THE INVENTION
  • The present invention relates generally to tools and techniques for searching large collections of information, such as databases and/or the Internet.
  • BACKGROUND
  • Well-known search tools include keyword-search interfaces such as those providing access to the United States Patent and Trademark database of patents and patent applications at the USPTO web site; those provided by search engine sites such as the Google site, the Yahoo! site, and others; and variations on these such as those using a voice interface. Increasing information accessibility is a continuing and important effort to which the present invention may contribute.
  • SUMMARY OF INVENTION
  • The present invention provides search tools and techniques, which may be embodied in various forms such as methods, systems, products of processes, configured storage media, computer data structures, and the like. Unless otherwise stated, discussion of one form of embodiment illustrates but does not necessarily limit other forms of embodiment. For instance, discussion of methods of the invention illustrates systems of the invention, which may include computers configured to operate according to the methods, without necessarily requiring that the systems include every limitation discussed in connection with the methods. Likewise, the discussion of systems illustrates methods without necessarily limiting the methods, and so on, for each form of embodiment. For more information, please refer to the claims.
  • Portions of the documentation in this patent document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • To illustrate the manner in which the advantages and features of the invention are obtained, a more particular description of the invention will be given with reference to the attached drawings. These drawings only illustrate selected aspects of the invention and thus do not fully determine the invention's scope. In the drawings:
  • FIG. 1 is a data flow diagram illustrating systems and methods according to the present invention; and
  • FIG. 2 is a flowchart illustrating methods (a.k.a. processes) of the invention.
  • DETAILED DESCRIPTION
  • Reference is made to exemplary embodiments, and specific language will be used herein to describe the same. But alterations and further modifications of the inventive features illustrated herein, and additional applications of the principles of the invention as illustrated herein, which would occur to one skilled in the relevant art(s) and having possession of this disclosure, should be considered within the scope of the invention.
  • In describing the invention, the meaning of important terms is clarified, so the claims must be read with careful attention to these clarifications. Specific examples are given to illustrate aspects of the invention, but those of skill in the relevant art(s) will understand that other examples may also fall within the meaning of the terms used, and within the scope of one or more claims. Words used herein do not necessarily have the same meaning they have in everyday usage, or in a particular dictionary or treatise. Terms may be defined, either explicitly or implicitly, here in the Description and/or elsewhere in the application file(s). It is not necessary for every means or aspect identified in a given definition or example to be present or to be utilized in every embodiment of the invention.
  • Introduction
  • Through an extension to a Google-like or other search interface, including one embedded in an application such as the Microsoft Word word processor (Google is a mark of Google, Inc.; Microsoft and Word are marks of Microsoft Corp.), search “subwebs” defined by particular criteria instead of searching all available items (“items” are data sources, e.g., web pages, documents, images, etc.). For instance, search subweb(s) may be defined by any one or more of the following criteria:
      • 1. items searched (by this user, or alternately by any user, or alternately by any user in a specified group) within a specified time period, e.g., within last hour, or anytime on Jul. 4, 2002;
      • 2. items whose web address matches a specified regular expression, e.g., wildcards as in “www.foobar.com/199?” or “*.edu”, or exclusion as in “not *novell*”;
      • 3. items in a chained-search, that is, items in a database having its own search interface that is visible through the searched subweb but whose contents are not necessarily visible in the searched subweb, e.g., ACM Digital library visible through website portal dot acm dot org slash dl.cfm, or USPTO database visible through website www dot uspto dot gov, by automatically keyword searching their underlying database (using their different search syntax as needed—translations between search syntaxes can be largely automated) and returning results;
      • 4. items from a snapshot of earlier versions, e.g., by coordination with the WayBack Machine at website www dot archive dot org slash web slash web.php; for instance, this could be done by automatically keyword searching the current snapshot of web pages, automatically selecting URLs from those results, automatically feeding at least one such URL into the WayBack Machine, automatically keyword searching the resulting archived pages from the earlier web version, and then automatically returning those results to the user (useful, e.g., in case items currently do not exist on the web, or are blocked by robots.txt, because they may be accessible in an earlier snapshot of the website);
      • 5. items in a broader context, by expanding search terms heuristically (much more than merely trying terms having the same grammatical root), e.g., by adding thesaurus-looked-up synonyms or related terms; or by popping up a dictionary after determining by dictionary search that a word is ambiguous (e.g., “bat: do you mean ‘flying mammal’ or ‘baseball equipment’?”), letting the user select the correct definition, and then supplementing/replacing the user-supplied initial ambiguous keyword with more focused search terms extracted from or based on the dictionary entry selected by the user; or by using artificial intelligence/natural language processing concepts of related words to enhance and focus the search, e.g., enhancing the user-provided keyword “emotion” by adding “love, passion, hatred, excitement, anger, feeling, pride, heartfelt” as search terms;
      • 6. only items that have embedded/linked-directly images, e.g., don't give me text-only pages when I'm looking for a picture of Jim Bridger;
      • 7. items matching search terms that match regular expression keywords, e.g., find me pages that have “fingerprint” but not if it's followed by a PGP fingerprint, as in “‘fingerprint’ not immediately followed by {[0-9,A-Z][0-9,A-Z][blank]}+”;
      • 8. items that will not try to install pop-ups, adware, spyware, or other annoying crap (including viruses and Trojans, of course) if I go to that search result (could either exclude these annoying/dangerous result pages from the results shown to the user, or could include them but flag them and warn the user) (could implement this by going to the result page automatically and checking the result before handing that back among the results given to the user).
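  • As an illustration of criterion 2 only, the following is a minimal Python sketch of a subweb filter driven by wildcard patterns with exclusion. It is not taken from the patent: the function name `in_subweb`, the choice to test patterns against both the host name and the host plus path, and the sample URLs are all assumptions made for the example. It uses shell-style wildcards from the standard `fnmatch` module rather than full regular expressions.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def in_subweb(url, include=(), exclude=()):
    """Return True if url falls inside the subweb defined by the patterns.

    include: wildcard patterns, at least one of which must match
             (an empty sequence means "any address").
    exclude: wildcard patterns, none of which may match
             (the 'not *novell*' case).
    Hypothetical helper: each pattern is tried against the host name and
    against host + path, which is an assumption of this sketch.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    targets = (host, host + parts.path.lower())

    def matches(pattern):
        return any(fnmatchcase(t, pattern.lower()) for t in targets)

    if any(matches(p) for p in exclude):
        return False
    return not include or any(matches(p) for p in include)


# Made-up result list, filtered down to the subweb.
results = [
    "http://www.foobar.com/1998",
    "http://www.foobar.com/2004",
    "http://cs.example.edu/courses",
    "http://www.novell.com/products",
]
subweb = [
    u for u in results
    if in_subweb(u, include=["www.foobar.com/199?", "*.edu"],
                 exclude=["*novell*"])
]
```

  • Here `subweb` keeps the 1998 page and the .edu page; the 2004 page matches no inclusion pattern and the novell page is excluded. Applying the filter to existing results is only one option; the same test could be applied when building an index.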
  • These search concepts can also be described using claim language. Because claims can be directed to various embodiments, multiple claims could be founded on a particular feature or set of features. To reduce repetition, however, some of the possible claims are summarized below in the form of a claim directed to “generally corresponding embodiments”. This claim is shorthand for a set of claims to generally corresponding embodiments. Accordingly, the claims are arranged below in claim sets. For instance, a claim set directed to “Generally corresponding embodiments including doing act one and doing act two” includes the following claims: “A method including the steps of doing act one and doing act two”, “A system including at least one device having a processor and a memory, the device(s) configured to perform act one and act two, solely and/or working together”, “A computer-readable storage medium configured with data and instructions to cause at least one device having a processor and a memory to perform act one and act two”, “A method comprising the steps of storing data defining at least a portion of a web site and providing interactive access over a network to the data, wherein the web site and the access permit performance of act one and act two”, and to the extent they make sense, “A data structure having a first field for holding data for performing act one, and having a second field for holding data for performing act two”, and “A product produced by a process including act one and act two”.
  • When possible, such claim summaries should also be read to cover different actors. For instance, a method performed by a searching service corresponds to a method performed by a user of that searching service (as when the service sends/receives information that the user receives/sends), so the generally corresponding embodiments include the methods performed by each actor—the method performed by the user and the corresponding method performed by the searching service.
  • One group of generally corresponding embodiments includes searching the Internet and/or another data repository according to at least one of the following criteria:
      • searching for items that have been searched within a user-specified time period;
      • searching for items that reside at an address whose Universal Resource Identifier is identified by string-matching to a regular expression having wildcards and/or exclusion;
      • searching for items in chained-search databases;
      • searching for items which are not necessarily present and accessible in the current snapshot by searching a snapshot of earlier versions for the items;
      • searching for items by expanding search terms heuristically;
      • searching for items that have embedded and/or linked-directly images, thereby ruling out result pages that do not contain at least an inline image or an image that is only one hyperlink away;
      • searching for items that match search terms defined by regular expressions;
      • searching for items that will not try to install pop-ups, adware, spyware, viruses, or Trojans.
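  • The regular-expression keyword criterion can be sketched in Python with a negative lookahead. This is an illustrative reading of the “fingerprint” example, not the patent's implementation: the names `PLAIN_FINGERPRINT` and `page_matches`, the optional colon and whitespace allowed between the word and the hex-like groups, and the sample pages are assumptions.

```python
import re

# "fingerprint" NOT immediately followed by one or more groups made of two
# characters from [0-9A-Z] plus a blank, mirroring the example pattern
# {[0-9,A-Z][0-9,A-Z][blank]}+ . Allowing an optional ":" and whitespace
# before the first group is an assumption of this sketch.
PLAIN_FINGERPRINT = re.compile(r"[Ff]ingerprint(?!:?\s*(?:[0-9A-Z]{2}\s)+)")


def page_matches(text):
    """True if the text has at least one 'fingerprint' that is not the
    start of a PGP-style fingerprint."""
    return PLAIN_FINGERPRINT.search(text) is not None


# Made-up page texts keyed by a made-up page name.
pages = {
    "crime-story": "The detective lifted a fingerprint from the glass.",
    "key-server": "Key fingerprint 3F A2 9C 01 7B 55",
}
hits = [name for name, text in pages.items() if page_matches(text)]
```

  • With these sample pages, `hits` contains only "crime-story". A page that mentions both kinds of fingerprint would still match, since one plain occurrence is enough.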
    Search Subwebs Based on Other Users' Searches
  • Google and other search engines rely heavily on analysis of links, metatext, and/or other information in the documents that are being searched. But in trying to make searches more effective and more efficient, one can only extract so much information from the searched documents. At some point it becomes beneficial to identify and use information from people's use of the documents. By this I mean not merely their use as evident in placing links to/from the documents, but the other uses people make of search results, such as: identifying, reviewing, returning to, printing, and/or following links from, a given document.
  • In particular, and with regard to FIGS. 1 and 2, some embodiments of the present invention assist a user 102 by leveraging for that user the search expertise of other users 112. This can be done using heuristics or other methods to determine 210 which searches 116 and/or search results 118 of other users 112 are most likely to help the present user 102, and by focusing 212 the present user's search(es) 106 accordingly. As used here, “focusing” means narrowing, refocusing, or both. That is, “focusing” does not imply simply adding limitations; in keyword search terminology it may involve adding OR as well as adding AND, or changing keywords themselves, or both.
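  • To make the sense of “focusing” concrete, here is a small Python sketch that broadens a query with OR, narrows it with AND, and swaps keywords. The function `focus_query`, its parameters, and the parenthesized AND/OR query syntax are assumptions for illustration; real search interfaces differ in syntax.

```python
def focus_query(keywords, replacements=None, alternatives=None, required=()):
    """Build a focused keyword query string (hypothetical helper).

    replacements: {keyword: new_keyword}   changes a keyword itself
    alternatives: {keyword: [terms]}       broadens by adding OR terms
    required:     extra terms              narrows by adding AND terms
    """
    replacements = replacements or {}
    alternatives = alternatives or {}
    clauses = []
    for word in keywords:
        word = replacements.get(word, word)
        options = [word] + list(alternatives.get(word, []))
        if len(options) > 1:
            clauses.append("(" + " OR ".join(options) + ")")
        else:
            clauses.append(word)
    clauses.extend(required)
    return " AND ".join(clauses)


# Refocus by adding OR, then narrow by adding AND.
query = focus_query(
    ["bat", "habitat"],
    alternatives={"bat": ["chiroptera"]},
    required=["mammal"],
)
```

  • The resulting `query` is "(bat OR chiroptera) AND habitat AND mammal", which is broader on the first term and narrower overall.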
  • In operation, one or more users 112 use one or more computers 114 to formulate one or more searches 116 of at least one large collection 108 of information. “User 112” is used broadly here to mean any one or more persons and/or processes other than user 102. “Computer” (as in “computer 104”, “computer 114”, a.k.a. “computer device”) is used very broadly here to include any device for electronically accessing a database or other large collection, e.g., workstation, laptop, PDA, cell phone, other phone, kiosk, etc. The search 116 produces some result 118 (which may be “no result found matching your search criteria”), which is then normally at least partially displayed (on a screen, through audio production, and/or otherwise) to the user 112. This particular set of activities is familiar.
  • I believe, however, that automatically leveraging other users' search expertise as described here to focus a given user's search has not been done previously. A search assistance component 120 monitors 202, 206 and/or has access 204 to at least some of these activities and/or results of searchers 112. A search assistance tool 120 leverages those activities to assist user 102.
  • In some embodiments the search assistance component 120 only monitors 202 and/or accesses 204 searches 116 and their search results 118. In other embodiments, it also monitors/accesses 206 actions by users 112 performed on the computers 114, e.g., it tracks scrolling, following links, opening/closing browsers or other windows, and so on. Such actions by users 112 may imply stronger interest in one search result 118 than in another, which reflection of interest can in turn be used by the search assistant 120 to focus 212 the searches 106 (keywords and/or searched portions) of the user 102.
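  • One way such tracked actions might be turned into a measure of interest is a weighted tally per search result, sketched below in Python. The action names, the weights in `ACTION_WEIGHTS`, and the event format are all hypothetical; the patent does not specify any scoring formula.

```python
from collections import defaultdict

# Hypothetical weights: none of these numbers come from the patent.
ACTION_WEIGHTS = {
    "open_window": 1.0,
    "scroll": 0.5,
    "follow_link": 2.0,
    "close_window": 0.0,
}


def interest_scores(events):
    """Sum action weights per search result.

    events: iterable of (result_url, action) pairs observed for users 112.
    Unknown actions contribute nothing.
    """
    scores = defaultdict(float)
    for url, action in events:
        scores[url] += ACTION_WEIGHTS.get(action, 0.0)
    return dict(scores)


def rank_by_interest(events):
    """Result URLs ordered from most to least implied interest."""
    scores = interest_scores(events)
    return sorted(scores, key=lambda u: (-scores[u], u))


# Made-up observation log.
events = [
    ("a.example/doc", "open_window"),
    ("a.example/doc", "scroll"),
    ("a.example/doc", "follow_link"),
    ("b.example/page", "open_window"),
    ("b.example/page", "close_window"),
]
ranking = rank_by_interest(events)
```

  • In this log the first document accumulates 3.5 and the second 1.0, so `ranking` lists "a.example/doc" first.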
  • The search assistant 120 may be implemented in software, configuring and residing on internet/database servers 122 and/or distributed across computers 114, for instance. Special-purpose hardware may also be used in implementing the search assistance module 120. Any combination of hardware and software providing the claimed functionality may be used.
  • In one scenario, the user 102 formulates an initial keyword search 106 and sends it through computer 104 to the Internet 108. This produces results 110, which are not the desired results, due to inaccuracy or incompleteness or both. The search assistant 120 notes 208 the initial search 106, and then finds 210 related productive searches 116 by other users 112. As used herein, “related” and “productive” each have a special meaning explained below. The results 118 of those related productive searches and/or the related productive searches 116 themselves, are then used by the search assistant to focus 212 (possibly iteratively) the search effort of user 102.
  • Some of the criteria for identifying 210 productive searches by others 112 include, for instance, identifying searches by other users that do one or more of the following:
      • (a) end with a click on a keyword, e.g., on a Google AdWords advertisement, especially if purchases follow (software 120 can track 206 clicks and purchase “conversions” and use them to identify productive search terms and/or good search results for a given search or group of related (e.g., shared keyword) searches);
      • (b) end with a document being displayed and read (tool 120 checks 206 how long the document is displayed on a screen of computer 114 and/or checks 206 for scrolling by users 112 to get to the part of the search result document that is not initially displayed);
      • (c) return multiple times to a particular search result document 118 (that document may contain helpful leads, in the form of links and/or other content; in particular the tool 120 may watch 206 for instances of multiple browsers spawned on the computer 114 after the document is displayed there, in conjunction with redisplay of the document, as in a depth-first search by user 112 rooted at the document);
      • (d) lead to one search result site 118 where there is—relative to the other activity of the user 112 in question and/or other users 112 led to the site—a lot of activity, thereby indicating the site itself is being searched;
      • (e) lead to frequently-visited sites 118;
      • (f) lead to official sites 118 such as *.gov sites or *.org sites;
      • (g) lead to sites from which the user 102 copies data, e.g., by saving an image to disk or by “print screen”, or by cut-and-paste copying of text;
      • (h) lead to sites 118 having other indicia of relevance.
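  • By way of illustration only, several of the criteria listed above could be checked against a record of one prior search along the following lines. This is a minimal sketch, not the implementation of search assistant 120: the `SearchSession` record, its field names, and the 30-second reading threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class SearchSession:
    """Observed behavior for one prior search (all fields are hypothetical)."""
    ended_with_ad_click: bool = False
    ended_with_purchase: bool = False
    display_seconds: float = 0.0   # time the final document stayed on screen
    scrolled: bool = False         # user scrolled past the initially displayed part
    return_visits: int = 0         # times the user returned to one result document
    final_host: str = ""           # host name of the last result site visited
    copied_data: bool = False      # saved an image, print screen, or cut-and-paste


def productivity_signals(s: SearchSession, min_read_seconds: float = 30.0) -> list[str]:
    """Return the letters of the criteria (a), (b), (c), (f), (g) that the session meets."""
    signals = []
    if s.ended_with_ad_click or s.ended_with_purchase:
        signals.append("a")
    if s.display_seconds >= min_read_seconds or s.scrolled:
        signals.append("b")
    if s.return_visits >= 2:
        signals.append("c")
    if s.final_host.endswith((".gov", ".org")):
        signals.append("f")
    if s.copied_data:
        signals.append("g")
    return signals


def is_productive(s: SearchSession) -> bool:
    """Treat a search as productive if it meets one or more criteria."""
    return bool(productivity_signals(s))
```

  • Criteria (d), (e), and (h) depend on activity across many users and sites, so they are left out of this per-session sketch.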
  • Familiar testing and observation methods can also be used to identify correlations between keywords and sites turned up in a search using those keywords that end the search, thereby implying that they are productive. Larger samples may be better than smaller ones for some keywords, because searches end for a variety of reasons (e.g., searcher interrupted and had to do something else) in addition to ending when the searcher 112 finds what they're looking for. But with the invention, the last site/document visited, or one at which the searcher spends the most time, or one that the searcher 112 keeps returning to, can be identified 210 automatically by search assistants 120. Then it is inferred that such a site/document would be helpful to another user 102 who is performing a search using the same or related keywords. The search 116 keywords and/or the resulting site/document are then supplied 212 to the searcher 102.
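  • As a hedged example, the three candidates named above (the last document visited, the one on which the searcher spends the most time, and the one the searcher keeps returning to) could be picked out of an ordered visit log as sketched below. The log format, a list of `(url, seconds_on_page)` pairs, and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def helpful_documents(visits):
    """visits: ordered list of (url, seconds_on_page) pairs for one search."""
    if not visits:
        return {}
    time_spent = defaultdict(float)  # total seconds per document
    counts = Counter()               # number of visits per document
    for url, seconds in visits:
        time_spent[url] += seconds
        counts[url] += 1
    most_visited, n = counts.most_common(1)[0]
    return {
        "last_visited": visits[-1][0],
        "most_time": max(time_spent, key=time_spent.get),
        # Only report a document here if the searcher actually came back to it.
        "most_returned_to": most_visited if n > 1 else None,
    }
```

  • Because a single search may end for unrelated reasons, such per-search picks would presumably be aggregated over many searches before a document is supplied 212 to another searcher.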
  • Some of the criteria for identifying 210 related searches by other users include: testing for keywords that are lexically and/or semantically linked; and testing whether a significant number of the same search results come up in two searches. In some embodiments, a “significant number” means at least a set percentage, e.g., at least 35% or 50% or 70%, while in some it means at least some threshold such as at least ten or a hundred or a thousand, or at least some number of the top ten percent such as at least five in the top twenty website results.
  • Keywords are “lexically linked” when they share a root, e.g., “search”, “searching”, “searcher”, and “searches” are four lexically linked words. Automatic tests for lexical linkage can be programmed, e.g., using look-up tables of suffixes and/or by parsing dictionary entries.
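  • The shared-results test can be sketched as follows. The text does not say what a percentage is measured against, so this example assumes it is the smaller of the two result sets; the function names and default values are likewise illustrative assumptions.

```python
def shared_results(results_a, results_b, top_n=None):
    """Results common to two ranked result lists, optionally limited to the top N of each."""
    if top_n is not None:
        results_a, results_b = results_a[:top_n], results_b[:top_n]
    return set(results_a) & set(results_b)


def significant_by_fraction(results_a, results_b, fraction=0.5):
    """True if the shared results are at least `fraction` of the smaller result set (assumed base)."""
    smaller = min(len(set(results_a)), len(set(results_b)))
    if smaller == 0:
        return False
    return len(shared_results(results_a, results_b)) / smaller >= fraction


def significant_by_count(results_a, results_b, count=5, top_n=20):
    """True if at least `count` results are shared within the top `top_n` of each list."""
    return len(shared_results(results_a, results_b, top_n)) >= count
```

  • For instance, `significant_by_count(a, b, count=5, top_n=20)` corresponds to the "at least five in the top twenty" example.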
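  • A suffix look-up test for lexical linkage might look like the sketch below. The suffix table is a deliberately tiny assumption and the stripping rule is naive (it would mishandle doubled consonants, for example), so a practical embodiment would likely rely on a fuller table or dictionary entries.

```python
# Tiny illustrative suffix table; longer suffixes come first so they are tried first.
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")


def root(word, min_root_len=3):
    """Strip the first matching suffix, keeping at least `min_root_len` letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_root_len:
            return w[: -len(suffix)]
    return w


def lexically_linked(a, b):
    """Two keywords are treated as lexically linked when they reduce to the same root."""
    return root(a) == root(b)
```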
  • Keywords are “semantically linked” when their meaning overlaps or is closely related, e.g., “search”, “seek”, “chercher”, and “finding” are four semantically linked words. Automatic tests for semantic linkage can be implemented using semantic nets and other artificial intelligence techniques, or by using a thesaurus look-up, for example, including the possible use of an on-line thesaurus such as that found at website www dot lexfn dot com or at website www dot m-w dot com.
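  • A thesaurus look-up test for semantic linkage can be sketched with a small in-memory table standing in for an on-line thesaurus. The groups shown are assumptions for the example, apart from the first, which repeats the four words given above.

```python
# Hypothetical synonym groups; a real embodiment might query a thesaurus service instead.
SYNONYM_GROUPS = [
    {"search", "seek", "chercher", "finding"},
    {"buy", "purchase"},
]


def semantically_linked(a, b, groups=SYNONYM_GROUPS):
    """True if both keywords appear together in at least one synonym group."""
    a, b = a.lower(), b.lower()
    return any(a in group and b in group for group in groups)
```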
  • Although specific examples are given here, they are illustrative only, and the invention is not limited to search engines per se; to searches 106, 116 performed on the Internet or World Wide Web as opposed to other networks, machines, or data repositories 108; to monolingual searches 106, 116; or to text-based searches as opposed to searches 106, 116 of images or other non-textual data.
  • Subweb searching as illustrated in FIG. 2 may be implemented by making appropriate indexes of crawled pages, and then searching 106 only within those indexes. For instance, an index of pages searched in the last hour could be maintained, for each of the past 24 hours. Subweb searching may be implemented by modifying/supplementing existing indexes. For instance, a bit could be added to indicate whether a page contains embedded/linked-directly images. Subweb searching may be implemented as a filter on current search results, for instance, as discussed with use of the Way Back Machine. Other implementations that provide the functionality indicated here may also be apparent to those skilled in the art, particularly if they have experience implementing search engines as well as using them.
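  • One way to picture the hourly indexes mentioned above is a set of per-hour buckets that is then used as a filter on current search results. The sketch below is an assumption-laden illustration: the class name, the use of page URLs as keys, and timestamps in seconds are not taken from the description.

```python
from collections import defaultdict


class HourlySubwebIndex:
    """Keeps one bucket of page URLs per hour, for a limited number of past hours."""

    def __init__(self, hours_kept=24):
        self.hours_kept = hours_kept
        self.buckets = defaultdict(set)  # hour number -> set of page URLs

    def record(self, url, timestamp):
        """File a page under the hour containing `timestamp` (seconds) and drop stale buckets."""
        hour = int(timestamp // 3600)
        self.buckets[hour].add(url)
        for old in [h for h in self.buckets if h <= hour - self.hours_kept]:
            del self.buckets[old]

    def pages_within(self, now, hours=1):
        """All pages recorded in the most recent `hours` hourly buckets."""
        current = int(now // 3600)
        pages = set()
        for h in range(current - hours + 1, current + 1):
            pages |= self.buckets.get(h, set())
        return pages

    def search(self, results, now, hours=1):
        """Filter ranked search results down to pages in the chosen subweb."""
        allowed = self.pages_within(now, hours)
        return [r for r in results if r in allowed]
```

  • The same filtering step could use other subweb definitions, such as a per-page bit indicating embedded images, in place of the hourly buckets.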
  • CONCLUSION
  • The steps and other characteristics described herein may be combined in various ways to form embodiments of the invention. In methods of the invention, steps may be omitted, repeated, renamed, supplemented, performed in serial or parallel, and/or grouped differently, except as required by the claims and to provide an operable embodiment. In particular, not every step illustrated in a given example need be performed in a given method according to the invention.
  • System embodiments generally use a computer device 104, 114 (computer, PDA, cell phone, or other device with interoperating processor and memory) which is networked to a search service 122 to perform at least one of the methods described herein. Different system embodiments may omit, repeat, regroup, supplement, or rearrange the system components discussed, provided the system overall is operable and conforms to at least one claim. To the extent that hardware, software, and firmware implementations are deemed partially or fully interchangeable at the time in question by one of skill in the art, they may be utilized in embodying the invention even though the specific examples discussed here are implemented differently.
  • Although particular embodiments of the present invention are expressly described herein as methods or devices, it will be appreciated that discussion of one type of embodiment also generally extends to other embodiment types. For instance, the descriptions of search methods also help describe search systems. It does not follow that limitations from one embodiment are necessarily read into another.
  • Headings are for convenience only; information on a given topic may be found outside the section whose heading indicates that topic. All claims as filed are part of the specification and thus help describe the invention, and repeated claim language may be inserted outside the claims as needed.
  • It is to be understood that the above-referenced embodiments are illustrative of the application for the principles of the present invention. Numerous modifications and alternative embodiments can be devised without departing from the spirit and scope of the present invention. It will be apparent to those of ordinary skill in the art that numerous modifications can be made without departing from the principles and concepts of the invention as set forth in the claims.
  • As used herein, terms such as “a” and “the” and designations such as “search” and “searching” are inclusive of one or more of the indicated item or step. In particular, in the claims a reference to an item generally means at least one such item is present and a reference to a step means at least one instance of the step is performed.
  • The scope of the invention is indicated by the appended claims rather than being limited to the specific examples in the foregoing description and the Appendixes it incorporates. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope to the full extent permitted by law.

Claims (8)

1. A method of searching, comprising
(a) receiving a primary search input;
(b) identifying, based on the primary search input, at least one prior productive search;
(c) determining that the at least one prior productive search is related to the primary search input if the at least one prior productive search includes at least one of the following:
(i) a search request or search results comprising at least one keyword that is semantically linked to at least one term of the primary search input;
(ii) a search request or search results comprising at least one keyword that is lexically linked to at least one term of the primary search input; or
(iii) search results comprising at least a predetermined level of similarity to search results corresponding to the primary search input; and
(d) focusing a primary search corresponding to the primary search input based on (b) and (c).
2. The method of claim 1, wherein the identifying the at least one prior productive search comprises determining if a secondary search terminates with a click on the at least one keyword that is semantically linked to at least one term of the primary search input.
3. The method of claim 1, wherein the identifying the at least one prior productive search comprises determining if a secondary search terminates with a click on the at least one keyword that is lexically linked to at least one term of the primary search input.
4. The method of claim 1, wherein the identifying the at least one prior productive search comprises determining if a secondary search terminates with a purchase.
5. The method of claim 1, wherein the identifying the at least one prior productive search comprises determining if a secondary search terminates with a click on an advertisement.
6. The method of claim 1, wherein the identifying the at least one prior productive search comprises determining if a secondary search terminates with a document being displayed.
7. The method of claim 1, wherein the identifying the at least one prior productive search comprises determining if a secondary search results in a user associated with the secondary search returning to a particular search result more than once.
8. The method of claim 1, wherein the identifying the at least one prior productive search comprises determining if a secondary search results in an official site.
US13/362,591 2004-09-30 2012-01-31 Search Tools and Techniques Abandoned US20120131049A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/362,591 US20120131049A1 (en) 2004-09-30 2012-01-31 Search Tools and Techniques

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US61473604P 2004-09-30 2004-09-30
US11/235,457 US20060069675A1 (en) 2004-09-30 2005-09-26 Search tools and techniques
US93833807A 2007-11-12 2007-11-12
US13/362,591 US20120131049A1 (en) 2004-09-30 2012-01-31 Search Tools and Techniques

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US93833807A Continuation 2004-09-30 2007-11-12

Publications (1)

Publication Number Publication Date
US20120131049A1 true US20120131049A1 (en) 2012-05-24

Family

ID=36100435

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/235,457 Abandoned US20060069675A1 (en) 2004-09-30 2005-09-26 Search tools and techniques
US12/030,179 Active 2026-08-11 US7882097B1 (en) 2004-09-30 2008-02-12 Search tools and techniques
US13/362,591 Abandoned US20120131049A1 (en) 2004-09-30 2012-01-31 Search Tools and Techniques

Country Status (1)

Country Link
US (3) US20060069675A1 (en)

Also Published As

Publication number Publication date
US20060069675A1 (en) 2006-03-30
US7882097B1 (en) 2011-02-01

Legal Events

Date Code Title Description
AS Assignment

Owner name: RESOURCE CONSORTIUM LIMITED, VIRGIN ISLANDS, BRITI

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OGILVIE, JOHN W.;REEL/FRAME:027953/0910

Effective date: 20110204

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: RESOURCE CONSORTIUM LIMITED, LLC, DELAWARE

Free format text: RE-DOMESTICATION AND ENTITY CONVERSION;ASSIGNOR:RESOURCE CONSORTIUM LIMITED;REEL/FRAME:050091/0297

Effective date: 20190621