US7480652B2 - Determining relevance of a document to a query based on spans of query terms
- Publication number: US7480652B2
- Application number: US11/259,621
- Authority: US (United States)
- Prior art keywords: query, relevance, span, term, document
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99932—Access augmentation or optimizing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
Abstract
Description
where t is a query term of query Q, tf is the term frequency of t within the document, k1 is a constant, and K and w^(1) are defined by the following equations. K is represented by the following equation:
where l is the document length, avdl is the average document length in the corpus, and b is a constant. w^(1) is a Robertson/Sparck Jones weight represented by the following equation:
where N is the number of documents in the corpus and n is the number of documents in the corpus that contain the query term t.
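The equations that these definitions refer to are not reproduced in this text. The symbols (tf, k1, K, b, l, avdl, N, n and the Robertson/Sparck Jones weight) match the widely used Okapi BM25 formulation, so the following is a minimal sketch under that assumption rather than the patent's exact equations. The function names, the `doc_freq` mapping, and the default values of `k1` and `b` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def rsj_weight(N, n):
    """Robertson/Sparck Jones weight in its common BM25 form (assumed here)."""
    return math.log((N - n + 0.5) / (n + 0.5))


def bm25_score(query_terms, doc_terms, doc_freq, N, avdl, k1=1.2, b=0.75):
    """Sum single-term contributions w1 * (k1 + 1) * tf / (K + tf).

    doc_freq maps each term to n, the number of corpus documents containing it.
    k1 and b are tuning constants; the defaults are illustrative assumptions.
    """
    tf_counts = Counter(doc_terms)
    l = len(doc_terms)                      # document length
    K = k1 * ((1 - b) + b * l / avdl)       # length normalization
    score = 0.0
    for t in set(query_terms):
        tf = tf_counts[t]
        if tf == 0:
            continue                        # term absent: no contribution
        w1 = rsj_weight(N, doc_freq[t])
        score += w1 * (k1 + 1) * tf / (K + tf)
    return score
```

For example, `bm25_score(["a"], ["a", "b", "a"], {"a": 1}, N=10, avdl=3)` scores a three-word document in which the query term occurs twice.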
where t_i and t_j represent a pair of adjacent query terms and tpi is represented by the following equation:
where d(t_i, t_j) is the distance between the query terms t_i and t_j. The relevance of a document based on query term pairs (i.e., bigrams) is then combined with the relevance based on single query terms (i.e., unigrams) to give the overall relevance of a document.
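The tpi equation itself is missing from this text. The sketch below assumes the inverse-square form tpi = 1 / d(t_i, t_j)^2 used in the term proximity work of Rasolofo and Savoy that this document cites; treat that form as an assumption. The function names and the `max_distance` window are hypothetical.

```python
def tpi(distance):
    """Term-pair instance weight, assumed to be 1 / d^2."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / (distance ** 2)


def pair_proximity(positions_i, positions_j, max_distance=5):
    """Accumulate tpi over all occurrences of a query term pair.

    positions_i and positions_j are word offsets of the two terms in one
    document. Pairs farther apart than max_distance (a hypothetical
    window size) are ignored.
    """
    total = 0.0
    for pi in positions_i:
        for pj in positions_j:
            d = abs(pi - pj)
            if 0 < d <= max_distance:
                total += tpi(d)
    return total
```

Under this assumption, adjacent occurrences contribute 1.0 each and the contribution falls off quickly with distance, which is why the pair score has to be combined with the single-term score rather than replace it.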
where t is a query term, espan_i is a span that contains t, n_i is the number of query terms that occur in espan_i, Width(espan_i) is the span width of espan_i, x is an exponent that is used to control the influence of the span width, and y is an exponent that is used to control the influence of the number of query terms in the span. When a span contains only one query term, its span width may be set to the threshold distance. The aggregation of span relevance into a query term relevance or relevance contribution is represented by the following equation:
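Neither the span relevance equation nor the aggregation equation is reproduced in this text. As a rough sketch only, the code below assumes a span's relevance grows with n_i raised to y and shrinks with the span width raised to x, and that a term's relevance contribution is the sum over the spans containing it. Both forms are assumptions inferred from the symbol descriptions, and the function names, the tuple layout of `spans`, and all default parameter values are hypothetical.

```python
def span_relevance(n_terms, width, x, y):
    """Assumed form: (n_i ** y) / (Width(espan_i) ** x)."""
    return (n_terms ** y) / (width ** x)


def term_relevance_contribution(spans, x=0.25, y=0.3, threshold_distance=45):
    """Aggregate span relevance for one query term by summation (assumed).

    spans is a list of (n_terms, width) tuples, one per span containing
    the term. A span with a single query term uses the threshold distance
    as its width, as the text describes.
    """
    total = 0.0
    for n_terms, width in spans:
        if n_terms == 1:
            width = threshold_distance
        total += span_relevance(n_terms, width, x, y)
    return total
```

With `x=0.5`, `y=1.0`, and `threshold_distance=16`, the spans `[(2, 4), (1, 99)]` give 2/2 + 1/4 = 1.25: the dense two-term span dominates, and the isolated occurrence is discounted by the threshold width.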
Claims (13)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/259,621 US7480652B2 (en) | 2005-10-26 | 2005-10-26 | Determining relevance of a document to a query based on spans of query terms |
US12/351,765 US20090182734A1 (en) | 2005-10-26 | 2009-01-09 | Determining relevance of a document to a query based on spans of query terms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/259,621 US7480652B2 (en) | 2005-10-26 | 2005-10-26 | Determining relevance of a document to a query based on spans of query terms |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/351,765 Continuation US20090182734A1 (en) | 2005-10-26 | 2009-01-09 | Determining relevance of a document to a query based on spans of query terms |
Publications (2)
Publication Number | Publication Date |
---|---|
US20070094234A1 (en) | 2007-04-26 |
US7480652B2 (en) | 2009-01-20 |
Family
ID=37986490
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/259,621 Expired - Fee Related US7480652B2 (en) | 2005-10-26 | 2005-10-26 | Determining relevance of a document to a query based on spans of query terms |
US12/351,765 Abandoned US20090182734A1 (en) | 2005-10-26 | 2009-01-09 | Determining relevance of a document to a query based on spans of query terms |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/351,765 Abandoned US20090182734A1 (en) | 2005-10-26 | 2009-01-09 | Determining relevance of a document to a query based on spans of query terms |
Country Status (1)
Country | Link |
---|---|
US (2) | US7480652B2 (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7933890B2 (en) * | 2006-03-31 | 2011-04-26 | Google Inc. | Propagating useful information among related web pages, such as web pages of a website |
US20070244866A1 (en) * | 2006-04-18 | 2007-10-18 | Mainstream Advertising, Inc. | System and method for responding to a search request |
US9646078B2 (en) * | 2008-05-12 | 2017-05-09 | Groupon, Inc. | Sentiment extraction from consumer reviews for providing product recommendations |
TW201013433A (en) * | 2008-09-19 | 2010-04-01 | Esobi Inc | Filtering method for the same or similar documents |
US9449078B2 (en) | 2008-10-01 | 2016-09-20 | Microsoft Technology Licensing, Llc | Evaluating the ranking quality of a ranked list |
US8060456B2 (en) * | 2008-10-01 | 2011-11-15 | Microsoft Corporation | Training a search result ranker with automatically-generated samples |
US8214348B2 (en) * | 2010-02-25 | 2012-07-03 | Yahoo! Inc. | Systems and methods for finding keyword relationships using wisdoms from multiple sources |
US8402032B1 (en) * | 2010-03-25 | 2013-03-19 | Google Inc. | Generating context-based spell corrections of entity names |
US9430565B2 (en) * | 2014-01-22 | 2016-08-30 | Zefr, Inc. | Providing relevant content |
US9317566B1 (en) | 2014-06-27 | 2016-04-19 | Groupon, Inc. | Method and system for programmatic analysis of consumer reviews |
US11250450B1 (en) | 2014-06-27 | 2022-02-15 | Groupon, Inc. | Method and system for programmatic generation of survey queries |
US10878017B1 (en) | 2014-07-29 | 2020-12-29 | Groupon, Inc. | System and method for programmatic generation of attribute descriptors |
US10977667B1 (en) | 2014-10-22 | 2021-04-13 | Groupon, Inc. | Method and system for programmatic analysis of consumer sentiment with regard to attribute descriptors |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5594837A (en) * | 1993-01-29 | 1997-01-14 | Noyes; Dallas B. | Method for representation of knowledge in a computer as a network database system |
US5717924A (en) * | 1995-07-07 | 1998-02-10 | Wall Data Incorporated | Method and apparatus for modifying existing relational database schemas to reflect changes made in a corresponding object model |
US5734887A (en) * | 1995-09-29 | 1998-03-31 | International Business Machines Corporation | Method and apparatus for logical data access to a physical relational database |
US6088524A (en) * | 1995-12-27 | 2000-07-11 | Lucent Technologies, Inc. | Method and apparatus for optimizing database queries involving aggregation predicates |
US5870739A (en) * | 1996-09-20 | 1999-02-09 | Novell, Inc. | Hybrid query apparatus and method |
US6591272B1 (en) * | 1999-02-25 | 2003-07-08 | Tricoron Networks, Inc. | Method and apparatus to make and transmit objects from a database on a server computer to a client computer |
US6356900B1 (en) * | 1999-12-30 | 2002-03-12 | Decode Genetics Ehf | Online modifications of relations in multidimensional processing |
CA2354437A1 (en) * | 2001-07-31 | 2003-01-31 | Ibm Canada Limited-Ibm Canada Limitee | A schema for sharing relational database types |
US7480652B2 (en) * | 2005-10-26 | 2009-01-20 | Microsoft Corporation | Determining relevance of a document to a query based on spans of query terms |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5826261A (en) * | 1996-05-10 | 1998-10-20 | Spencer; Graham | System and method for querying multiple, distributed databases by selective sharing of local relative significance information for terms related to the query |
US6581057B1 (en) * | 2000-05-09 | 2003-06-17 | Justsystem Corporation | Method and apparatus for rapidly producing document summaries and document browsing aids |
US6823333B2 (en) * | 2001-03-02 | 2004-11-23 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | System, method and apparatus for conducting a keyterm search |
US20050022114A1 (en) * | 2001-08-13 | 2005-01-27 | Xerox Corporation | Meta-document management system with personality identifiers |
US6778981B2 (en) * | 2001-10-17 | 2004-08-17 | Korea Advanced Institute Of Science & Technology | Apparatus and method for similarity searches using hyper-rectangle based multidimensional data segmentation |
US6947930B2 (en) * | 2003-03-21 | 2005-09-20 | Overture Services, Inc. | Systems and methods for interactive search query refinement |
US20070005563A1 (en) * | 2005-06-30 | 2007-01-04 | Veveo, Inc. | Method and system for incremental search with reduced text entry where the relevance of results is a dynamically computed function of user input search string character count |
Non-Patent Citations (16)
Title |
---|
Brin, Sergey and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Computer Networks and ISDN Systems, vol. 30, 1998, 25 pages. |
Clarke, Charles L. A. et al., "Shortest Substring Ranking (MultiText Experiments for TREC-4)," 1995, pp. 1-10. |
Clarke, Charles L.A. et al., "Relevance ranking for one to three term queries," Information Processing and Management, 36 (2000), © 2000 Elsevier Science Ltd., pp. 291-311. |
Fox, Christopher, "A Stop List for General Text," SIGIR Forum, ACM Press, vol. 24, No. 4, Dec. 1990, pp. 19-35. |
Hawking, David and Paul Thistlewaite, "Proximity Operators-So Near And Yet So Far," In Proceedings of TREC-4, 1995, pp. 131-143. |
Hawking, David and Paul Thistlewaite, "Relevance Weighting Using Distance Between Term Occurrences," Computer Science Technical Report TR-CS-96-08, The Australian National University, Aug. 1996, 20 pages. |
Luhn, H.P., "The Automatic Creation of Literature Abstracts," IBM Journal, Apr. 1958, 2, pp. 159-165. |
Papineni, Kishore, "Why Inverse Document Frequency?," In Proceedings of the NAACL 2001, pp. 25-32. |
Papka, Ron and James Allan, "Document Classification using Multiword Features," Conference on Information and Knowledge Management, Nov. 1998, ACM Press, pp. 124-131. |
Porter, M.F., "An Algorithm for suffix stripping," Program, vol. 14, No. 3, Jul. 1980, pp. 130-137. |
Rasolofo, Yves and Jacques Savoy, "Term Proximity Scoring for Keyword-Based Retrieval Systems," Advances in Information Retrieval: 25th European Conference on IR Research, ECIR, 2003, Italy, © Springer-Verlag Berlin Heidelberg 2003, pp. 207-218. |
Robertson, S.E. and K. Sparck Jones, "Relevance Weighting of Search Terms," Journal of the American Society for Information Science, vol. 27, No. 3, May-Jun. 1976, pp. 129-146. |
Robertson, S.E. et al., "Experimentation as a way of life: Okapi at TREC," Information Processing and Management, vol. 36, 2000, © 1999 Elsevier Science Ltd., pp. 95-108. |
Rose, Daniel E. and Curt Stevens, "V-Twin: A Lightweight Engine for Interactive Use," In Proceedings of TREC-5, 1996, pp. 279-290. |
Spink, Amanda et al., "Searching the Web: The Public and Their Queries," Journal of the American Society for Information Science and Technology, 52(3), Feb. 1, 2001, © 2001 John Wiley & Sons, Inc., pp. 226-234. |
Wilkinson, Ross, Justin Zobel and Ron Sacks-Davis, "Similarity Measures for Short Queries," Oct. 1995, In Proceedings of TREC-4, 1995, pp. 277-285. |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090182734A1 (en) * | 2005-10-26 | 2009-07-16 | Ji-Rong Wen | Determining relevance of a document to a query based on spans of query terms |
US7630964B2 (en) * | 2005-11-14 | 2009-12-08 | Microsoft Corporation | Determining relevance of documents to a query based on identifier distance |
US20070112734A1 (en) * | 2005-11-14 | 2007-05-17 | Microsoft Corporation | Determining relevance of documents to a query based on identifier distance |
US9928265B2 (en) | 2006-01-31 | 2018-03-27 | Sap Se | Utilizing shared numeric locks |
US20070219989A1 (en) * | 2006-03-14 | 2007-09-20 | Yassine Faihe | Document retrieval |
US7827161B2 (en) * | 2006-03-14 | 2010-11-02 | Hewlett-Packard Development Company, L.P. | Document retrieval |
US20100312793A1 (en) * | 2009-06-08 | 2010-12-09 | International Business Machines Corporation | Displaying relevancy of results from multi-dimensional searches using heatmaps |
US8965882B1 (en) | 2011-07-13 | 2015-02-24 | Google Inc. | Click or skip evaluation of synonym rules |
US8909627B1 (en) | 2011-11-30 | 2014-12-09 | Google Inc. | Fake skip evaluation of synonym rules |
US8965875B1 (en) | 2012-01-03 | 2015-02-24 | Google Inc. | Removing substitution rules based on user interactions |
US9152698B1 (en) | 2012-01-03 | 2015-10-06 | Google Inc. | Substitute term identification based on over-represented terms identification |
US9141672B1 (en) | 2012-01-25 | 2015-09-22 | Google Inc. | Click or skip evaluation of query term optionalization rule |
US8959103B1 (en) | 2012-05-25 | 2015-02-17 | Google Inc. | Click or skip evaluation of reordering rules |
US9146966B1 (en) | 2012-10-04 | 2015-09-29 | Google Inc. | Click or skip evaluation of proximity rules |
Also Published As
Publication number | Publication date |
---|---|
US20090182734A1 (en) | 2009-07-16 |
US20070094234A1 (en) | 2007-04-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7480652B2 (en) | Determining relevance of a document to a query based on spans of query terms | |
US7664735B2 (en) | Method and system for ranking documents of a search result to improve diversity and information richness | |
US9697249B1 (en) | Estimating confidence for query revision models | |
US20070005588A1 (en) | Determining relevance using queries as surrogate content | |
US7565345B2 (en) | Integration of multiple query revision models | |
US7849089B2 (en) | Method and system for adapting search results to personal information needs | |
US8112269B2 (en) | Determining utility of a question | |
US7676520B2 (en) | Calculating importance of documents factoring historical importance | |
US7870147B2 (en) | Query revision using known highly-ranked queries | |
US7720870B2 (en) | Method and system for quantifying the quality of search results based on cohesion | |
US8612453B2 (en) | Topic distillation via subsite retrieval | |
US20060230005A1 (en) | Empirical validation of suggested alternative queries | |
US7376643B2 (en) | Method and system for determining similarity of objects based on heterogeneous relationships | |
US8484193B2 (en) | Look-ahead document ranking system | |
US7890502B2 (en) | Hierarchy-based propagation of contribution of documents | |
Zaragoza et al. | Web Search Relevance Ranking. | |
Yoshida et al. | What's going on in search engine rankings? |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WEN, JI-RONG; SONG, RUIHUA; MA, WEI-YING; REEL/FRAME: 017333/0739. Effective date: 20051201 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FPAY | Fee payment | Year of fee payment: 4 |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MICROSOFT CORPORATION; REEL/FRAME: 034543/0001. Effective date: 20141014 |
FPAY | Fee payment | Year of fee payment: 8 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20210120 |