US20150006520A1 - Person Search Utilizing Entity Expansion - Google Patents
- Publication number
- US20150006520A1 (application US13/931,922)
- Authority
- US
- United States
- Prior art keywords
- search
- query
- related entity
- search query
- computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G06F17/30477—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Definitions
- Locating content regarding a specific person on the Internet can be challenging. There are many factors that make “people search” difficult: most names are not unique. In any given area there may be several individuals with the same name. Additionally, the web presence of any given person may be low such that search results for that person will be dominated by results referring to a better known individual with the same name.
- a search query is received from a computer user, the search query identifying a person for which content (or references to content) is sought.
- related entity data is obtained from at least one related entity source for the identified person.
- Related entity data comprises at least one of a related entity (or entities) or a category associated with the identified person.
- An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.
- A computer-readable medium bears computer-executable instructions. When executed on a computing system comprising at least a processor executing the instructions retrieved from the medium, the instructions configure the computing system to carry out a method for responding to a search query from a user. More particularly, in response to receiving a search query from a computer user, where the search query identifies a person for which content (or references to content) is sought, related entity data is obtained from at least one related entity source for the identified person. An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.
- a computer system for responding to a search query for content related to a person comprises a processor and a memory, wherein the processor executes instructions stored in the memory as part of or in conjunction with additional components to respond to a search query for content related to a person.
- additional components include (by way of illustration and not limitation) a query topic identification component, a related entity retrieval component, an expanded query generator, a search results retrieval component, and a search results presentation generator.
- the query topic identification component is configured to determine, from the search query, the identity of the person for whom related content is sought.
- the related entity retrieval component obtains related entity data corresponding to the identified person from a related entity source.
- After obtaining related entity data, the expanded query generator generates an expanded query from the search query for content related to the identified person and from the related entity data.
- the related entity data comprises at least one of a related entity or a category associated with the identified person of the search query.
- the search results retrieval component obtains search results from a content store according to the expanded search query.
- the search results presentation generator generates a search results presentation according to the search results referencing content corresponding to the identified person and returns the search results presentation to the computer user.
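The chain of components described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the function names, the in-memory `RELATED_ENTITY_SOURCE` and `CONTENT_STORE`, and the substring-based scoring are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for a related entity source and a content store.
RELATED_ENTITY_SOURCE = {
    "bruce wayne": {"entities": ["Wayne Enterprises", "Gotham"], "categories": ["Employee"]},
}
CONTENT_STORE = [
    "Bruce Wayne opens new Wayne Enterprises lab in Gotham",
    "Bruce Wayne wins regional chess title",
    "Gotham weather report",
]

@dataclass
class ExpandedQuery:
    required: list                                 # terms from the original query
    optional: list = field(default_factory=list)   # terms from related entity data

def identify_person(query: str) -> str:
    """Query topic identification: here, simply the normalized query text."""
    return query.strip().lower()

def get_related_entity_data(person: str) -> dict:
    """Related entity retrieval from the (assumed) in-memory source."""
    return RELATED_ENTITY_SOURCE.get(person, {"entities": [], "categories": []})

def expand_query(query: str, related: dict) -> ExpandedQuery:
    """Expanded query generation from the query and the related entity data."""
    return ExpandedQuery(required=[query], optional=related["entities"] + related["categories"])

def retrieve(expanded: ExpandedQuery) -> list:
    """Return (score, document) pairs; optional terms only raise the score."""
    results = []
    for doc in CONTENT_STORE:
        text = doc.lower()
        if all(term.lower() in text for term in expanded.required):
            score = 1 + sum(term.lower() in text for term in expanded.optional)
            results.append((score, doc))
    return sorted(results, reverse=True)

def respond(query: str) -> list:
    """Search results presentation: documents ordered by descending score."""
    person = identify_person(query)
    related = get_related_entity_data(person)
    return [doc for _, doc in retrieve(expand_query(query, related))]
```

Calling `respond("Bruce Wayne")` in this sketch ranks the document that also mentions the related entities ahead of the one that only mentions the name.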
- FIG. 1 is a block diagram of a networked environment suitable for implementing aspects of the disclosed subject matter;
- FIG. 2 is a flow diagram illustrating an exemplary routine for providing improved results in response to a search query regarding content for a particular person through query expansion;
- FIG. 3 is a flow diagram illustrating an exemplary routine for generating an expanded search query according to aspects of the disclosed subject matter;
- FIGS. 4 and 5 illustrate elements of expanded search queries; and
- FIG. 6 is a block diagram illustrating exemplary components of a search engine configured to provide improved results in response to a search query from a computer user.
- The term “exemplary” in this document should be interpreted as serving as an illustration or example of something, and it should not be interpreted as an ideal and/or a leading illustration of that thing.
- An entity corresponds to an abstract or tangible thing that includes, by way of illustration and not limitation: a person, a place, a group, a concept, an activity, and the like.
- FIG. 1 is a block diagram illustrating an exemplary networked environment 100 suitable for implementing aspects of the disclosed subject matter, particularly in regard to providing improved search results to a computer user in response to a search query regarding a person.
- the exemplary networked environment 100 includes one or more user computers, such as user computers 102 - 106 , connected to a network 108 , such as the Internet, a wide area network or WAN, and the like.
- User computers include, by way of illustration and not limitation: desktop computers (such as desktop computer 104 ); laptop computers (such as laptop computer 102 ); tablet computers (such as tablet computer 106 ); mobile devices (not shown); game consoles (not shown); personal digital assistants (not shown); and the like.
- User computers may be configured to connect to the network 108 by way of wired and/or wireless connections.
- the exemplary networked environment 100 illustrates the network 108 as being located between the user computers 102 - 106 and the search engine 110 , and again between the search engine 110 and the network sites 112 - 116 . This illustration, however, should not be construed as suggesting that these are separate networks.
- Also connected to the network 108 are various networked sites, including network sites 110-116.
- the networked sites connected to the network 108 include a search engine 110 configured to respond to search queries from computer users, news sources 112 and 114 which host various news articles and content, a social networking site 116 , and the like.
- a computer user such as computer user 101 , may navigate via a user computer, such as user computer 102 , to these and other networked sites to access content, including news content.
- the search engine 110 is configured to provide search results (typically in the form of references to content available on the network 108 ) in response to a search query from a computer user.
- search engine 110 identifies content related to the identified person according to information in its content store, generates a search results presentation based on at least some of the identified content, and provides the search results presentation to the computer user.
- FIG. 1 also illustratively includes a social network site 116 and various news sources, including news sites 112 - 114 .
- a social network site 116 is an online site/service that provides a platform in which a computer user can establish a profile describing various aspects of the user, build relationships and social networks with other computer users, groups, and the like.
- a computer user can establish or indicate various interests, activities, and backgrounds with those in his/her social network.
- a computer user is often able to indicate a preference or an interest in a particular entity on a social networking service as might be hosted by social networking site 116 , whether that entity is a person, a place, a group, a concept, an activity, and the like.
- While social networking site 116 is included in the illustrative network environment 100, this is merely illustrative and should not be viewed as limiting upon the disclosed subject matter. In an actual embodiment, there may be any number of social network sites connected to the network 108.
- the search engine 110 is configured to communicate (directly or indirectly through services calls and/or web crawlers) with multiple content sources, including news sites 112 and 114 , social networking site 116 , and other sites such as blogs and registries (not shown) to obtain information regarding the content that is available at each network site. Information regarding available content may also be pushed to the search engine from various services and/or networking sites. This information is stored (typically as references to the content) in a content store such that the search engine can obtain content from this content store in order to respond to a search query from a computer user, such as computer user 101 .
- the search engine 110 may also obtain information regarding any given individual from search query logs, network browsing histories, purchase histories, and the like.
- a search engine 110 may also be configured to obtain information from other network sites when responding to a search query. For example, according to aspects of the disclosed subject matter, when responding to a search query, the search engine 110 may obtain data from one or more social networking sites, such as social network site 116 , as relevant information to return to the requesting computer user and/or as information to assist the search engine in identifying relevant information to return to the requesting computer user.
- FIG. 2 is a flow diagram of an exemplary routine for providing improved results in response to a search query regarding content corresponding to a particular person through query expansion.
- the search engine 110 receives a search query from a computer user, such as computer user 101 , the search query requesting content corresponding to a particular person.
- a search query is typically (though not exclusively) a text string.
- a search query for content relating to a person may be “Bruce Wayne.”
- the search engine attempts to uniquely identify the person who is the subject matter of the search query.
- the search engine attempts to uniquely identify the person for which content is requested according to at least general information and specific information relating to the requesting computer user.
- the general information includes, by way of illustration and not limitation: popularity of search queries corresponding to a person with the name identified in the search query; trending popularity of a person with the name identified in the search query; other terms and/or phrases in the search query (e.g., “Bruce Wayne Seattle” or “Bruce Wayne Microsoft”); an image representative of the person; and the like.
- Specific information relating to the requesting computer user may include, by way of illustration and not limitation: current location; prior search query history; current and former workplaces; current and former educational institutions that were attended; social networks; preferences (both explicitly and implicitly identified); general graph connectivity between the requesting computer user and potential subjects of a search query as well as the number of mutual friends; physical distance between the requesting user and the potential subjects; location of friends; former locations; and the like.
- the search engine 110 may, at least internally, associate a globally unique identifier to the person who is the subject matter of the search query.
- the search engine 110 may use the associated globally unique identifier in obtaining, or reranking, search results in response to the search query.
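One way to picture the identification step is a scoring function that combines general signals (such as popularity) with signals specific to the requesting user (such as location and mutual friends) and returns the identifier of the best candidate. The sketch below is only an illustration: the `Candidate` fields, the weights, and the identifier format are assumptions, not details from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    guid: str            # globally unique identifier assigned internally
    name: str
    popularity: float    # general signal, assumed to lie in 0..1
    city: str
    mutual_friends: int  # signal specific to the requesting user

def score(candidate: Candidate, user_city: str) -> float:
    """Combine general and user-specific signals; the weights are illustrative."""
    s = 1.0 * candidate.popularity
    if candidate.city == user_city:
        s += 0.5
    s += 0.1 * min(candidate.mutual_friends, 10)
    return s

def identify(name: str, candidates: list, user_city: str):
    """Return the GUID of the best-scoring candidate with this name, or None."""
    matches = [c for c in candidates if c.name.lower() == name.lower()]
    if not matches:
        return None
    return max(matches, key=lambda c: score(c, user_city)).guid

candidates = [
    Candidate("person-001", "Bruce Wayne", popularity=0.9, city="Gotham", mutual_friends=0),
    Candidate("person-002", "Bruce Wayne", popularity=0.1, city="Seattle", mutual_friends=8),
]
```

With these assumed weights, a requester in Seattle with many mutual friends resolves the name to the less popular local person, while a requester in Gotham resolves it to the better-known one.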
- the order presented in blocks 202 and 204 should be viewed as illustrative and not limiting upon the disclosed subject matter.
- the identity of a person for whom content is sought may be known prior to submitting/receiving a search request.
- auto-suggest search recommendations may indicate a particular person as one of the auto-suggestions and, typically, that suggested person's unique identity is known.
- another service may submit a search request for a person that uniquely identifies the person to the search service such that the identity of the person need not be determined.
- this is illustrative of one embodiment, and is not limiting upon the disclosed subject matter.
- While the search request identifies a person for whom content is sought, there may also be times in which the name of that person is not known but some information is provided that may lead to uniquely identifying that person.
- the computer user may not know the name of the general manager of the Seattle Seahawks, but in submitting the text “general manager of the Seattle Seahawks” the computer user often sufficiently identifies the person for whom content is sought that, in block 204 , the identity of the person can be determined.
- related entity data includes entities related to the identified person.
- a related entity is an entity with which the identified person is related for some reason. While some of the reasons may be known, others may be unknown and implied according to statistical similarities. For example, assume that the identified person is an employee of Company A and is a member of Workgroup Z. Related entities to the identified person, based on this employment relationship, would typically include “Company A” and “Workgroup Z.” Other related entities arising from this same employment relationship may include fellow co-workers.
- Still other entities may also include other (previous) workgroups, past and present co-workers, and the like.
- the identified person may also be an alumnus of a particular university.
- the university may be a related entity to the identified person, as well as the particular college in the university where the identified person studied, the degree that was awarded, academic achievements of the identified person, fellow students, and the like.
- the identified person may be a member of a local master gardeners society and, as a result, the local master gardeners society may be a related entity to the identified person as well as fellow members of the society.
- the search engine 110 obtains related entity data from one or more related entity sources.
- the search engine 110 may host or store various information regarding the identified person in a user profile store (e.g., the user profile store 628 of FIG. 6) and, therefore, be one of the related entity sources.
- the search engine 110 may store user profile information corresponding to the computer user. This user profile information may be based on explicitly identified information (from the identified person) as well as implicitly identified information (such as information derived from search queries, browsing history, and the like.)
- Social networking sites, such as social networking site 116, represent additional related entity sources.
- a social networking site enables a person, such as the identified person of the search query, to establish relationships and social networks with other entities (including people, organizations, activities, causes, and the like).
- the search engine 110 can be configured to obtain related entity data from any number of these related entity sources.
- the related entity information that is hosted by each of the related entity sources may comprise information that the identified person wishes to keep private.
- the search engine identifies the requesting computer user and, if identified, can attempt to use the permissions afforded to the requesting computer user in obtaining the related entity information.
- a computer user is required to authenticate himself or herself in order to access information regarding the identified person. Other requirements may include, by way of illustration and not limitation, that the requesting computer user be logged into one or more services in order to access and/or view content that would otherwise be restricted.
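The permission handling described above might look like the following filter, which returns only the related entities the requester is allowed to see. This is a sketch under assumptions: the two-level `visibility` scheme and the `friends_of_subject` set are hypothetical simplifications of whatever access rules a real related entity source applies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntityRecord:
    entity: str
    visibility: str  # "public" or "friends" (assumed two-level scheme)

def visible_entities(records, requester, friends_of_subject):
    """Return the entities the requester may see.

    requester is None when the user could not be identified or authenticated,
    in which case only public records are returned.
    """
    allowed = {"public"}
    if requester is not None and requester in friends_of_subject:
        allowed.add("friends")
    return [r.entity for r in records if r.visibility in allowed]

records = [
    EntityRecord("Company A", "public"),
    EntityRecord("Master Gardeners Society", "friends"),
]
```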
- a related entity source may associate one or more categories to an individual (such as the identified person of a search query).
- the related entity data obtained from the related entity sources may also include category data.
- Category data (both in regard to the set of potential relationships defined by the category as well as the actual relationships of a person per a category) may be advantageously used in expanding a received search query (as discussed in greater detail below.)
- a related entity source may have associated various categories with the identified person including “Employee,” “Alumnus,” and “Gardener.”
- each of the related entity sources may maintain category information that defines what is meant to be associated with the category.
- This category information often includes a list of potential, though not necessarily required, relationships that may exist between a first entity belonging to a specific category (such as the identified person) and other entities.
- the “Employee” category may define a set of potential relationships as including “employer,” “work group,” “current manager,” “direct reports,” “co-worker,” and the like.
- each entity that is categorized as an “Employee” could then have relationships with other entities as defined by the set of potential relationships.
- an entity of that category is not required to be related to other entities based on each and every potential relationship.
- a given entity, such as an entity corresponding to the identified person of a search query, may be associated with a plurality of categories.
- categories may also be inferred. For example, an employee may be interested in former work performed previously at a company such that an inferred category is “co-worker.”
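Category data as described here has two parts: the set of potential relationships a category defines, and the relationships a given person actually holds. A minimal sketch of that structure follows; the dictionary layout, the "Alumnus" relationship names, and the `category_terms` helper are assumptions made for illustration.

```python
# Hypothetical category definitions: each category lists its potential relationships.
CATEGORY_RELATIONSHIPS = {
    "Employee": ["employer", "work group", "current manager", "direct reports", "co-worker"],
    "Alumnus": ["university", "college", "degree"],  # assumed for illustration
}

# Actual relationships held by one person; not every potential relationship is filled.
person_relationships = {
    "Employee": {"employer": "Company A", "work group": "Workgroup Z"},
}

def category_terms(categories, relationships):
    """Collect the category name, its potential relationships, and any actual values."""
    terms = []
    for category in categories:
        terms.append(category)
        for relation in CATEGORY_RELATIONSHIPS.get(category, []):
            terms.append(relation)
            value = relationships.get(category, {}).get(relation)
            if value is not None:
                terms.append(value)
    return terms
```

A category with no definition in the source contributes only its own name, which mirrors the point that an entity need not have every potential relationship.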
- a search model is identified/determined to apply to the expanded search query.
- This search model includes information for weighting various elements (terms and phrases) of the expanded search query to improve search results.
- Applying a search model to the expanded search query recognizes, at least in part, that not all query terms of the expanded search query are equal, i.e., some query terms are more important in identifying relevant search content for the identified person than others.
- favoring/weighting employment-related query terms or education-related query terms provides improved search results when the relevancy of the various search results (or, more accurately stated, the content referenced by the search results) are presented to a particular user.
- selection of a search model may be based on information regarding the requesting computer user.
- selection of a search model may be made according to information regarding the identified person, from information available to the search engine 110 or external sources including from the related entity data.
- selection of a search model may be made according to information regarding both the requesting computer user as well as the identified person of the search query.
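Search model selection can be illustrated as choosing a set of per-kind term weights based on what the requester and the identified person have in common. The model names, weights, and profile keys below are hypothetical; the text does not specify how a model is represented.

```python
# Hypothetical search models: per-kind weights for expanded query terms.
SEARCH_MODELS = {
    "employment": {"employment": 2.0, "education": 0.5},
    "education": {"employment": 0.5, "education": 2.0},
    "default": {"employment": 1.0, "education": 1.0},
}

def select_model(user_profile: dict, person_profile: dict) -> str:
    """Pick a model from what the requester and the identified person share."""
    employer = user_profile.get("employer")
    if employer and employer == person_profile.get("employer"):
        return "employment"
    university = user_profile.get("university")
    if university and university == person_profile.get("university"):
        return "education"
    return "default"

def weight_terms(terms: dict, model_name: str) -> dict:
    """terms maps each query term to its kind ('employment' or 'education')."""
    weights = SEARCH_MODELS[model_name]
    return {term: weights.get(kind, 1.0) for term, kind in terms.items()}
```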
- FIG. 3 is a flow diagram illustrating an exemplary routine 300 for generating an expanded search query according to related entity data obtained from related entity sources.
- the identified person and filter elements of the received search query are included as an initial section of the expanded search query. While this may entail simply copying the received search query into the initial section, the initial search query may not necessarily simply be copied. Often a requesting computer user may misspell the name of the person that is sought or any one of the identifying filter elements associated with the person.
- a received search query may be “Bruse Wayn Microsoft,” in an effort to find content corresponding to “Bruce Wayne” who works at “Microsoft.” If it can be determined that the name (or one or more filter elements) is misspelled, it would be less productive to include the original search query in the expanded search query. Hence, in block 204 of routine 200 , the person is identified. Correction to the filter elements may also be made (though not explicitly called out in routines 200 and 300 .)
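The name correction mentioned above can be approximated with fuzzy matching against a list of known names. The sketch uses Python's standard `difflib.get_close_matches`; the `KNOWN_NAMES` list and the 0.8 cutoff are assumptions, and a production search engine would likely use a far richer spelling model. Only the name portion of the query is corrected here, not filter elements such as "Microsoft".

```python
import difflib

# Hypothetical list of names known to the search engine.
KNOWN_NAMES = ["Bruce Wayne", "Bruce Banner", "Clark Kent"]

def correct_name(raw: str, known=KNOWN_NAMES, cutoff: float = 0.8):
    """Return the closest known name to raw, or None if nothing is close enough."""
    by_lower = {name.lower(): name for name in known}
    matches = difflib.get_close_matches(raw.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None
```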
- query terms are derived from the obtained related entity data and included/incorporated in the expanded search query.
- the related entities (related to the identified person) from the obtained related entity data are included in a related entities section of the expanded search query in accordance with the determined search model.
- Query terms derived from the category data, including both the category (as an entity) and the category entities (as described below), are included in a category entities section of the expanded search query according to the search model.
- the expanded search query is returned and the routine 300 terminates.
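Routine 300 can be sketched as a function that assembles the three sections of the expanded query. The output here is a plain dictionary of token lists rather than any particular engine's query syntax; the function names and the dictionary keys are assumptions. The dotted token form follows the “Bruce.Wayne” style used in the examples.

```python
def tokenize(phrase: str) -> str:
    """Join multi-word phrases with dots, as in the 'Bruce.Wayne' tokens of the examples."""
    return ".".join(phrase.split())

def generate_expanded_query(person, filters, related_entities, category_data):
    """Build the three sections of an expanded query as plain token lists."""
    initial = [tokenize(person)] + [tokenize(f) for f in filters]
    related = [tokenize(e) for e in related_entities]
    category = []
    for name, members in category_data.items():
        category.append(tokenize(name))                # the category itself, as an entity
        category.extend(tokenize(m) for m in members)  # the category entities
    return {"initial": initial, "related_entities": related, "category_entities": category}

expanded = generate_expanded_query(
    person="Bruce Wayne",
    filters=["Microsoft"],
    related_entities=["Research"],
    category_data={"Employee": ["Workgroup"]},
)
```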
- FIG. 4 illustrates an exemplary expanded search query 400 corresponding to the example above, i.e., for the person “Bruce Wayne.” For this example, it is assumed that this identified person, “Bruce Wayne,” was associated with only one category, Employee.
- the initial section 402 includes the original search query text 404 , “Bruce.Wayne,” as well as alternative names related to the identified person, in this case “Batman Dark.Knight Matches.Malone Caped.Crusader.” Of course, not all computer users will have access rights to all information.
- syntactical conventions include (by way of illustration and not limitation): the operator 408 “inbody:” indicating to the search engine 110 that it should match a document when any one of the words/terms between the parentheses is found in the body of the content; a “noalter:” operator that indicates that the spelling of the terms should not be modified; and a “norelax:” operator that indicates that the terms are important and may not be dropped in matching content.
- the operator 410 “+” indicates to a search engine a concatenation of other search operators and/or tokens.
- the expanded search query 400 also includes a related entity section 412 that includes the related entities to the identified person of the search query, such as text 416 “Research.” Still further included in the expanded search query is a category entities section 414 that includes the category entities of category “Employee.” As mentioned above, the category entities section 414 includes the category (“Employee”) as well as the category entities such as text 418 “Workgroup.” These entries optionally help produce results based on how the computer user likely knows the identified person, in this case “Bruce Wayne.” As can be seen, the expanded search query for a particular person takes a search query, such as “Bruce Wayne” and expands the query with related entities as well as category entities to better identify content corresponding to the identified person.
- A further operator lets the ranking of a document go up when a matching token/value, such as “Research,” is found in the document. The specified terms are not required to be found in a resulting document but, if found, result in the document being ranked as more relevant.
- the operator, “word:”, operates to match on a document if one or more of the tokens in the parentheses, such as “Workgroup”, is found in the document. In a sense, the operator “word:” operates as a type of max (or maximum value) operator, comparing each token between the parentheses to the document and returning the single maximum value of the rank of the tokens. Specifically, if more than one token matches, only the value of the greatest matching token is returned.
- a “norank:” operator (not shown) would require that the specified tokens (identified between the enclosing parentheses) be present in a results document but would not affect the ordering or relevance of the document in the overall results.
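The three behaviors described above (an optional rank boost, the max-style “word:” operator, and the required-but-unranked “norank:” operator) can be contrasted with toy functions. This is only an illustrative sketch: the per-token value (an occurrence count) and the function names are assumptions, and the actual ranking values used by a search engine are not given in the text.

```python
def token_score(token: str, doc_tokens: list) -> int:
    """Toy per-token rank value: number of occurrences in the document."""
    return doc_tokens.count(token)

def word_operator(tokens: list, doc_tokens: list) -> int:
    """'word:' as described: contributes only the single largest token value (a max, not a sum)."""
    return max((token_score(t, doc_tokens) for t in tokens), default=0)

def optional_boost(tokens: list, doc_tokens: list) -> int:
    """Rank-raising operator: tokens are never required, but each match raises the rank."""
    return sum(token_score(t, doc_tokens) for t in tokens)

def norank_operator(tokens: list, doc_tokens: list) -> bool:
    """'norank:' as described: tokens must be present, with no effect on ordering."""
    return all(t in doc_tokens for t in tokens)

doc = "research workgroup research employee".split()
```

For the sample `doc`, `word_operator(["workgroup", "research"], doc)` takes the larger of the two token values rather than adding them, which is the distinction the text draws.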
- While expanded queries 400 and 500 generally include textual tokens (such as “Bruce.Wayne”), it should be appreciated that this is illustrative and should not be viewed as limiting upon the disclosed subject matter.
- one or more of the tokens in an expanded search query could be specific identifiers that identify the sought-for person and/or related entities.
- expanded search query 500 includes an operator 510 that includes a Facebook numerical identifier (“740049358”) as well as an operator 512 that includes a Facebook user identifier (“t-drake”).
- any particular sources of identifiers may be used and Facebook identifiers are illustrative only.
- FIG. 5 illustrates an exemplary expanded search query 500 corresponding to the example above, i.e., for the identified person “Bruce Wayne,” but in this example includes information from two categories, Employer and Education.
- the expanded search query 500 includes the initial section 502 as well as related entities section 504 and category entities section 506 .
- the expanded search queries become more detailed and encompassing to assist the search engine to identify content corresponding to the identified person of the search query.
- search results are obtained according to the expanded search query.
- Obtaining search results according to a search query, in this case a search query with expanded terms according to related entities and categories, is known in the art.
- search results are obtained according to the query terms from the received search query and optionally according to the query terms derived from the related entity data.
- the query terms of the expanded search query that are derived from the related entity data are intended to expand the scope of content/search results that correspond to the identified person, but these query terms that are derived from the related entity data are not mandatory terms.
- Because these query terms are optional, the expanded search query expands the scope of content that potentially relates to the identified person; if the terms were mandatory, they would instead narrow that scope.
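The effect of keeping the derived terms optional can be seen in a toy search over three documents. The `DOCS` corpus, the whole-word matching, and the `derived_mandatory` switch are assumptions added purely to contrast the two behaviors.

```python
DOCS = {
    "d1": "bruce wayne joins research workgroup",
    "d2": "bruce wayne attends charity gala",
    "d3": "research workgroup publishes paper",
}

def search(required, derived, derived_mandatory=False):
    """Return doc ids ordered by score; derived terms are optional by default."""
    hits = []
    for doc_id, text in DOCS.items():
        words = text.split()
        if not all(t in words for t in required):
            continue
        if derived_mandatory and not all(t in words for t in derived):
            continue
        boost = sum(t in words for t in derived)
        hits.append((-boost, doc_id))  # negative so higher boosts sort first
    return [doc_id for _, doc_id in sorted(hits)]
```

With optional derived terms both documents about the person are returned and the one mentioning the related entities ranks first; making the derived terms mandatory drops the second document entirely.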
- a search results presentation is generated, at least in part, according to the obtained search results.
- one or more search results pages are generated according to the obtained search results, with those results scoring the highest being presented in the first pages of the presentation.
- at block 216 after generating the search results presentation, at least a portion of the presentation is returned to the requesting computer user in response to the search query. According to various embodiments, the results that are returned to the requesting computer user are organized according to the various categories of information regarding the subject person. Thereafter, the routine 200 terminates.
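Generating the presentation, with the highest-scoring results on the first page and results optionally organized by category, might be sketched as follows. The tuple layout `(score, title, category)`, the page size, and the sample results are hypothetical.

```python
from collections import defaultdict

def paginate(results, page_size=2):
    """results: list of (score, title, category); highest scores go on the first page."""
    ordered = sorted(results, key=lambda r: r[0], reverse=True)
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]

def group_by_category(results):
    """Organize result titles under the category of information they relate to."""
    groups = defaultdict(list)
    for _score, title, category in sorted(results, key=lambda r: r[0], reverse=True):
        groups[category].append(title)
    return dict(groups)

results = [
    (0.4, "Alumni newsletter", "Education"),
    (0.9, "Company A press release", "Employment"),
    (0.7, "Workgroup Z project page", "Employment"),
]
```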
- While not displayed in routine 200, additional steps may be taken after the results are returned to the computer user.
- one or more processes on the computer user's device may monitor the computer user's activity with regard to the results provided, e.g., which references (hyperlinks) the computer user followed, which were avoided, how long the computer user spent with some content vs. other content, and the like.
- inferences may be made regarding specific people and/or entities such that subsequent queries may take these inferences into account. Indeed, some or all of the inferences, both for and against specific results, may be used to form the search models discussed above.
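One simple way such inferences could feed back into a search model is to nudge per-category weights according to which results the user followed or skipped. The update rule, step size, and weight names below are assumptions for illustration; the text does not describe a specific learning method.

```python
def update_weights(weights, interactions, step=0.1, floor=0.0):
    """Nudge per-category weights up for followed results and down for skipped ones.

    interactions: list of (category, followed) pairs observed after results were shown.
    Returns a new dictionary; the input weights are left unchanged.
    """
    updated = dict(weights)
    for category, followed in interactions:
        delta = step if followed else -step
        updated[category] = max(floor, updated.get(category, 1.0) + delta)
    return updated

weights = {"employment": 1.0, "education": 1.0}
new_weights = update_weights(weights, [("employment", True), ("education", False)])
```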
- Regarding routines 200 and 300, while these routines are expressed in regard to discrete steps, these steps should be viewed as being logical in nature and may or may not correspond to any actual and/or discrete steps of a particular implementation. Nor should the order in which these steps are presented in the various routines be construed as the only order in which the steps may be carried out. Moreover, while these routines include various novel features of the disclosed subject matter, other steps (not listed) may also be carried out in the execution of the routines. Further, those skilled in the art will appreciate that logical steps of these routines may be combined together or be comprised of multiple steps. Steps of routines 200 and 300 may be carried out in parallel or in series, or pre-computed.
- Often, but not exclusively, the functionality of the various routines is embodied in software (e.g., applications, system services, libraries, and the like) that is executed on computer hardware and/or systems as described below in regard to FIG. 6. In various embodiments, all or some of the various routines may also be embodied in hardware modules, including system on chips, on a computer system.
- While these aspects are expressed in routines embodied in applications (also referred to as computer programs), apps (small, generally single or narrow purposed, applications), and/or methods, they may also be embodied as computer-executable instructions stored by computer-readable media, also referred to as computer-readable storage media.
- computer-readable media can host computer-executable instructions for later retrieval and execution.
- When the computer-executable instructions stored on the computer-readable storage devices are executed, they carry out various steps, methods and/or functionality, including those steps, methods, and routines described above in regard to routines 200 and 300.
- Examples of computer-readable media include, but are not limited to: optical storage media such as Blu-ray discs, digital video discs (DVDs), compact discs (CDs), optical disc cartridges, and the like; magnetic storage media including hard disk drives, floppy disks, magnetic tape, and the like; memory storage devices such as random access memory (RAM), read-only memory (ROM), memory cards, thumb drives, and the like; cloud storage (i.e., an online storage service); and the like.
- FIG. 6 is a block diagram illustrating exemplary components of a search engine configured to provide improved results in response to a search query from a computer user.
- the search engine 110 includes a processor 602 (or processing unit) and a memory 604 interconnected by way of a system bus 610 .
- memory 604 typically (but not always) comprises both volatile memory 606 and non-volatile memory 608 .
- Volatile memory 606 retains or stores information so long as the memory is supplied with power.
- non-volatile memory 608 is capable of storing (or persisting) information even when a power supply is not available.
- RAM and CPU cache memory are examples of volatile memory whereas ROM and memory cards are examples of non-volatile memory.
- the processor 602 executes instructions retrieved from the memory 604 in carrying out various functions, particularly in responding to search queries with improved results through query expansion.
- the processor 602 may be comprised of any of various commercially available processors such as single-processor, multi-processor, single-core units, and multi-core units.
- mainframe computers; personal computers (e.g., desktop computers, laptop computers, tablet computers, etc.); handheld computing devices such as smartphones, personal digital assistants, and the like; microprocessor-based or programmable consumer electronics; game consoles; and the like.
- the system bus 610 provides an interface for the various components to inter-communicate.
- the system bus 610 can be of any of several types of bus structures that can interconnect the various components (including both internal and external components).
- the search engine 110 further includes a network communication component 612 for interconnecting the network site with other computers (including, but not limited to, user computers such as user computers 102 - 106 , other network sites including network sites 112 - 116 ) as well as other devices on a computer network 108 .
- the network communication component 612 may be configured to communicate with other devices and services on an external network, such as network 108 , via a wired connection, a wireless connection, or both.
- the search engine 110 also includes a query topic identification component 614 that is configured to identify the subject matter of the search query, such as a person identified in the search query, as described above. Also included in the search engine 110 is a related entity retrieval component 616 .
- the related entity retrieval component 616 obtains related entity data corresponding to related entities of the identified person (or, more generally, related entities of the subject matter of the search query). As previously mentioned, the related entity data includes related entities, categories associated with the identified person, as well as category data corresponding to the associated categories.
- the related entity retrieval component 616 obtains the related entity data from related entity sources as described above in regard to FIG. 2 .
- An expanded query generator 618 generates an expanded search query from the search query received from a computer user according to the related entity data obtained by the related entity retrieval component 616 .
- a search results retrieval component is configured to obtain search results from a content store 626 according to the expanded search query generated by the expanded query generator 618 .
- a search model component 624 is configured to select a search model (as described above) and apply the search model to the obtained search results.
- the search results presentation generator 620 generates a search results presentation, typically including one or more search results pages, for presentation to the requesting computer user in response to the search query.
- the various components of the search engine 110 of FIG. 6 described above may be implemented as executable software modules within the computer systems, as hardware modules (including SoCs, i.e., systems on a chip), or a combination of the two. Moreover, each of the various components may be implemented as an independent, cooperative process or device, operating in conjunction with one or more computer systems. It should be further appreciated, of course, that the various components described above in regard to the search engine 110 should be viewed as logical components for carrying out the various described functions. As those skilled in the art appreciate, logical components (or subsystems) may or may not correspond directly, in a one-to-one manner, to actual, discrete components. In an actual embodiment, the various components of each computer system may be combined together or broken up across multiple actual components and/or implemented as cooperative processes on a computer network 108 .
- aspects of the disclosed subject matter may be implemented on other computing devices and/or distributed on multiple computing devices, including a computer user's device.
- at least some highly relevant content to a search request may be hosted on a site that is access-protected, i.e., the content is available to the computer user when he/she is authenticated and/or maintains an open log-in status with the site, but the content is otherwise restricted to others.
- a search engine may indirectly obtain related entity data from this access-restricted site by way of the computer user's device; the computer user's device (e.g., upon which the computer user maintains a current logged in status with the site) accesses related entity data on behalf of the search service.
- one or more components on the computer user's device obtain data corresponding to others from the access restricted sites in anticipation of a search request.
- aspects of the disclosed subject matter may be suitably and advantageously applied to auto-generation of content relating to people.
- various search queries regarding one or more persons may be made such that the “latest” content on the Internet regarding that person (or persons) may already be available when requested.
- Yet another example would be to set up an environment such that a user may be notified when a new image/video/news story of that user occurs on the Internet.
- aspects of the disclosed subject matter may be applied to topics or entities other than people.
- an auto-generation page may be set up to display the latest regarding rock climbing, the Supreme Court, and the like.
Abstract
Description
- The present application is related to U.S. patent application Ser. No. ______, filed on ______, entitled “Entity Expansion to Identify Related Entities” [attorney docket no. 338971.01]; and U.S. patent application Ser. No. 13/913,835, filed on Jun. 10, 2013, entitled “Improved News Results through Query Expansion”.
- Locating content regarding a specific person on the Internet can be challenging. There are many factors that make “people search” difficult: most names are not unique. In any given area there may be several individuals with the same name. Additionally, the web presence of any given person may be low such that search results for that person will be dominated by results referring to a better known individual with the same name.
- The following Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. The Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
- According to aspects of the disclosed subject matter, a search query is received from a computer user, the search query identifying a person for which content (or references to content) is sought. Upon receiving the search query from a computer user, related entity data is obtained from at least one related entity source for the identified person. Related entity data comprises at least one of a related entity (or entities) or a category associated with the identified person. An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.
- According to further aspects of the disclosed subject matter, a computer-readable medium bearing computer-executable instructions is presented. When executed on a computing system comprising at least a processor executing the instructions retrieved from the medium, the computing system is configured to carry out a method for responding to a search query from a user. More particularly, in response to receiving a search query from a computer user, where the search query identifies a person for which content (or references to content) is sought, related entity data is obtained from at least one related entity source for the identified person. An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.
- According to still further aspects of the disclosed subject matter, a computer system for responding to a search query for content related to a person is presented. The computer system comprises a processor and a memory, wherein the processor executes instructions stored in the memory as part of or in conjunction with additional components to respond to a search query for content related to a person. These additional components include (by way of illustration and not limitation) a query topic identification component, a related entity retrieval component, an expanded query generator, a search results retrieval component, and a search results presentation generator. In operation, the query topic identification component is configured to determine the identity of a person from the search query for which related content is sought. The related entity retrieval component obtains related entity data corresponding to the identified person from a related entity source. After obtaining related entity data, the expanded query generator generates an expanded query from the search query for content related to the identified person and from the related entity data. According to various embodiments, the related entity data comprises at least one of a related entity or a category associated with the identified person of the search query. The search results retrieval component obtains search results from a content store according to the expanded search query. Thereafter, the search results presentation generator generates a search results presentation according to the search results referencing content corresponding to the identified person and returns the search results presentation to the computer user.
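The flow summarized above (identify the person, obtain related entity data, expand the query, retrieve results) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `RelatedEntityData`, `expand_query`, `make_toy_search`, and `respond_to_query` are hypothetical, the expanded query is simplified to a flat list of terms, and an in-memory list of strings stands in for the content store.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RelatedEntityData:
    """Hypothetical container for data returned by a related entity source."""
    related_entities: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)


def expand_query(person: str, data: RelatedEntityData) -> List[str]:
    """Build the expanded query as a flat term list (a simplification)."""
    return [person, *data.related_entities, *data.categories]


def make_toy_search(store: List[str]) -> Callable[[List[str]], List[str]]:
    """Return a search function over a list standing in for a content store."""
    def search(terms: List[str]) -> List[str]:
        person, extras = terms[0], terms[1:]
        # Keep only content that mentions the person at all.
        hits = [doc for doc in store if person.lower() in doc.lower()]

        # Rank hits by how many expansion terms they also mention.
        def score(doc: str) -> int:
            return sum(term.lower() in doc.lower() for term in extras)

        return sorted(hits, key=score, reverse=True)
    return search


def respond_to_query(
    query: str,
    identify_person: Callable[[str], str],
    fetch_related: Callable[[str], RelatedEntityData],
    search: Callable[[List[str]], List[str]],
) -> List[str]:
    """Assumed end-to-end flow: identify, fetch related data, expand, retrieve."""
    person = identify_person(query)
    data = fetch_related(person)
    expanded = expand_query(person, data)
    return search(expanded)
```

Passing the identification, retrieval, and search steps in as functions mirrors the description of the components as logical units; any of them could be backed by a different service without changing the flow.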
- The foregoing aspects and many of the attendant advantages of the disclosed subject matter will become more readily appreciated as they are better understood by reference to the following description when taken in conjunction with the following drawings, wherein:
-
FIG. 1 is a block diagram of a networked environment suitable for implementing aspects of the disclosed subject matter; -
FIG. 2 is a flow diagram illustrating an exemplary routine for providing improved results in response to a search query regarding content for a particular person through query expansion; -
FIG. 3 is a flow diagram illustrating an exemplary routine for generating an expanded search query according to aspects of the disclosed subject matter; -
FIGS. 4 and 5 illustrate elements of expanded search queries; and -
FIG. 6 is a block diagram illustrating exemplary components of a search engine configured to provide improved results in response to a search query from a computer user. - For purposes of clarity, the use of the term “exemplary” in this document should be interpreted as serving as an illustration or example of something, and it should not be interpreted as an ideal and/or a leading illustration of that thing. An entity corresponds to an abstract or tangible thing that includes, by way of illustration and not limitation: a person, a place, a group, a concept, an activity, and the like.
- Turning to
FIG. 1, FIG. 1 is a block diagram illustrating an exemplary networked environment 100 suitable for implementing aspects of the disclosed subject matter, particularly in regard to providing improved search results to a computer user in response to a search query regarding a person. The exemplary networked environment 100 includes one or more user computers, such as user computers 102-106, connected to a network 108, such as the Internet, a wide area network or WAN, and the like. User computers include, by way of illustration and not limitation: desktop computers (such as desktop computer 104); laptop computers (such as laptop computer 102); tablet computers (such as tablet computer 106); mobile devices (not shown); game consoles (not shown); personal digital assistants (not shown); and the like. User computers may be configured to connect to the network 108 by way of wired and/or wireless connections. For purposes of illustration only, the exemplary networked environment 100 illustrates the network 108 as being located between the user computers 102-106 and the search engine 110, and again between the search engine 110 and the network sites 112-116. This illustration, however, should not be construed as suggesting that these are separate networks. - Also connected to the
network 108 are various networked sites, including network sites 110-116. By way of example and not limitation, the networked sites connected to the network 108 include a search engine 110 configured to respond to search queries from computer users, news sources, a social networking site 116, and the like. A computer user, such as computer user 101, may navigate via a user computer, such as user computer 102, to these and other networked sites to access content, including news content. - According to aspects of the disclosed subject matter, the
search engine 110 is configured to provide search results (typically in the form of references to content available on the network 108) in response to a search query from a computer user. In particular, in response to receiving a search query from a computer user for information regarding a particular person, the search engine 110 identifies content related to the identified person according to information in its content store, generates a search results presentation based on at least some of the identified content, and provides the search results presentation to the computer user. -
FIG. 1 also illustratively includes a social network site 116 and various news sources, including news sites 112-114. As will be readily appreciated, a social network site 116 is an online site/service that provides a platform in which a computer user can establish a profile describing various aspects of the user, build relationships and social networks with other computer users, groups, and the like. In a social network site 116, a computer user can establish or indicate various interests, activities, and backgrounds with those in his/her social network. Indeed, those skilled in the art will appreciate that a computer user is often able to indicate a preference or an interest in a particular entity on a social networking service as might be hosted by social networking site 116, whether that entity is a person, a place, a group, a concept, an activity, and the like. Though only one social network site 116 is included in the illustrative network environment 100, this is merely illustrative and should not be viewed as limiting upon the disclosed subject matter. In an actual embodiment, there may be any number of social network sites connected to the network 108. - As is known in the art, the
search engine 110 is configured to communicate (directly or indirectly through service calls and/or web crawlers) with multiple content sources, including news sites, social networking site 116, and other sites such as blogs and registries (not shown) to obtain information regarding the content that is available at each network site. Information regarding available content may also be pushed to the search engine from various services and/or networking sites. This information is stored (typically as references to the content) in a content store such that the search engine can obtain content from this content store in order to respond to a search query from a computer user, such as computer user 101. The search engine 110 may also obtain information regarding any given individual from search query logs, network browsing histories, purchase histories, and the like. This information and the content obtained from the various network sites are typically indexed according to key words and phrases such that the information may be quickly identified and accessed. Further, in addition to information that is stored in the search engine's content store, a search engine 110 may also be configured to obtain information from other network sites when responding to a search query. For example, according to aspects of the disclosed subject matter, when responding to a search query, the search engine 110 may obtain data from one or more social networking sites, such as social network site 116, as relevant information to return to the requesting computer user and/or as information to assist the search engine in identifying relevant information to return to the requesting computer user. - To further illustrate aspects of the disclosed subject matter, reference is now made to
FIG. 2. FIG. 2 is a flow diagram of an exemplary routine for providing improved results in response to a search query regarding content corresponding to a particular person through query expansion. Beginning at block 202, the search engine 110 receives a search query from a computer user, such as computer user 101, the search query requesting content corresponding to a particular person. - As will be readily appreciated, a search query is typically (though not exclusively) a text string. For example, a search query for content relating to a person may be “Bruce Wayne.” Accordingly, as there may be several individuals who have the same name, at
block 204, the search engine attempts to uniquely identify the person who is the subject matter of the search query. According to aspects of the disclosed subject matter, the search engine attempts to uniquely identify the person for which content is requested according to at least general information and specific information relating to the requesting computer user. The general information includes, by way of illustration and not limitation: popularity of search queries corresponding to a person with the name identified in the search query; trending popularity of a person with the name identified in the search query; other terms and/or phrases in the search query (e.g., “Bruce Wayne Seattle” or “Bruce Wayne Microsoft”); an image representative of the person; and the like. Specific information relating to the requesting computer user may include, by way of illustration and not limitation: current location; prior search query history; current and former workplaces; current and former educational institutions that were attended; social networks; preferences (both explicitly and implicitly identified); general graph connectivity between the requesting computer user and potential subjects of a search query as well as the number of mutual friends; physical distance between the requesting user and the potential subjects; location of friends; former locations; and the like. Typically, though not exclusively, the search engine 110 may, at least internally, associate a globally unique identifier with the person who is the subject matter of the search query. Moreover, once the person who is the subject matter of the search query is identified, the search engine 110 may use the associated globally unique identifier in obtaining, or reranking, search results in response to the search query. - Of course, the order presented in
blocks blocks FIG. 2 , this is illustrative of one embodiment, and is not limiting upon the disclosed subject matter. - In regard to the search request identifying a person for whom content is sought, there may also be times in which the name of that person is not known but some information is provided that may lead to uniquely identifying that person. For example, the computer user may not know the name of the general manager of the Seattle Seahawks, but in submitting the text “general manager of the Seattle Seahawks” the computer user often sufficiently identifies the person for whom content is sought that, in
block 204, the identity of the person can be determined. - At
block 206, after having identified the person who is the subject matter of the search query, the search engine 110 obtains related entity data corresponding to the identified person. According to aspects of the disclosed subject matter, related entity data includes entities related to the identified person. A related entity is an entity with which the identified person is related for some reason. While some of the reasons may be known, others may be unknown and implied according to statistical similarities. For example, assume that the identified person is an employee of Company A and is a member of Workgroup Z. Related entities to the identified person, based on this employment relationship, would typically include “Company A” and “Workgroup Z.” Other related entities arising from this same employment relationship may include fellow co-workers. Still other entities, based on this same employment relationship, may also include other (previous) workgroups, past and present co-workers, and the like. In furtherance of the example above, the identified person may also be an alumnus of a particular university. Hence, the university may be a related entity to the identified person, as well as the particular college in the university where the identified person studied, the degree that was awarded, academic achievements of the identified person, fellow students, and the like. Still further, assuming that the identified person also has a passion for gardening, the identified person may be a member of a local master gardeners society and, as a result, the local master gardeners society may be a related entity to the identified person as well as fellow members of the society. - According to aspects of the disclosed subject matter, the
search engine 110 obtains related entity data from one or more related entity sources. The search engine 110 may host or store various information regarding the identified person in a user profile store (e.g., the user profile store 628 of FIG. 6) and, therefore, be one of the related entity sources. For example, the search engine 110 may store user profile information corresponding to the computer user. This user profile information may be based on explicitly identified information (from the identified person) as well as implicitly identified information (such as information derived from search queries, browsing history, and the like). Social networking sites, such as social networking site 116, represent additional related entity sources. As indicated above, a social networking site enables a person, such as the identified person of the search query, to establish relationships and social networks with other entities (including people, organizations, activities, causes, and the like). Of course, there may be a variety of related entity sources, each hosting information that may indicate a relationship between the identified person and other entities, and the search engine 110 can be configured to obtain related entity data from any number of these related entity sources. - It should be appreciated that the related entity information that is hosted by each of the related entity sources may comprise information that the identified person wishes to keep private. To resolve this, according to aspects of the disclosed subject matter, the search engine identifies the requesting computer user and, if identified, can attempt to use the permissions afforded to the requesting computer user in obtaining the related entity information. In various embodiments, a computer user is required to authenticate himself or herself in order to access information regarding the identified person. 
Other requirements may include, by way of illustration and not limitation, that the requesting computer user be logged into one or more services in order to access and/or view content that would otherwise be restricted.
- As suggested in regard to the examples above, a related entity source may associate one or more categories to an individual (such as the identified person of a search query). Accordingly, the related entity data obtained from the related entity sources may also include category data. Category data (both in regard to the set of potential relationships defined by the category as well as the actual relationships of a person per a category) may be advantageously used in expanding a received search query (as discussed in greater detail below). In the example above, a related entity source may have associated various categories with the identified person including “Employee,” “Alumnus,” and “Gardener.” Moreover, each of the related entity sources may maintain category information that defines what is meant to be associated with the category. This category information often includes a list of potential, though not necessarily required, relationships that may exist between a first entity belonging to a specific category (such as the identified person) and other entities. The “Employee” category may define a set of potential relationships as including “employer,” “work group,” “current manager,” “direct reports,” “co-worker,” and the like. Correspondingly, each entity that is categorized as an “Employee” could then have relationships with other entities as defined by the set of potential relationships. Of course, while a category defines a set of potential relationships, an entity of that category is not required to be related to other entities based on each and every potential relationship. Further still, a given entity, such as an entity corresponding to the identified person of a search query, may be associated with a plurality of categories. In addition to defined categories, categories may also be inferred. For example, an employee may be interested in former work performed previously at a company such that an inferred category is “co-worker.”
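One way to picture category data is as a category that declares a set of potential relationships, with each entity recording only the relationships it actually has. The following Python sketch is illustrative only: the `Category` and `Entity` classes and their methods are hypothetical names, not structures defined by the disclosure, and the sample data reuses the “Employee” example above.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Category:
    """A category and the potential (not required) relationships it defines."""
    name: str
    potential_relationships: FrozenSet[str]


@dataclass
class Entity:
    """An entity with its actual relationships, grouped per category."""
    name: str
    # category name -> relationship name -> names of related entities
    relationships: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def relate(self, category: Category, relationship: str, other: str) -> None:
        """Record an actual relationship, if the category allows it."""
        if relationship not in category.potential_relationships:
            raise ValueError(
                f"{relationship!r} is not defined by category {category.name!r}"
            )
        per_category = self.relationships.setdefault(category.name, {})
        per_category.setdefault(relationship, []).append(other)

    def related_entities(self) -> List[str]:
        """Return related entity names in insertion order, without duplicates."""
        names: List[str] = []
        for per_category in self.relationships.values():
            for others in per_category.values():
                for other in others:
                    if other not in names:
                        names.append(other)
        return names


EMPLOYEE = Category(
    "Employee",
    frozenset(
        {"employer", "work group", "current manager", "direct reports", "co-worker"}
    ),
)
```

An entity categorized as an “Employee” may fill in only some of these relationships, for example `person.relate(EMPLOYEE, "employer", "Company A")`, which matches the point that a category's potential relationships are not all required.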
- At
block 208, a search model is identified/determined to apply to the expanded search query. This search model includes information for weighting various elements (terms and phrases) of the expanded search query to improve search results. Applying a search model to the expanded search query recognizes, at least in part, that not all query terms of the expanded search query are equal, i.e., some query terms are more important in identifying relevant search content for the identified person than others. Typically, though not exclusively, favoring/weighting employment-related query terms or education-related query terms provides improved search results when the relevancy of the various search results (or, more accurately stated, the content referenced by the search results) is presented to a particular user. According to various embodiments, selection of a search model may be based on information regarding the requesting computer user. For example, if it is known that the requesting computer user is in college then an education model may be selected. Alternatively, selection of a search model may be made according to information regarding the identified person, from information available to the search engine 110 or external sources including from the related entity data. In yet additional embodiments, selection of a search model may be made according to information regarding both the requesting computer user as well as the identified person of the search query. - At
block 210, an expanded search query is generated according to the determined search model for the identified person. Generating an expanded search query is discussed in greater detail in regard to FIG. 3. More particularly, FIG. 3 is a flow diagram illustrating an exemplary routine 300 for generating an expanded search query according to related entity data obtained from related entity sources. At block 302, the identified person and filter elements of the received search query are included as an initial section of the expanded search query. While this may entail simply copying the received search query into the initial section, the initial search query may not necessarily simply be copied. Often a requesting computer user may misspell the name of the person that is sought or any one of the identifying filter elements associated with the person. For example, a received search query may be “Bruse Wayn Microsoft,” in an effort to find content corresponding to “Bruce Wayne” who works at “Microsoft.” If it can be determined that the name (or one or more filter elements) is misspelled, it would be less productive to include the original search query in the expanded search query. Hence, in block 204 of routine 200, the person is identified. Correction to the filter elements may also be made (though not explicitly called out in the routines). - In addition to including the query terms of the search query into the expanded search query, query terms are derived from the obtained related entity data and included/incorporated in the expanded search query. In particular, at
block 304, the related entities (related to the identified person) from the obtained related entity data are included in a related entities section of the expanded search query in accordance with the determined search model. At block 306, query terms derived from the category data, including both the category (as an entity) and category entities (as described below), are included in a category entities section of the expanded search query according to the search model. Thereafter, at block 308, the expanded search query is returned and the routine 300 terminates. - To better illustrate the above-described sections of the expanded search query, reference is made to
FIG. 4. FIG. 4 illustrates an exemplary expanded search query 400 corresponding to the example above, i.e., for the person “Bruce Wayne.” For this example, it is assumed that this identified person, “Bruce Wayne,” was associated with only one category, Employee. As shown in the expanded search query 400, the initial section 402 includes the original search query text 404, “Bruce.Wayne,” as well as alternative names related to the identified person, in this case “Batman Dark.Knight Matches.Malone Caped.Crusader.” Of course, not all computer users will have access rights to all information. In the example above, not all people might know of the alternative names that might uniquely reference “Bruce Wayne.” However, when the requesting computer user has full rights, such information may be useful to obtain improved results. Regarding the operator 406 “.” between the two names of the search query, this is representative of an exemplary convention to indicate that the two names, “Bruce” and “Wayne”, should be viewed as preferring “Bruce” occurring next to “Wayne” in that order, though it is not mandatory that they occur together or that both must occur, only that it is highly preferred. Of course, this convention (as well as the other operators in this Figure) is illustrative only and should not be viewed as limiting upon the disclosed subject matter. Other syntactical conventions include (by way of illustration and not limitation): the operator 408 “inbody:” indicating to the search engine 110 that it should match a document when any one of the words/terms between the parentheses is found in the body of the content; a “noalter:” operator that indicates that the spelling of the terms should not be modified; and a “norelax:” operator that indicates that the terms are important and may not be dropped in matching content. The operator 410 “+” indicates to a search engine a concatenation of other search operators and/or tokens. - The expanded
search query 400 also includes a related entity section 412 that includes the related entities of the identified person of the search query, such as text 416 “Research.” Still further included in the expanded search query is a category entities section 414 that includes the category entities of the category “Employee.” As mentioned above, the category entities section 414 includes the category (“Employee”) as well as the category entities, such as text 418 “Workgroup.” These entries optionally help produce results based on how the computer user likely knows the identified person, in this case “Bruce Wayne.” As can be seen, the expanded search query for a particular person takes a search query, such as “Bruce Wayne,” and expands the query with related entities as well as category entities to better identify content corresponding to the identified person. The operator “rankonly:” lets the ranking of a document go up as a matching token/value, such as “Research,” is found in the document. The specified terms are not required to be found in a resulting document but, if found, will result in the document being ranked as more relevant. The operator “word:” matches a document if one or more of the tokens in the parentheses, such as “Workgroup,” is found in the document. In a sense, the operator “word:” operates as a type of max (or maximum value) operator, comparing each token between the parentheses to the document and returning the single maximum value of the rank of the tokens. Specifically, if more than one token matches, only the value of the greatest matching token is returned. A “norank:” operator (not shown) would require that the specified tokens (identified between the enclosing parentheses) be present in a results document but would not affect the ordering or relevance of the document in the overall results.
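By way of illustration only, and not limitation, the behavior described for the “rankonly:”, “word:” and “norank:” operators may be sketched as follows. The function names, the per-token score dictionary, the use of a sum for “rankonly:”, and the reading of “norank:” as requiring every listed token are all assumptions made for this sketch; the disclosure does not specify how token scores are computed or combined.

```python
from typing import Dict, List, Optional

# Hypothetical input: for one document, a mapping of matched token -> match score.
TokenScores = Dict[str, float]


def word(tokens: List[str], scores: TokenScores) -> Optional[float]:
    """'word:' as a max operator: None when no token matches,
    otherwise only the single greatest score among matching tokens."""
    matched = [scores[t] for t in tokens if t in scores]
    return max(matched) if matched else None


def rankonly(tokens: List[str], scores: TokenScores) -> float:
    """'rankonly:' never excludes a document; matching tokens only raise rank.
    Summing the scores is an assumption of this sketch."""
    return sum(scores.get(t, 0.0) for t in tokens)


def norank(tokens: List[str], scores: TokenScores) -> bool:
    """'norank:' acts as a filter only: tokens must be present (assumed: all of
    them), and the result contributes nothing to ordering."""
    return all(t in scores for t in tokens)


# Illustrative scores for a single document (made-up values).
doc_scores = {"Bruce": 2.0, "Wayne": 1.5, "Research": 0.5,
              "Workgroup": 0.8, "Employee": 0.3}

best_category_match = word(["Workgroup", "Employee"], doc_scores)
rank_boost = rankonly(["Research", "Gotham"], doc_scores)
is_required_match = norank(["Bruce", "Wayne"], doc_scores)
```

In this sketch, a document lacking “Research” is still returned with a boost of zero, whereas a document failing the “norank:” check would be excluded regardless of its other scores.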
In combination with the operator “rankonly:”, the “word:” operator increases the rank of a document if any one or more of the tokens is found.
- While the expanded
queries search query 500 includes an operator 510 that includes a Facebook numerical identifier (“740049358”) as well as an operator 512 that includes a Facebook user identifier (“t-drake”). Of course, any particular source of identifiers may be used, and Facebook identifiers are illustrative only.
- As suggested above, an identified person may be associated with more than one category. Hence, while the expanded
search query 400 of FIG. 4 describes information from a single category, it does so for illustration only. Similarly, FIG. 5 illustrates an exemplary expanded search query 500 corresponding to the example above, i.e., for the identified person “Bruce Wayne,” but in this example includes information from two categories, Employer and Education. As can be seen, the expanded search query 500 includes the initial section 502 as well as the related entities section 504 and the category entities section 506. As can be seen in the related entities section 504 and the category entities section 506, as more related entities are found for the identified person and as more information corresponding to various categories for the identified person is obtained, the expanded search queries become more detailed and encompassing, assisting the search engine in identifying content corresponding to the identified person of the search query.
- At
block 212, search results are obtained according to the expanded search query. Obtaining search results according to a search query, in this case a search query with expanded terms according to related entities and categories, is known in the art. According to aspects of the disclosed subject matter, search results are obtained according to the query terms from the received search query and optionally according to the query terms derived from the related entity data. Stated differently, the query terms of the expanded search query that are derived from the related entity data are intended to expand the scope of content/search results that correspond to the identified person, but these query terms are not mandatory terms. Because the query terms derived from the related entity data are “optional,” the expanded search query expands the scope of content that potentially relates to the identified person, rather than narrowing that scope as it would if those query terms were mandatory.
- At
block 214, a search results presentation is generated, at least in part, according to the obtained search results. Typically, one or more search results pages are generated according to the obtained search results, with those results scoring the highest being presented in the first pages of the presentation. At block 216, after generating the search results presentation, at least a portion of the presentation is returned to the requesting computer user in response to the search query. According to various embodiments, the results that are returned to the requesting computer user are organized according to the various categories of information regarding the subject person. Thereafter, the routine 200 terminates.
- While not displayed in routine 200, additional steps may be taken after the results are returned to the computer user. By way of illustration and not limitation, one or more processes on the computer user's device may monitor the computer user's activity with regard to the results provided, e.g., which references (hyperlinks) the computer user followed, which were avoided, how long the computer user spent with some content vs. other content, and the like. By monitoring the computer user's activity and submitting it to the search engine, inferences may be made regarding specific people and/or entities such that subsequent queries may take these inferences into account. Indeed, some or all of the inferences, both for and against specific results, may be used to form the search models discussed above.
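By way of illustration only, the query expansion of routine 300 and the optional treatment of expansion terms at block 212 may be sketched as follows. The `ExpandedQuery` class, the layout of `related_entity_data`, the whitespace tokenization, and the scoring weights are hypothetical and chosen for brevity; the disclosure does not prescribe them, and an actual search engine would rely on its own index and ranking.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ExpandedQuery:
    initial: List[str]                                           # original terms + alternative names
    related_entities: List[str] = field(default_factory=list)    # optional terms
    category_entities: List[str] = field(default_factory=list)   # optional terms


def expand_query(query_terms: List[str], related_entity_data: Dict) -> ExpandedQuery:
    """Build the three sections of an expanded query (hypothetical data layout)."""
    category_terms: List[str] = []
    for category, entities in related_entity_data.get("categories", {}).items():
        category_terms.append(category)   # the category itself, as an entity
        category_terms.extend(entities)   # followed by its category entities
    return ExpandedQuery(
        initial=list(query_terms) + list(related_entity_data.get("alternative_names", [])),
        related_entities=list(related_entity_data.get("related_entities", [])),
        category_entities=category_terms,
    )


def search(expanded: ExpandedQuery, documents: Dict[str, str]) -> List[Tuple[str, float]]:
    """Match on the initial section; optional terms only raise the score."""
    optional = expanded.related_entities + expanded.category_entities
    results = []
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        required_hits = sum(1 for t in expanded.initial if t.lower() in tokens)
        if required_hits == 0:
            continue  # nothing from the initial section matched: not a result
        optional_hits = sum(1 for t in optional if t.lower() in tokens)
        results.append((doc_id, required_hits + 0.5 * optional_hits))  # assumed weights
    return sorted(results, key=lambda r: r[1], reverse=True)


data = {"alternative_names": ["Batman"],
        "related_entities": ["Research"],
        "categories": {"Employee": ["Workgroup"]}}
query = expand_query(["Bruce", "Wayne"], data)
docs = {"a": "bruce wayne joins research workgroup",
        "b": "bruce springsteen tour dates",
        "c": "research workgroup announces budget"}
ranked = search(query, docs)
```

Under these assumptions, document "a" outranks document "b" because of its optional-term matches, while document "c" is not returned at all: optional terms broaden and reorder results but do not, on their own, make a document a match.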
- Regarding
routines routines FIG. 6. In various embodiments, all or some of the various routines may also be embodied in hardware modules, including systems on a chip, on a computer system.
- While many novel aspects of the disclosed subject matter are expressed in routines embodied in applications (also referred to as computer programs), apps (small, generally single or narrow purposed, applications), and/or methods, these aspects may also be embodied as computer-executable instructions stored by computer-readable media, also referred to as computer-readable storage media. As those skilled in the art will recognize, computer-readable media can host computer-executable instructions for later retrieval and execution. When the computer-executable instructions stored on the computer-readable storage devices are executed, they carry out various steps, methods and/or functionality, including those steps, methods, and routines described above in regard to
routines - Turning now to
FIG. 6, FIG. 6 is a block diagram illustrating exemplary components of a search engine configured to provide improved results in response to a search query from a computer user. As shown in FIG. 6, the search engine 110 includes a processor 602 (or processing unit) and a memory 604 interconnected by way of a system bus 610. As those skilled in the art will appreciate, memory 604 typically (but not always) comprises both volatile memory 606 and non-volatile memory 608. Volatile memory 606 retains or stores information so long as the memory is supplied with power. In contrast, non-volatile memory 608 is capable of storing (or persisting) information even when a power supply is not available. Generally speaking, RAM and CPU cache memory are examples of volatile memory whereas ROM and memory cards are examples of non-volatile memory.
- The
processor 602 executes instructions retrieved from the memory 604 in carrying out various functions, particularly in responding to search queries with improved results through query expansion. The processor 602 may comprise any of various commercially available processors, such as single-processor, multi-processor, single-core, and multi-core units. Moreover, those skilled in the art will appreciate that the novel aspects of the disclosed subject matter may be practiced with other computer system configurations, including but not limited to: mini-computers; mainframe computers; personal computers (e.g., desktop computers, laptop computers, tablet computers, etc.); handheld computing devices such as smartphones, personal digital assistants, and the like; microprocessor-based or programmable consumer electronics; game consoles; and the like.
- The
system bus 610 provides an interface for the various components to inter-communicate. The system bus 610 can be of any of several types of bus structures that can interconnect the various components (including both internal and external components). The search engine 110 further includes a network communication component 612 for interconnecting the network site with other computers (including, but not limited to, user computers such as user computers 102-106, and other network sites including network sites 112-116) as well as other devices on a computer network 108. The network communication component 612 may be configured to communicate with other devices and services on an external network, such as network 108, via a wired connection, a wireless connection, or both.
- The
search engine 110 also includes a query topic identification component 614 that is configured to identify the subject matter of the search query, such as a person identified in the search query, as described above. Also included in the search engine 110 is a related entity retrieval component 616. The related entity retrieval component 616 obtains related entity data corresponding to related entities of the identified person (or, more generally, related entities of the subject matter of the search query). As previously mentioned, the related entity data includes related entities, categories associated with the identified person, as well as category data corresponding to the associated categories. The related entity retrieval component 616 obtains the related entity data from related entity sources as described above in regard to FIG. 2. An expanded query generator 618 generates an expanded search query from the search query received from a computer user according to the related entity data obtained by the related entity retrieval component 616.
- A search results retrieval component is configured to obtain search results from a
content store 626 according to the expanded search query generated by the expanded query generator 618. A search model component 624 is configured to select a search model (as described above) and apply the search model to the obtained search results. The search results presentation generator 620 generates a search results presentation, typically including one or more search results pages, for presentation to the requesting computer user in response to the search query.
- Those skilled in the art will appreciate that the various components of the
search engine 110 of FIG. 6 described above may be implemented as executable software modules within the computer systems, as hardware modules (including SoCs, i.e., systems on a chip), or a combination of the two. Moreover, each of the various components may be implemented as an independent, cooperative process or device, operating in conjunction with one or more computer systems. It should be further appreciated, of course, that the various components described above in regard to the search engine 110 should be viewed as logical components for carrying out the various described functions. As those skilled in the art appreciate, logical components (or subsystems) may or may not correspond directly, in a one-to-one manner, to actual, discrete components. In an actual embodiment, the various components of each computer system may be combined together or broken up across multiple actual components and/or implemented as cooperative processes on a computer network 108.
- In addition to operating on a
search engine 110, aspects of the disclosed subject matter may be implemented on other computing devices and/or distributed on multiple computing devices, including a computer user's device. For example, according to various embodiments, at least some content that is highly relevant to a search request may be hosted on a site that is access-protected, i.e., the content is available to the computer user when he/she is authenticated and/or maintains an open log-in status with the site, but the content is otherwise restricted to others. In response to a search request from the computer user, a search engine (or other service) may indirectly obtain related entity data from this access-restricted site by way of the computer user's device; the computer user's device (e.g., upon which the computer user maintains a current logged-in status with the site) accesses related entity data on behalf of the search service. Indeed, in various embodiments, one or more components on the computer user's device obtain data corresponding to others from the access-restricted sites in anticipation of a search request.
- While much of the disclosed subject matter has been described in regard to a computer user taking an active role in obtaining content relating to a particular person, aspects of the disclosed subject matter may be suitably and advantageously applied to auto-generation of content relating to people. For example, various search queries regarding one or more persons (expanded search queries) may be made such that the “latest” content on the Internet regarding that person (or persons) may already be available when requested. Yet another example would be to set up an environment such that a user may be notified when a new image/video/news story of that user appears on the Internet. Of course, aspects of the disclosed subject matter may be applied to topics or entities other than people.
For example, an auto-generation page may be set up to display the latest regarding rock climbing, the Supreme Court, and the like.
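By way of illustration only, the notification scenario may be sketched as a standing query that is re-run periodically and compared against the results already seen. The `new_results` function, the `run_query` callable, and the use of result identifiers are hypothetical names introduced for this sketch; how the expanded query is executed and how the user is notified are left open.

```python
from typing import Callable, Iterable, List, Set


def new_results(run_query: Callable[[str], Iterable[str]],
                expanded_query: str,
                seen: Set[str]) -> List[str]:
    """Re-run a standing expanded query and return only results not seen before.

    `run_query` stands in for whatever search backend executes the query;
    `seen` is updated in place so that repeated calls report each result once.
    """
    fresh: List[str] = []
    for result_id in run_query(expanded_query):
        if result_id not in seen:
            seen.add(result_id)
            fresh.append(result_id)
    return fresh
```

A scheduler could call this function at intervals and notify the user, or refresh an auto-generated page, whenever the returned list is non-empty.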
- While various novel aspects of the disclosed subject matter have been described, it should be appreciated that these aspects are exemplary and should not be construed as limiting. Variations and alterations to the various aspects may be made without departing from the scope of the disclosed subject matter.
Claims (20)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/931,922 US20150006520A1 (en) | 2013-06-10 | 2013-06-29 | Person Search Utilizing Entity Expansion |
US14/039,259 US20150095319A1 (en) | 2013-06-10 | 2013-09-27 | Query Expansion, Filtering and Ranking for Improved Semantic Search Results Utilizing Knowledge Graphs |
CN201480037264.2A CN105493082A (en) | 2013-06-29 | 2014-06-24 | Person search utilizing entity expansion |
EP14740077.4A EP3014486A1 (en) | 2013-06-29 | 2014-06-24 | Person search utilizing entity expansion |
KR1020157036770A KR20160026907A (en) | 2013-06-29 | 2014-06-24 | Person search utilizing entity expansion |
PCT/US2014/043750 WO2014209925A1 (en) | 2013-06-29 | 2014-06-24 | Person search utilizing entity expansion |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/913,835 US9646062B2 (en) | 2013-06-10 | 2013-06-10 | News results through query expansion |
US13/931,922 US20150006520A1 (en) | 2013-06-10 | 2013-06-29 | Person Search Utilizing Entity Expansion |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150006520A1 true US20150006520A1 (en) | 2015-01-01 |
Family
ID=51168354
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/913,835 Active US9646062B2 (en) | 2013-06-10 | 2013-06-10 | News results through query expansion |
US13/931,922 Abandoned US20150006520A1 (en) | 2013-06-10 | 2013-06-29 | Person Search Utilizing Entity Expansion |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/913,835 Active US9646062B2 (en) | 2013-06-10 | 2013-06-10 | News results through query expansion |
Country Status (5)
Country | Link |
---|---|
US (2) | US9646062B2 (en) |
EP (1) | EP3008645A1 (en) |
CN (1) | CN105339933B (en) |
TW (1) | TW201511547A (en) |
WO (1) | WO2014200780A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180067940A1 (en) * | 2016-09-06 | 2018-03-08 | Kakao Corp. | Search method and apparatus |
US10268758B2 (en) * | 2013-09-29 | 2019-04-23 | Peking University Founder Group Co. Ltd. | Method and system of acquiring semantic information, keyword expansion and keyword search thereof |
TWI668579B (en) * | 2017-11-10 | 2019-08-11 | 全球華人股份有限公司 | Establishing method for the post job description database |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9305108B2 (en) | 2011-10-05 | 2016-04-05 | Google Inc. | Semantic selection and purpose facilitation |
US9652556B2 (en) | 2011-10-05 | 2017-05-16 | Google Inc. | Search suggestions based on viewport content |
US10013152B2 (en) | 2011-10-05 | 2018-07-03 | Google Llc | Content selection disambiguation |
US9646062B2 (en) * | 2013-06-10 | 2017-05-09 | Microsoft Technology Licensing, Llc | News results through query expansion |
US9652499B1 (en) * | 2013-08-21 | 2017-05-16 | Athena Ann Smyros | Search-based recommendation engine |
US10114897B1 (en) * | 2014-12-24 | 2018-10-30 | Open Invention Network Llc | Search and notification procedures based on user history information |
CN106503014B (en) * | 2015-09-08 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Real-time information recommendation method, device and system |
US10372883B2 (en) | 2016-06-24 | 2019-08-06 | Scripps Networks Interactive, Inc. | Satellite and central asset registry systems and methods and rights management systems |
US11868445B2 (en) | 2016-06-24 | 2024-01-09 | Discovery Communications, Llc | Systems and methods for federated searches of assets in disparate dam repositories |
US10452714B2 (en) | 2016-06-24 | 2019-10-22 | Scripps Networks Interactive, Inc. | Central asset registry system and method |
US10762146B2 (en) * | 2017-07-26 | 2020-09-01 | Google Llc | Content selection and presentation of electronic content |
CN110472021A (en) * | 2018-05-11 | 2019-11-19 | 微软技术许可有限责任公司 | Recommend the technology of news in session |
US11875124B2 (en) * | 2021-02-08 | 2024-01-16 | Acto Technologies Inc. | Virtual assistant for a pharmaceutical article |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3919528A (en) * | 1972-06-30 | 1975-11-11 | Notifier Co | Method and apparatus for operating authorization control systems |
US20030212666A1 (en) * | 2002-05-10 | 2003-11-13 | Sankar Basu | Adaptive probabilistic query expansion |
US20050210024A1 (en) * | 2004-03-22 | 2005-09-22 | Microsoft Corporation | Search system using user behavior data |
US20060020593A1 (en) * | 2004-06-25 | 2006-01-26 | Mark Ramsaier | Dynamic search processor |
US20060074883A1 (en) * | 2004-10-05 | 2006-04-06 | Microsoft Corporation | Systems, methods, and interfaces for providing personalized search and information access |
US7076437B1 (en) * | 1999-10-29 | 2006-07-11 | Victor Levy | Process for consumer-directed diagnostic and health care information |
US20090094234A1 (en) * | 2007-10-05 | 2009-04-09 | Fujitsu Limited | Implementing an expanded search and providing expanded search results |
US20100094835A1 (en) * | 2008-10-15 | 2010-04-15 | Yumao Lu | Automatic query concepts identification and drifting for web search |
US7827125B1 (en) * | 2006-06-01 | 2010-11-02 | Trovix, Inc. | Learning based on feedback for contextual personalized information retrieval |
US20110119243A1 (en) * | 2009-10-30 | 2011-05-19 | Evri Inc. | Keyword-based search engine results using enhanced query strategies |
US20110125764A1 (en) * | 2009-11-26 | 2011-05-26 | International Business Machines Corporation | Method and system for improved query expansion in faceted search |
US20110191364A1 (en) * | 2010-02-03 | 2011-08-04 | Google Inc. | Information search system with real-time feedback |
US20120323877A1 (en) * | 2011-06-17 | 2012-12-20 | Microsoft Corporation | Enriched Search Features Based In Part On Discovering People-Centric Search Intent |
US20130173655A1 (en) * | 2012-01-04 | 2013-07-04 | International Business Machines Corporation | Selective fetching of search results |
US20130238594A1 (en) * | 2012-02-22 | 2013-09-12 | Peter Jin Hong | Related Entities |
US20140181070A1 (en) * | 2012-12-21 | 2014-06-26 | Microsoft Corporation | People searches using images |
US20140214840A1 (en) * | 2010-11-29 | 2014-07-31 | Google Inc. | Name Disambiguation Using Context Terms |
US20140278400A1 (en) * | 2013-03-12 | 2014-09-18 | Microsoft Corporation | Search Results Using Intonation Nuances |
US20140365468A1 (en) * | 2013-06-10 | 2014-12-11 | Microsoft Corporation | News Results through Query Expansion |
Family Cites Families (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ATE389018T1 (en) * | 1996-01-24 | 2008-03-15 | Schering Corp | CX3C MAMMAL CHEMOKINE GENES |
US6038560A (en) | 1997-05-21 | 2000-03-14 | Oracle Corporation | Concept knowledge base search and retrieval system |
US6996572B1 (en) | 1997-10-08 | 2006-02-07 | International Business Machines Corporation | Method and system for filtering of information entities |
US5953718A (en) | 1997-11-12 | 1999-09-14 | Oracle Corporation | Research mode for a knowledge base search and retrieval system |
US20030220913A1 (en) * | 2002-05-24 | 2003-11-27 | International Business Machines Corporation | Techniques for personalized and adaptive search services |
EP1510938B1 (en) | 2003-08-29 | 2014-06-18 | Sap Ag | A method of providing a visualisation graph on a computer and a computer for providing a visualisation graph |
US7536382B2 (en) | 2004-03-31 | 2009-05-19 | Google Inc. | Query rewriting with entity detection |
US20070005654A1 (en) * | 2005-05-20 | 2007-01-04 | Avichai Schachar | Systems and methods for analyzing relationships between entities |
US7685201B2 (en) | 2006-09-08 | 2010-03-23 | Microsoft Corporation | Person disambiguation using name entity extraction-based clustering |
US7958104B2 (en) | 2007-03-08 | 2011-06-07 | O'donnell Shawn C | Context based data searching |
US20080306914A1 (en) * | 2007-06-05 | 2008-12-11 | Search Capital Ltd | Method and system for performing a search |
US8594996B2 (en) | 2007-10-17 | 2013-11-26 | Evri Inc. | NLP-based entity recognition and disambiguation |
US8862622B2 (en) * | 2007-12-10 | 2014-10-14 | Sprylogics International Corp. | Analysis, inference, and visualization of social networks |
US20100094846A1 (en) | 2008-10-14 | 2010-04-15 | Omid Rouhani-Kalleh | Leveraging an Informational Resource for Doing Disambiguation |
US8055675B2 (en) | 2008-12-05 | 2011-11-08 | Yahoo! Inc. | System and method for context based query augmentation |
US8458171B2 (en) * | 2009-01-30 | 2013-06-04 | Google Inc. | Identifying query aspects |
KR101078864B1 (en) | 2009-03-26 | 2011-11-02 | 한국과학기술원 | The query/document topic category transition analysis system and method and the query expansion based information retrieval system and method |
US8397253B2 (en) | 2009-07-23 | 2013-03-12 | Fmr Llc | Inserting personalized information into digital content |
US8314798B2 (en) | 2009-10-02 | 2012-11-20 | Business Objects Software Limited | Dynamic generation of contextual charts based on personalized visualization preferences |
US20110106807A1 (en) | 2009-10-30 | 2011-05-05 | Janya, Inc | Systems and methods for information integration through context-based entity disambiguation |
US8346795B2 (en) | 2010-03-10 | 2013-01-01 | Xerox Corporation | System and method for guiding entity-based searching |
US8751305B2 (en) * | 2010-05-24 | 2014-06-10 | 140 Proof, Inc. | Targeting users based on persona data |
US8326861B1 (en) | 2010-06-23 | 2012-12-04 | Google Inc. | Personalized term importance evaluation in queries |
US8600979B2 (en) * | 2010-06-28 | 2013-12-03 | Yahoo! Inc. | Infinite browse |
US20120016642A1 (en) | 2010-07-14 | 2012-01-19 | Yahoo! Inc. | Contextual-bandit approach to personalized news article recommendation |
US8386457B2 (en) | 2011-06-22 | 2013-02-26 | International Business Machines Corporation | Using a dynamically-generated content-level newsworthiness rating to provide content recommendations |
US20130060769A1 (en) | 2011-09-01 | 2013-03-07 | Oren Pereg | System and method for identifying social media interactions |
US9665643B2 (en) | 2011-12-30 | 2017-05-30 | Microsoft Technology Licensing, Llc | Knowledge-based entity detection and disambiguation |
US10984337B2 (en) | 2012-02-29 | 2021-04-20 | Microsoft Technology Licensing, Llc | Context-based search query formation |
US20140280179A1 (en) | 2013-03-15 | 2014-09-18 | Advanced Search Laboratories, lnc. | System and Apparatus for Information Retrieval |
-
2013
- 2013-06-10 US US13/913,835 patent/US9646062B2/en active Active
- 2013-06-29 US US13/931,922 patent/US20150006520A1/en not_active Abandoned
-
2014
- 2014-05-22 TW TW103117926A patent/TW201511547A/en unknown
- 2014-06-05 EP EP14737371.6A patent/EP3008645A1/en not_active Withdrawn
- 2014-06-05 CN CN201480033157.2A patent/CN105339933B/en active Active
- 2014-06-05 WO PCT/US2014/040970 patent/WO2014200780A1/en active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10268758B2 (en) * | 2013-09-29 | 2019-04-23 | Peking University Founder Group Co. Ltd. | Method and system of acquiring semantic information, keyword expansion and keyword search thereof |
US20180067940A1 (en) * | 2016-09-06 | 2018-03-08 | Kakao Corp. | Search method and apparatus |
US11080323B2 (en) * | 2016-09-06 | 2021-08-03 | Kakao Enterprise Corp | Search method and apparatus |
TWI668579B (en) * | 2017-11-10 | 2019-08-11 | 全球華人股份有限公司 | Establishing method for the post job description database |
Also Published As
Publication number | Publication date |
---|---|
EP3008645A1 (en) | 2016-04-20 |
CN105339933A (en) | 2016-02-17 |
TW201511547A (en) | 2015-03-16 |
CN105339933B (en) | 2019-08-06 |
US20140365468A1 (en) | 2014-12-11 |
US9646062B2 (en) | 2017-05-09 |
WO2014200780A1 (en) | 2014-12-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MICROSOFT CORPORATION, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ORMONT, JUSTIN;DAVIS, MARC ELIOT;SIGNING DATES FROM 20130806 TO 20130909;REEL/FRAME:031186/0768 |
|
AS | Assignment |
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034747/0417 Effective date: 20141014 Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:039025/0454 Effective date: 20141014 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |