US20100299360A1 - Extrapolation of item attributes based on detected associations between the items - Google Patents

Extrapolation of item attributes based on detected associations between the items

Publication number
US20100299360A1
Authority
US
United States
Prior art keywords
items
item
associations
association
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/835,125
Inventor
Jin Y. Yi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US12/835,125
Publication of US20100299360A1
Current legal status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0623Item investigation
    • G06Q30/0625Directed, with specific intent or strategy
    • G06Q30/0629Directed, with specific intent or strategy for generating comparisons
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Definitions

  • the present invention relates to data mining methods for discovering and quantifying associations between selectable items, and associations between search queries (or other forms of user input) and selectable items.
  • the selectable items may, for example, be products represented in an electronic catalog, documents, web pages, web sites, media files, and/or other types of items for which behavioral associations can be detected.
  • a variety of methods are known for detecting behavior-based associations (i.e., associations based on user behaviors) between items stored or represented in a database.
  • the purchase histories or item viewing histories of users can be analyzed to detect behavior-based associations between particular items represented in an electronic catalog (e.g., items A and B are related because a relatively large number of those who purchased A also purchased B).
  • the web browsing histories of users can be analyzed to identify behavior-based associations between particular web sites and/or web pages. See, e.g., U.S. Pat. No. 6,691,163 and U.S. Pat. Pub. 2002/0198882.
  • the detected behavior-based associations are typically used to assist users in locating items of interest. For example, in the context of an electronic catalog, when a user accesses an item's detail page, the detail page may be supplemented with a list of related items. This list may, for example, be preceded with a descriptive message such as “people who bought this item also bought the following,” or “people who viewed this item also viewed the following.”
  • the detected associations may also be used to generate personalized recommendations that are based on the target user's purchase history, item viewing history, or other item selections.
  • the detected associations may be used to rank search result items for display, and/or to supplement a search result set with items that do not match the user's search query. For example, when a user conducts a search, the matching items having the strongest behavior-based associations with the submitted search query may be elevated to a more prominent position in the search results listing; in addition, one or more items that do not match the search query, but which have strong behavior-based associations with the search query, may be added to the search result listing. See, e.g., U.S. Pat. No. 6,185,558.
  • One problem with relying on behavior-based associations is that the quantity of behavioral data collected for a particular item may be insufficient to create behavior-based associations for that item. This may be the case when, for example, new items are added to an electronic catalog, or when new web pages or documents are added to a data repository.
  • the problem is self-perpetuating because popular items (items with behavioral associations) typically remain popular due to their heightened exposure, while new and generally unknown items remain unpopular due to their lack of exposure. This problem is sometimes referred to as the “cold-start” problem.
  • One possible way to reduce the cold-start problem is to supplement the behavior-based associations with content-based associations between items. For example, a new item (one for which little or no behavioral data exists) can be associated with other items based on similarities between the attributes or other content of the items. These content-based associations may then be used to increase the new item's exposure in the same way behavior-based associations are used.
  • content-based associations tend to be less reliable than behavior-based associations, especially if the item content is not highly consistent in format.
  • content-based associations frequently are not a good predictor of the items users desire to purchase, view or otherwise select in combination, and thus tend to be less useful.
  • the detail page for a particular product may desirably list products that are very different from, but complementary of, that product, such as commonly purchased accessories for the product (e.g., an ink cartridge for the printer). If content-based associations were used in place of the behavior-based associations, however, these complementary products likely would not appear since their attributes would typically be dissimilar to those of the featured product.
  • the present invention comprises computer-implemented systems and methods for extrapolating behavior-based associations to “behavior-deficient” items (generally items for which the collected user activity data of a particular type is insufficient to create meaningful or reliable behavior-based associations).
  • the behavior-based associations are extrapolated based on “substitutability” associations between the behavior-deficient items and other items. These substitutability associations may be based on the attributes or content of the items, in which case they are referred to as content-based associations.
  • the items may, for example, be products represented in an electronic catalog, web pages or other documents accessible on a network, or web sites. More generally, the items can be any type of item for which user behaviors (e.g., purchases, accesses, downloads, etc.) can be monitored and analyzed to detect behavior-based associations, and for which suitable substitutability associations may be detected.
  • the behavior-based associations that are extrapolated are associations between selectable items. For example, suppose that item A is behaviorally associated with items B and C because, for example, users who select A also frequently select B, and/or C. Suppose further that item A has a content-based association with item X (e.g., because many of the attributes of A and X are the same), and that item X is a behavior-deficient item (e.g., because it is new or unpopular).
  • item A's behavior-based associations with B and C may be extrapolated to, or “inherited by,” item X such that new associations are created between X and B and between X and C. Note that X may be dissimilar in content to both B and C in this example, such that no associations would be created between X and B and between X and C if the associations were based solely on item content.
  • the strengths of these newly created associations may be dependent upon both (a) the degree to which items A and X are similar in content, and (b) the strengths of the behavior-based associations between A and B and between A and C, respectively.
  • the strengths of the new associations may also depend on whether X is similar in content to any other items that have a behavior-based association with B and/or C.
  • the newly created associations may, but need not, be terminated or phased out as sufficient user activity data becomes available for creating behavior-based associations between X and other items.
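  • The inheritance described above can be illustrated with a minimal sketch. The function name, the dictionary layout, and the multiply-then-sum combination rule are assumptions chosen for illustration; the text only requires that the new strength depend on both the content similarity and the behavioral strength.

```python
def inherit_associations(similar_items, behavior_weights):
    """Sketch: pass behavior-based associations on to a behavior-deficient item.

    similar_items:    {neighbor: content similarity in [0, 1]} for the deficient item
    behavior_weights: {neighbor: {related_item: behavioral weight}}
    """
    inherited = {}
    for neighbor, similarity in similar_items.items():
        for related, weight in behavior_weights.get(neighbor, {}).items():
            # Assumed rule: similarity times behavioral weight, summed over
            # every similar neighbor that leads to the same related item.
            inherited[related] = inherited.get(related, 0.0) + similarity * weight
    return inherited


# Item X is similar to A (0.5); A is behaviorally associated with B and C.
x_links = inherit_associations({"A": 0.5}, {"A": {"B": 4.0, "C": 2.0}})
print(x_links)
```

  • In this sketch X receives associations with B and C even though it shares no content with either of them.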
  • the behavior-based associations that are extrapolated to behavior-deficient items are associations between search queries and selectable items. These query-item associations are used to rank search results for display, and/or to supplement search results with additional items that do not match the search query. For example, suppose that search query Q is behaviorally associated with item A because, for example, users who submit Q frequently select item A from the search results listing. Suppose further that a new and thus behavior-deficient item, item B, is introduced into the search space, and that item B is similar in content to, and thus substitutable with, item A. In accordance with the invention, a new association may automatically be created between Q and item B. This new association may cause item B to be displayed at a more prominent position in the search results listing for Q, and if item B does not match Q, may cause item B to be added to the search result listing for Q.
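  • A rough sketch of the query-item case follows. The function names, the `min_weight` cutoff for adding non-matching items, and the product rule for the inherited weight are all illustrative assumptions, not details taken from the text.

```python
def extend_query_associations(query_weights, content_sim):
    """Give new items a share of a query's association with similar known items.

    query_weights: {item: behavioral weight between the query and the item}
    content_sim:   {(new_item, known_item): content similarity in [0, 1]}
    """
    extended = dict(query_weights)
    for (new_item, known_item), sim in content_sim.items():
        if known_item in query_weights:
            # Assumed rule: inherited weight = similarity * behavioral weight.
            extended[new_item] = extended.get(new_item, 0.0) + sim * query_weights[known_item]
    return extended


def rank_results(matching_items, query_weights, min_weight=1.0):
    """Order matching items by weight, then append strongly associated non-matches."""
    ranked = sorted(matching_items, key=lambda item: query_weights.get(item, 0.0), reverse=True)
    by_weight = sorted(query_weights.items(), key=lambda kv: kv[1], reverse=True)
    extras = [item for item, weight in by_weight
              if item not in matching_items and weight >= min_weight]
    return ranked + extras


# Query Q is associated with item A; new item B is similar to A but does not match Q.
weights_for_q = extend_query_associations({"A": 3.0}, {("B", "A"): 0.5})
results = rank_results(["C", "A"], weights_for_q)
print(results)
```

  • Here item A is elevated above the other matching item C, and item B is added to the listing although it does not match the query text.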
  • the invention may also be used to extrapolate other types of associations to behavior-deficient items. For example, a strong behavior-based association may exist between a particular ad and a particular web page based on the relatively high click-through rate experienced when the ad is displayed on this page. When a new web page (potentially on an entirely different web site) becomes available for purposes of displaying ads, this new page may inherit the behavior-based association with the ad, causing the ad to be selected (or selected more frequently than otherwise) for display on the new page.
  • the invention also comprises a computer-implemented method of extrapolating item attributes.
  • the method comprises: identifying a first item that has a first attribute, and a second item that is not known to have said first attribute; and determining a strength of a substitution association between the first and second items.
  • the strength of the substitution association is based at least partly on an automated analysis of content of the first and second items.
  • the method further comprises extrapolating the first attribute to the second item based on the strength of the substitution association.
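  • The attribute extrapolation method might be sketched as follows. The function name, the `min_strength` threshold, and the idea of returning the strength as a confidence value are assumptions; the text states only that the extrapolation is based on the strength of the substitution association.

```python
def extrapolate_attribute(first_item, second_item, attribute, strength, min_strength=0.8):
    """Infer `attribute` for second_item from first_item.

    first_item, second_item: dicts of attribute name -> value
    strength: substitution-association strength in [0, 1], computed elsewhere
              (for example by an automated comparison of item content)
    Returns (value, confidence) or None when nothing should be extrapolated.
    """
    if attribute not in first_item or attribute in second_item:
        return None  # nothing to copy, or the second item already has a value
    if strength < min_strength:
        return None  # items are not substitutable enough (assumed cutoff)
    return first_item[attribute], strength


known = {"style": "polo", "color": "red", "care": "machine wash"}
new = {"style": "polo", "color": "red"}
inferred = extrapolate_attribute(known, new, "care", strength=0.9)
print(inferred)
```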
  • FIG. 1 illustrates a web site system according to one embodiment of the invention.
  • FIG. 2 is a flow chart illustrating one embodiment of a process for creating new item associations using content-based and behavior-based associations between items.
  • FIG. 3A is a graph depicting behavior-based associations between four items in an electronic catalog.
  • FIG. 3B is a graph depicting example content-based associations between the items of FIG. 3A.
  • FIG. 3C illustrates how the behavioral and content-based associations of FIGS. 3A and 3B may be used in combination to create new associations between items.
  • FIG. 3D illustrates how the behavioral and content-based associations of FIGS. 3A and 3B may be used in combination to create new associations for a newly added pocketed red polo shirt.
  • FIG. 4 illustrates an embodiment in which the new associations are created between search queries and search results.
  • FIG. 5A is a graph depicting behavior-based associations between a search query and items (web pages) in a search space.
  • FIG. 5B is a graph depicting content-based associations between the items in FIG. 5A and three newly added items.
  • FIG. 5C illustrates how the behavioral and content-based associations of FIGS. 5A and 5B may be used in combination to create new associations between the search query and particular items.
  • FIG. 1 illustrates an embodiment in which the invention is employed for purposes of detecting associations between items represented in a browsable electronic catalog of items.
  • the detected associations between items may be used for various purposes, such as to supplement item detail pages with lists of related items, and/or to generate personalized recommendations for particular users. See, e.g., U.S. Pat. No. 6,912,505, the disclosure of which is hereby incorporated by reference.
  • the electronic catalog in this embodiment contains item content supplied by many different entities.
  • some of the item content may be supplied by a variety of different marketplace sellers, as described in U.S. Pub. 2003/0200156 A1, the disclosure of which is hereby incorporated by reference.
  • the catalog data lacks a sufficient degree of uniformity or consistency to reliably detect content-based associations between items. Consequently, behavior-based associations (those based on collected user activity or “behavioral” data, such as users' purchase histories, rental histories, detail page viewing histories, download histories, etc.) are generally more reliable than content-based associations. Behavior-based associations may be preferred over content-based associations for other reasons as well, depending on how the detected associations are used.
  • the quantity of behavioral data collected for a given item may, in many cases, be insufficient to reliably detect behavior-based associations between that item and any other items. This may be the case where, for example, an item was only recently added to the electronic catalog, or is relatively unpopular. Rather than merely relying on content-based associations for such items, the present embodiment uses a combination of content mining and behavioral mining to create new associations for these items. This is accomplished by using content-based associations, or alternatively another type of “substitutability” association (i.e., an association that represents or is based on a degree to which particular items are substitutable with each other), to effectively extrapolate behavior-based associations to the new or unpopular items.
  • a behavior-based association exists between items A and B, and that item C is a new item for which little or no behavioral data exists (i.e., it is a behavior-deficient item).
  • items B and C are very similar in content, as determined, for example, by comparing their respective attributes (e.g., name, category, author, subject, description, manufacturer, price, etc.).
  • the present embodiment effectively extrapolates or extends B's association with A to item C, such that C effectively inherits a behavior-based association with A. (If B has behavior-based associations with other items, C may inherit those as well.)
  • This new association between A and C may be referred to as an extrapolated or inherited association.
  • the strength of this new association between items A and C depends upon both the strength of the A-B behavior-based association and the strength of the B-C content-based or other substitutability association.
  • the strength of the A-C association also preferably depends on whether A and C are associated through any other “paths.” For instance, the association between A and C will be stronger if A also has a behavior-based association with D, and D has a content-based association with C.
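  • One way to write this multi-path dependence, assuming the multiply-and-sum rule of the FIG. 2 embodiment (other combination rules are possible), is shown below. The symbols are introduced here for illustration only: w_beh denotes a behavior-based weight and c denotes a content-based weight in [0, 1].

```latex
w_{\mathrm{new}}(A, C) = w_{\mathrm{beh}}(A, B)\, c(B, C) + w_{\mathrm{beh}}(A, D)\, c(D, C)
```

  • For instance, with illustrative values w_beh(A, B) = 2, c(B, C) = 0.9, w_beh(A, D) = 1 and c(D, C) = 0.5, the single path through B gives 1.8, and adding the path through D raises the total to 2.3.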
  • the extrapolated associations created between item C and other items may, but need not, be phased out or terminated in favor of pure behavior-based associations. There is benefit in continuing to apply the extrapolation process even when enough signal is present for pure behavioral relationships.
  • the extrapolated associations are generated by taking the “nearest-neighborhood” of substitutable items for any given item in aggregate.
  • Common behavioral associations within the nearest neighborhood would be boosted due to this aggregated treatment.
  • the star-guide map may be common to all the telescopes, so the guide's weight would get boosted in the aggregate. This behavior has been empirically shown to help reduce the erroneous associations from noisy behavioral information.
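  • The aggregated nearest-neighborhood treatment can be sketched as below. The neighborhood size `k`, the product-and-sum rule, and the accessory names other than the star guide are hypothetical choices made for this illustration.

```python
import heapq


def neighborhood_associations(item, content_sim, behavior, k=3):
    """Aggregate the behavioral associations of an item's k most similar items.

    content_sim: {item: {neighbor: content similarity in [0, 1]}}
    behavior:    {neighbor: {related_item: behavioral weight}}
    """
    neighbors = heapq.nlargest(k, content_sim.get(item, {}).items(), key=lambda kv: kv[1])
    totals = {}
    for neighbor, sim in neighbors:
        for related, weight in behavior.get(neighbor, {}).items():
            # Associations shared by several neighbors accumulate weight.
            totals[related] = totals.get(related, 0.0) + sim * weight
    return totals


content_sim = {"new_scope": {"scope_1": 0.5, "scope_2": 0.5, "scope_3": 0.5}}
behavior = {
    "scope_1": {"star_guide": 2.0, "lens_kit": 2.0},
    "scope_2": {"star_guide": 2.0, "tripod": 2.0},
    "scope_3": {"star_guide": 2.0},
}
totals = neighborhood_associations("new_scope", content_sim, behavior)
print(totals)
```

  • Because the star guide is associated with every telescope in the neighborhood, its aggregate weight exceeds that of accessories tied to a single telescope, which is the boosting effect described above.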
  • a web server system 30 includes a web server 32 that generates and serves pages of a host web site to computing devices 34 of end users.
  • the computing devices 34 may include a variety of other types of devices, such as cellular telephones and Personal Digital Assistants (PDAs).
  • the web server 32 may be implemented as a single physical server or a collection of physical servers.
  • the invention may alternatively be embodied in another type of multi-user interactive system, such as an interactive television system, an online services network, or a telephone-based system in which users select items to acquire via telephone keypad entries and/or voice.
  • the web server 32 provides user access to an electronic catalog of items represented within a database 36 or a collection of databases.
  • the items represented in the database 36 may include or consist of items that may be purchased, rented, licensed, downloaded, or otherwise acquired via the web site (e.g., consumer electronics products; household appliances; book, music and video titles in physical and/or downloadable form; magazine subscriptions, computer programs, documents, etc.).
  • the items may consist primarily or exclusively of physical products that are shipped to users, and/or of digital products that are delivered over a network. Many hundreds of millions of different items may be represented in the database 36.
  • the catalog data stored for a given item in the database 36 typically includes a number of different attributes (e.g., name, manufacturer, author, category, subject, color, browse node, price, etc.), which may be represented as name-value pairs.
  • Different catalog items may have different attributes. As is conventional, the items may be arranged within a hierarchy of browse categories to facilitate navigation of the catalog.
  • the present invention is not limited to items that can be purchased or otherwise acquired from an electronic catalog.
  • the invention may also be employed to derive behavioral relationships between web sites, web pages, businesses represented in an online business directory, blogs, chat rooms, authors, brands, people (e.g., in the context of a social networking system), and documents stored on a company network.
  • inventive methods described herein can be applied to any type (or types) of item for which both (a) the associated item attributes or content, or some other source of information, permits the detection of items that are highly substitutable, and (b) activity data of users, such as purchase histories, viewing histories, explicit ratings, etc., can be used to detect behavior-based associations.
  • the web server 32, which may include any number of physical servers, runs a page generator component 33 that dynamically generates web pages in response to requests from the user computing devices 34.
  • the web pages are generated using a repository of web page templates 38, and using data retrieved from a set of services 35.
  • the types of services 35 can vary widely, and may include, for example, a catalog service that returns catalog data for particular items, a search service that processes search queries submitted by users, a recommendation service that generates and returns personalized item recommendations for users, and a transaction processing service that processes purchases and/or other types of transactions.
  • users of the web site can obtain detailed information about each item by accessing the item's detail page within the electronic catalog.
  • Each item detail page may be located by, for example, conducting a search for the item via a search engine of the web site, or by selecting the item from a browse tree listing.
  • Each item detail page may provide an option for the user to acquire the item from a retail entity and/or from another user of the system.
  • the web server system 30 and/or the services 35 maintain item selection histories 40 for each user of the web site.
  • the item selection history 40 of each user identifies catalog items selected by that user via the web site, preferably together with the associated dates and times of selection.
  • the item selection histories may, for example, include item purchase histories, item rental histories, item detail page viewing histories, item download histories, or any combination thereof.
  • the item selection histories 40 may include data obtained from external sources, such as the web site systems of business partners, browser toolbars of users, or customer credit card records.
  • Item selection histories 40 of many hundreds of thousands or millions of unique users may be maintained and analyzed by the system 30.
  • Each user account may be treated as a separate user for purposes of maintaining item selection histories; thus for example, if members of a household share a single account, they may be treated as a single user.
  • a behavior-based association mining component 44 collectively analyzes or “mines” the item selection histories of the users periodically (e.g., once per day) to detect and quantify behavior-based associations between particular catalog items.
  • the methods described in U.S. Pat. No. 6,912,505, referenced above, may be used for this purpose.
  • the behavior-based association mining component 44 generates a table 46 or other data structure that identifies pairs of items for which a behavior-based association has been detected. For each such pair of items, the table 46 also stores a behavioral association strength value or “weight” indicating the strength of the association.
  • the associations may be based on any type or types of recorded user activity, such as purchases, rentals, viewing events, shopping cart adds, and/or downloads.
  • the strength of the association between two items depends on how many unique users who selected one item (for purchase, viewing, etc.) also selected the other. These counts are proportioned against the individual item selection counts. Using the proportions, significance tests or signal processing techniques may be performed to reduce the number of invalid associations due to noise in the data.
  • Each entry in the table 46 may, for example, be in the form of a one-to-many mapping that maps a particular item to a list of the most closely related items, together with associated weights. Behavior-based associations that fall below a selected strength threshold may be excluded from the table 46.
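  • A simplified sketch of this mining step is given below. It assumes that the strength from item a to item b is the share of a's unique selectors who also selected b; the significance tests and signal processing mentioned above are omitted, and the function name, `min_strength`, and `top_n` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_behavior_associations(histories, min_strength=0.5, top_n=10):
    """Build a one-to-many table of behavior-based associations.

    histories: {user_id: iterable of item ids selected by that user}
    Returns {item: [(related_item, strength), ...]} sorted by strength.
    """
    item_counts = Counter()   # unique users per item
    pair_counts = Counter()   # unique users per ordered item pair
    for items in histories.values():
        unique = set(items)
        item_counts.update(unique)
        pair_counts.update(permutations(sorted(unique), 2))

    table = {}
    for (a, b), both in pair_counts.items():
        strength = both / item_counts[a]  # assumed proportion rule
        if strength >= min_strength:      # drop weak associations
            table.setdefault(a, []).append((b, strength))
    for a in table:
        table[a].sort(key=lambda pair: (-pair[1], pair[0]))
        del table[a][top_n:]              # keep only the closest items
    return table


histories = {
    "u1": ["printer", "ink"],
    "u2": ["printer", "ink"],
    "u3": ["printer", "paper"],
    "u4": ["ink"],
}
table = mine_behavior_associations(histories)
print(table)
```

  • Under this assumed rule the resulting associations are directional: in the sample data, paper maps to the printer, but the printer does not map to paper.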
  • a content-based association mining component 42 periodically and collectively mines the electronic database of items 36 to detect and quantify content-based associations between particular catalog items.
  • the content-based association mining component 42 generates a content-based associations table 48 that identifies pairs of items that share similar characteristics or content. For each such pair, the table 48 also stores a respective content-based association strength value or weight representing the strength of the content-based association. Each such weight value also generally represents the degree to which the corresponding items are substitutable or interchangeable with each other.
  • Any of a variety of known methods for comparing item attributes may be used to detect and quantify the content-based associations. Techniques from natural language processing such as simple inter-document term frequency or more complicated algorithms such as latent semantic analysis may be used.
  • pattern recognition techniques such as neural networks or Bayesian belief networks operating over the content feature space may be used.
  • Content-based associations that fall below a selected threshold (e.g., 80% similarity if the strengths are in a probabilistic domain) may be excluded from the table 48.
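  • As a minimal sketch of content-based mining with a threshold, the example below scores two items by the fraction of attribute names on which they agree. This simple measure is an assumed stand-in for the term-frequency, latent semantic analysis, or pattern recognition techniques named above, and the catalog entries (including the brand "Acme") are hypothetical.

```python
def content_similarity(attrs_a, attrs_b):
    """Fraction of attribute names (over both items) whose values are equal."""
    names = set(attrs_a) | set(attrs_b)
    if not names:
        return 0.0
    matching = sum(1 for name in names
                   if name in attrs_a and name in attrs_b and attrs_a[name] == attrs_b[name])
    return matching / len(names)


def mine_content_associations(catalog, threshold=0.8):
    """Return {(item_a, item_b): similarity} for pairs at or above the threshold."""
    table = {}
    ids = sorted(catalog)
    for index, a in enumerate(ids):
        for b in ids[index + 1:]:
            sim = content_similarity(catalog[a], catalog[b])
            if sim >= threshold:
                table[(a, b)] = sim
    return table


catalog = {
    "red_polo": {"category": "shirt", "style": "polo", "color": "red",
                 "brand": "Acme", "pocket": "no"},
    "red_polo_pocket": {"category": "shirt", "style": "polo", "color": "red",
                        "brand": "Acme", "pocket": "yes"},
    "blue_jeans": {"category": "pants", "style": "jeans", "color": "blue",
                   "brand": "Acme", "pocket": "yes"},
}
content_table = mine_content_associations(catalog)
print(content_table)
```

  • Comparing every pair is quadratic in catalog size, so a real system would likely restrict the comparison, for example to pairs that include a behavior-deficient item.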
  • the content-based analysis may be limited to pairs of items in which one of the two items is a “behavior-deficient” item. For example, if item purchases are used to detect the behavior-based associations, an item may be treated as behavior deficient if it has been purchased fewer than ten times, or if the purchase behaviors of those who have purchased it are insufficiently reliable to associate it with any other item. An item may be behavior deficient if, for example, it has only recently been added to the electronic catalog, or if it is an obscure, high priced, or otherwise unpopular item.
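  • The behavior-deficiency test might look like the following sketch. The ten-purchase cutoff comes from the example above; treating "no entry in the behavior-based table" as the sign of unreliable purchase behavior is an assumption, as are the function and parameter names.

```python
def is_behavior_deficient(item_id, purchase_counts, behavior_table, min_purchases=10):
    """Return True if the item lacks enough behavioral data.

    purchase_counts: {item_id: number of purchases}
    behavior_table:  {item_id: {related_item: weight}} of detected associations
    """
    too_few_purchases = purchase_counts.get(item_id, 0) < min_purchases
    # Assumed proxy: no detected association means the data was not reliable enough.
    no_reliable_links = not behavior_table.get(item_id)
    return too_few_purchases or no_reliable_links


purchase_counts = {"printer": 250, "new_printer": 3, "odd_item": 40}
behavior_table = {"printer": {"ink": 0.7}}
flags = {item: is_behavior_deficient(item, purchase_counts, behavior_table)
         for item in purchase_counts}
print(flags)
```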
  • the behavior-based and content-based associations tables 46, 48 are periodically analyzed in combination by an extrapolation component 50 to selectively extrapolate or propagate behavior-based associations to unpopular items, as described above. For example, if the behavior-based table 46 indicates that B is behaviorally associated with C, D and E, and the content-based table 48 indicates that unpopular item U has a content-based association with B, the extrapolation component 50 may create associations between U and C, U and D, and U and E. The extrapolation component 50 thereby effectively augments the behavior-based associations table 46 with these extrapolated associations, particularly for “behavior-deficient” items.
  • This augmented behavioral association data table is depicted in FIG. 1 as element 52, although the augmented table may actually be created by simply adding new entries to the behavior-based associations table 46.
  • the task of creating extrapolated associations may, for example, be triggered by the generation of a new behavior-based associations table 46.
  • the augmented behavioral association data table 52 includes an association weight value for each pair of associated items.
  • the weights are generated based on the corresponding behavior-based and content-based weights, as described below. These extrapolated association weights are preferably normalized with (on the same scale as) the purely behavior-based weights. If the augmented table is simply created by adding new entries to the behavior-based association table, the extrapolated association weights must be in the same domain as the true behavior-based weights.
  • the augmented behavioral association data table 52 may be used for a variety of purposes. For example, when a user accesses an item detail page of an item, the web server 32 may access the augmented behavioral association data table 52 to look up a list of related items, and may incorporate this list into the item detail page. If the item detail page is for an unpopular item, this list of related items will ordinarily be based exclusively on extrapolated associations created for the unpopular item.
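  • The detail-page lookup can be sketched as a simple ranked read from the augmented table. The nested-dictionary layout, the function name, and the `limit` parameter are assumptions made for this illustration.

```python
def related_items(item_id, augmented_table, limit=5):
    """Return up to `limit` related items for a detail page, strongest first.

    augmented_table: {item: {related_item: weight}}, holding both purely
    behavior-based and extrapolated weights on the same scale.
    """
    neighbors = augmented_table.get(item_id, {})
    ranked = sorted(neighbors.items(), key=lambda kv: kv[1], reverse=True)
    return [other for other, _ in ranked[:limit]]


augmented_table = {
    "A": {"B": 3.0, "C": 1.0},   # behavior-based entries
    "U": {"C": 1.5, "D": 0.5},   # entries created by extrapolation
}
print(related_items("A", augmented_table))
print(related_items("U", augmented_table))
```

  • Because the lookup reads only the row keyed by the requested item, item B can appear on item A's page without A appearing on B's page.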
  • the augmented behavioral association data table 52 including the association weights stored therein, may also be used to generate personalized recommendations that are based on the item selections of the target user. The methods described in U.S. Pat. No. 6,912,505, referenced above, may be used for this purpose.
  • the augmented behavioral association data table 52 may be used to augment a search results set with one or more items that are closely related to those that match the user's search query.
  • the item associations recorded in the behavior-based table 46 and the augmented behavioral association data table 52 are preferably “directional” associations. For instance, although item A may be mapped to item B, item B is not necessarily mapped to item A. Thus, for example, although item B may appear on item A's detail page (as a related item), item A may not appear on item B's detail page. In other embodiments, the associations may be non-directional.
  • FIG. 2 is a flow chart which illustrates one example of the steps that may be performed by the extrapolation component 50.
  • each item, i, in the catalog is analyzed in sequence.
  • the current item is checked to determine whether it is “popular,” which in the sample flow chart involves determining whether it has any behavior-based associations to any other items.
  • every item is treated as either popular or unpopular, with popularity being based on collected user activity data.
  • any other standard may be used to classify whether or not an item is popular. For example, an item with fewer than a certain threshold number of behavioral associations may be considered unpopular.
  • If item i is popular, then in step 210 the process skips to the next item and returns to step 202. If, however, item i is unpopular, then item i is selected for further analysis by obtaining each item s that has a content association value with item i greater than a threshold value. In other words, in step 203 an item s that has significant content-based similarity to item i is identified. This significance indicates the items' substitutability or interchangeability.
  • In step 204, item s may be further analyzed to determine whether or not it is substitutable for i.
  • This step allows for additional error-reducing mechanisms which may ensure that s is substitutable for i.
  • a red men's polo shirt may be highly content related (step 203) with a red women's polo shirt, but they may not be substitutable (step 204), since a man would not wear a woman's shirt, and vice versa. Step 204 thus reduces this possibility for error.
  • one method that may be used to assess whether two items are substitutes for each other is to monitor how frequently they are selected for viewing within common browsing sessions. Item classifiers such as “men” and “women” may also be used to assess substitutability.
  • step 209 the process in step 205 searches to find each item b that has a behavior-based association with s by skipping over items without behavior-based associations (step 208 ).
  • step 206 each item b that does have a behavior-based association with s is checked to determine if that behavior-based association with s is valid.
  • a set of batteries may have a behavior-based association with an electronic device because those batteries are often purchased with the electronic device, but if those batteries are incompatible with the device (which may be determined, e.g., by examining return orders), then the behavior-based association between the batteries and the electronic device may be considered invalid.
  • step 207 the current behavior-based association weight from i to b is assigned the value of the old behavior-based association weight from i to b plus the product of the content-based association weight from i to s multiplied by the behavior-based association weight of s to b. Examples of this value assignment will be discussed below with reference to FIGS. 3A-3D .
  • the use of multiplication to combine the i to s content-based and s to b behavior-based weights and the use of summation to combine this result with any previous i to b behavior-based weight is only specific to this embodiment.
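  • The flow described above can be sketched roughly as follows. This is one possible reading, not the patented implementation: the nested-dictionary tables, the function name `extrapolate`, the default threshold of 0.5, the two pluggable checks, and the decision to skip self-associations are all assumptions made for illustration. Step numbers in the comments are given only where the text states them.

```python
def extrapolate(items, behavior, content, content_threshold=0.5,
                is_substitutable=lambda i, s: True,
                is_valid=lambda s, b: True):
    """Return a copy of `behavior` augmented with extrapolated associations.

    behavior[x][y]: directional behavior-based weight from x to y.
    content[x][y]:  content-based weight between x and y, in [0, 1].
    """
    augmented = {x: dict(targets) for x, targets in behavior.items()}
    for i in items:
        # "Popular" here means: has at least one behavior-based association.
        if behavior.get(i):
            continue
        for s, c_is in content.get(i, {}).items():         # step 203
            if c_is <= content_threshold:
                continue
            if not is_substitutable(i, s):                  # step 204
                continue
            for b, w_sb in behavior.get(s, {}).items():     # step 205
                if b == i or not is_valid(s, b):            # step 206
                    continue
                row = augmented.setdefault(i, {})
                # step 207: old i->b weight plus (i->s content) * (s->b behavior)
                row[b] = row.get(b, 0.0) + c_is * w_sb
    return augmented


# Tiny usage example with made-up items and weights.
behavior = {"s": {"b": 5.0}}
content = {"i": {"s": 0.8}, "s": {"i": 0.8}}
result = extrapolate(["i", "s", "b"], behavior, content)
```

  • In this sketch the input table is left unmodified and a new augmented table is returned, loosely mirroring the separation between table 46 and table 52. Other combining functions could replace the multiply-and-add rule.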
  • each node represents a respective item in the electronic catalog
  • each edge (shown as an arrow) represents an association between two items.
  • the numbers included in-line with the arrows represent corresponding association weights or strengths, with behavioral association weights (including those created via extrapolation) being on a scale of zero to infinity, and content-based weights being on a scale of zero to one.
  • FIG. 3A illustrates behavioral associations between four items
  • FIG. 3B illustrates the content-based associations between these same items.
  • FIG. 3A has few connections since not every item pair in a catalog may have behavioral data associated with it. Since the graphs of FIGS. 3A and 3B represent different domains, behavioral association and content association, respectively, the edges of each graph represent different kinds of relationships.
  • the behavioral association graph, FIG. 3A shows some measure of intentional relationships.
  • a green polo shirt 301 is shown to have a behavioral association strength of forty to cargo shorts, and of ten to a chronometer watch. This does not necessarily mean that the chronometer watch 303 has a behavioral association strength of ten to the green polo shirt 301 , since the arrow points from the green polo shirt 301 to the chronometer watch 303 , and not vice versa.
  • the users who have selected the green polo shirt 301 have also selected the chronometer watch 303 to create a significant relationship for the shirt, it does not mean that these users create enough significance with respect to the chronometer watch's total selection base.
  • the red polo shirt 302 has no edges because little or no behavioral data exists for the red polo shirt 302 .
  • the content association graph shows some measure of how innately similar items are. Since the content-based relationship is inherent to the pair of items, the directionality of the relationship between the pair of items is mutual, represented by bi-directional arrows.
  • the content association strength or weight between the green polo shirt 301 and red polo shirt 302 has a value of 0.9. A value of 1.0 would mean that two items are identical. Consequently, the association strength value of 0.9 between the green polo shirt and the red polo shirt means that the two items are highly similar in content, which is understandable, since the only difference between the two items is their color.
  • FIG. 3C illustrates how the behavioral and content-based associations of FIGS. 3A and 3B may be used in combination to create extrapolated associations for the unpopular red polo shirt.
  • this example illustrates an item inheriting behavioral association values from a single item
  • the sample equation given also works for an item inheriting behavioral association values from multiple items.
  • FIG. 3A there is no behavior-based association edge from the red polo shirt 302 to the cargo shorts 304 . Consequently, the behavior-based association weight from the red polo shirt 302 to the cargo shorts 304 is zero.
  • Running through the sample process of FIG. 2 would provide behavioral associations for the behavior-deficient red polo shirt 302 inherited from the behavioral associations of the green polo shirt 301 .
  • the value of the new behavior-based association weight from the red polo shirt 302 to the cargo shorts 304 would equal the value of the old behavior-based association weight from the red polo shirt 302 to the cargo shorts 304 (which is zero, since no edge exists) plus the product of the content-based association weight from the red polo shirt 302 to the green polo shirt 301 (0.9 from FIG. 3B ) multiplied by the behavior-based association weight of the green polo shirt 301 to the cargo shorts 304 (which is 40 from FIG. 3A ).
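  • Plugging the stated numbers into this rule gives 0 + 0.9 × 40 = 36 for the red polo shirt's new association with the cargo shorts. The short sketch below reproduces that arithmetic and applies the same rule to the chronometer watch, using the weight of ten stated earlier for the green polo shirt; the variable names are illustrative only.

```python
old_weight = 0.0        # red polo 302 -> cargo shorts 304: no edge in FIG. 3A
content_weight = 0.9    # red polo 302 <-> green polo 301, from FIG. 3B
behavior_weight = 40.0  # green polo 301 -> cargo shorts 304, from FIG. 3A

# New weight = old weight + content weight * behavior weight
new_shorts_weight = old_weight + content_weight * behavior_weight

# Same rule for the chronometer watch 303 (green polo -> watch is ten).
new_watch_weight = 0.0 + 0.9 * 10.0
```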
  • this detail page will display the cargo shorts 304 and chronometer watch 303 as related items, even though neither has a pure behavior-based association with the red polo shirt.
  • the decision of whether to display these related items on this detail page may depend on whether the strengths of the newly created associations exceed some threshold, and also on whether other related items exist that have stronger associations with the red polo shirt.
  • the newly created associations may also cause the red polo shirt to show up on the detail pages for the cargo shorts and/or the chronometer watch.
  • red pocketed polo shirt 305 ( FIG. 3D ) is now added to the catalog, and does not yet have any behavioral data (e.g., it has not yet been purchased).
  • the red pocketed polo shirt 305 has a content-based association strength of 0.9 with the red polo shirt 302 , and a content-based association strength value of 0.8 with the green polo shirt 301 , as depicted by the dashed lines in FIG. 3D . If the process of FIG. 2 were applied, the result would be extrapolated associations between the new red pocketed polo shirt 305 and both the watch 303 and the shorts 304 , as shown in FIG. 3D .
  • the fact that the red pocketed polo shirt 305 has similar attributes to multiple items that are behaviorally related to the watch 303 serves to increase the strength of the newly created association.
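  • To illustrate how multiple content-similar items can reinforce one another, the sketch below sums the contribution of each source item. It assumes that the red polo shirt's extrapolated weights (0.9 × 40 = 36 to the shorts and 0.9 × 10 = 9 to the watch) are treated like behavioral weights when the process is re-run. The values actually shown in FIG. 3D are not stated in the text, so the totals here are illustrative, and the function name `inherit` is hypothetical.

```python
def inherit(content_to_sources, source_weights):
    """Sum content_weight * source_weight over every content-similar source."""
    totals = {}
    for source, c in content_to_sources.items():
        for target, w in source_weights.get(source, {}).items():
            totals[target] = totals.get(target, 0.0) + c * w
    return totals


# Content-based weights of the red pocketed polo shirt 305 (from the text).
content_305 = {"green_polo_301": 0.8, "red_polo_302": 0.9}

weights = {
    # Behavioral weights from FIG. 3A.
    "green_polo_301": {"shorts_304": 40.0, "watch_303": 10.0},
    # Assumed: previously extrapolated weights for the red polo shirt.
    "red_polo_302": {"shorts_304": 36.0, "watch_303": 9.0},
}

extrapolated_305 = inherit(content_305, weights)
```

  • Under these assumptions each total exceeds what the green polo shirt alone would contribute, which is the reinforcing effect described above.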
  • the extrapolated associations may optionally be terminated or phased out in favor of behavior-based associations. If a phase-out process is used, the strengths of the extrapolated associations may, for example, be decreased in proportion to the amount of behavioral data collected for the associated items.
  • the extrapolated associations may alternatively be phased out over time regardless of the quantity of behavioral data, such that extrapolated associations are only used for relatively new or newly added items (e.g., those added in the last five days).
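  • The two phase-out options might be sketched as follows. The linear decay, the cutoff of 100 behavioral events, and both function names are assumptions chosen for illustration; the text only says that strengths may decrease in proportion to the behavioral data collected, or that extrapolated associations may be limited to recently added items.

```python
def phased_weight(extrapolated_weight, behavioral_events, full_data_events=100):
    """Scale an extrapolated weight down linearly as behavioral data accumulates.

    `full_data_events` is a hypothetical count at which the extrapolated
    association is considered fully replaced by real behavioral data.
    """
    remaining = max(0.0, 1.0 - behavioral_events / full_data_events)
    return extrapolated_weight * remaining


def is_recent(days_since_added, max_age_days=5):
    """Time-based alternative: use extrapolated associations only for new items."""
    return days_since_added <= max_age_days


no_data = phased_weight(36.0, 0)
half_data = phased_weight(36.0, 50)
ample_data = phased_weight(36.0, 200)
```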
  • the content-based association weights generally represent the degree to which particular items are substitutable with each other. This is because items that have similar attributes or content (e.g., two camcorders with similar specifications) tend to be substitutes for each other. Content-based associations are thus one form of substitutability association.
  • substitutability associations may be detected automatically using other sources of information. For example, as described in U.S. Pat. No. 6,912,505, substitutability associations can be detected by mining the session-specific item viewing histories of users, and particularly their session-specific item detail page viewing histories. This is because users tend to comparison shop for a particular type of item when they browse the catalog. Thus, for example, if a relatively large number of users who select item A for viewing also select item B for viewing during the same browsing session, items A and B are likely highly substitutable. In contrast, purchase-based associations tend to reveal items that are complementary of each other. Because catalog items tend to be viewed much more frequently than they are purchased (especially for high priced items), viewing-history-based (substitutability) associations can often be detected for a particular item even though the item is behavior deficient in the purchase domain.
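  • As a rough illustration of session-based mining, the sketch below scores each item pair by how often the two items are viewed within the same browsing session. The Jaccard-style ratio, the function name `coview_weights`, and the sample sessions are assumptions; this is not the algorithm of U.S. Pat. No. 6,912,505, whose details are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def coview_weights(sessions):
    """Score item pairs by co-occurrence within browsing sessions.

    Weight = sessions containing both items / sessions containing either item,
    which yields a value in [0, 1].
    """
    item_sessions = Counter()
    pair_sessions = Counter()
    for viewed in sessions:
        unique = sorted(set(viewed))  # count each item once per session
        item_sessions.update(unique)
        pair_sessions.update(combinations(unique, 2))
    weights = {}
    for (a, b), both in pair_sessions.items():
        either = item_sessions[a] + item_sessions[b] - both
        weights[(a, b)] = both / either
    return weights


# Hypothetical session-specific item detail page viewing histories.
sessions = [
    ["camcorder_A", "camcorder_B"],
    ["camcorder_A", "camcorder_B", "tripod"],
    ["camcorder_A"],
]
weights = coview_weights(sessions)
```

  • Because the result lies between zero and one, it could stand in for a content-based weight in the extrapolation step, although a production system would likely need minimum-count filtering to suppress noise from rarely viewed items.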
  • the viewing-based (substitutability) association between A and C may be used to create a new (extrapolated) association between B and C.
  • this variation can be implemented by replacing the content-based association mining component 42 with a component that analyzes the session-specific item detail page viewing histories of users.
  • the table 48 would still store substitutability association information (including weight values), but the associations would no longer be based on item content.
  • both content-based mining and item viewing history mining can be used in combination to detect the substitutability associations.
  • the behavior-based association mining component 44 would use purchase histories of users to detect the behavior-based associations represented in table 46 .
  • FIG. 4 illustrates an embodiment in which the invention is employed for purposes of creating new associations between search queries and new or otherwise behavior-deficient items.
  • the items are web pages in a search space, where the search space may, for example, be a particular web site, the Internet, or a corporate intranet.
  • the items could alternatively be products represented in an electronic catalog, blogs, podcasts, business listings in an online directory, other types of documents, or any other type of item for which keyword searches can be performed.
  • the associations between search queries and items in this embodiment may be used for various purposes, such as to rank items in a search result listing, and/or to supplement the search result listing with additional items.
  • the quantity of behavioral data collected for a given page may, in many cases, be insufficient to reliably detect behavior-based associations between that page and particular search queries. This may be the case where, for example, the page is new, such that few users have had the opportunity to click through to it from a search results listing.
  • this lack of behavioral data tends to be self perpetuating, as the lack of such data may cause the page to be displayed in a less prominent position in, or to be completely omitted from, search result listings.
  • the search system 430 in this embodiment includes a search engine 432 which responds to search queries (typically consisting of textual search strings) received over the Internet from users' computing devices 434 .
  • the search engine 432 may be implemented as software running on a single physical server or a collection of physical servers.
  • the search engine 432 provides searchable access to a collection of web pages in a search space 436 , with each web page identified by a unique uniform resource locator (URL).
  • the pages represented in the search space 436 may include or consist of pages hosted by a single source or a wide variety of different sources.
  • the search engine 432 may use a pre-generated search index 435 to identify web pages that match particular search queries.
  • the search system 430 maintains a search activity log 440 containing activity data (behavioral data) descriptive of search activities of users.
  • the stored activity data includes the submitted search queries, and includes identifiers, such as URLs, of the web pages selected by particular users.
  • the search activity log 440 may include data obtained from external sources, such as the search systems of business partners. Search histories of many hundreds of thousands or millions of unique users may be maintained and analyzed by the system 430 .
  • the search activity data may, for example, be stored in a chronological log file, or in a database of the type described in U.S. Pat. Pub. 2005/0033803 A1, the disclosure of which is hereby incorporated by reference.
  • a behavioral association mining component 444 collectively analyzes or “mines” the search activity data 440 periodically (e.g., once per day) to detect and quantify behavior-based associations between search queries and particular web pages. Examples of data mining algorithms that may be used for this purpose are described in U.S. Pat. No. 6,185,558, U.S. Patent Pub. 2005/0222987 A1 and U.S. patent application Ser. No. 11/276,079, filed Feb. 13, 2006, the disclosures of which are hereby incorporated by reference.
  • the behavioral association mining component 444 generates a table 446 or other data structure that identifies detected behavior-based associations between particular search queries and web page URLs.
  • the search queries represented in this table 446 may consist solely of search terms and phrases, or may, in some embodiments, also include other types of constraints such as search-field designations.
  • the table 446 also stores a behavioral association strength value or “weight” indicating the strength of the association.
  • the associations may be based on any type or types of recorded user activity, such as search submissions, selections from search results, and/or previewing a search result. In general, the strength of the association between the pair will depend on how many users who submitted the search query thereafter selected the URL.
  • Each entry in the table 446 may, for example, be in the form of a one-to-many mapping that maps a particular query to a list of the most closely related web pages, together with associated weights. Behavior-based associations that fall below a selected strength threshold may be excluded from the table 446 .
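  • One way to picture such a one-to-many table is sketched below. It assumes the weight is simply the number of users who submitted the query and then selected the URL; the function name `build_query_table`, the threshold, the list length, and the example URLs are all hypothetical.

```python
def build_query_table(click_counts, min_weight=2, top_n=3):
    """Map each query to its strongest (url, weight) pairs, strongest first."""
    table = {}
    for (query, url), weight in click_counts.items():
        if weight >= min_weight:  # drop associations below the strength threshold
            table.setdefault(query, []).append((url, weight))
    for query in table:
        table[query].sort(key=lambda pair: pair[1], reverse=True)
        del table[query][top_n:]  # keep only the most closely related pages
    return table


# Hypothetical counts of users who submitted the query and then chose the URL.
click_counts = {
    ("ziggy stardust", "https://example.com/ziggy-stardust"): 20,
    ("ziggy stardust", "https://example.com/david-bowie"): 10,
    ("ziggy stardust", "https://example.com/ziggy-marley"): 1,
}
query_table = build_query_table(click_counts)
```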
  • a content-based association mining component 442 that periodically analyzes the web pages in the search space 436 to detect and quantify content-based associations between particular pages.
  • the content-based association mining component 442 generates a URL-to-URL association table 448 that identifies pairs of web pages that share similar characteristics or content. For each such pair, the table 448 also stores a respective content-based association strength value or weight. These values generally represent the substitutability of particular pairs of web pages.
  • the table 448 may be generated such that each URL pair consists of a URL of a behavior-deficient web page and a URL of a non-behavior-deficient web page. Content-based associations that fall below a selected threshold (e.g., 80% similarity) may be excluded from the table 448 .
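  • The filtering described for table 448 might look like the following sketch, which keeps only pairs that meet the similarity threshold and that join one behavior-deficient page to one non-deficient page. The function name `build_content_table`, the page names, and the similarity scores are assumptions; how the similarity scores themselves are computed is outside this example.

```python
def build_content_table(similarities, deficient, threshold=0.8):
    """Keep (deficient page -> non-deficient page) pairs at or above threshold."""
    table = {}
    for (u, v), sim in similarities.items():
        if sim < threshold:
            continue  # association falls below the selected threshold
        if (u in deficient) == (v in deficient):
            continue  # both pages deficient, or neither: not a useful pair
        new_page, old_page = (u, v) if u in deficient else (v, u)
        table.setdefault(new_page, {})[old_page] = sim
    return table


# Hypothetical pairwise similarity scores in [0, 1].
similarities = {
    ("new_1", "old_1"): 0.85,
    ("new_1", "old_2"): 0.40,
    ("old_1", "old_2"): 0.95,
    ("new_1", "new_2"): 0.90,
}
deficient = {"new_1", "new_2"}
content_table = build_content_table(similarities, deficient)
```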
  • the behavior-based and content-based associations tables 446 , 448 are periodically analyzed in combination by an extrapolation component 450 to selectively extrapolate or propagate behavior-based associations to new or otherwise behavior-deficient web pages, as described above.
  • the extrapolation may be performed using substantially the same process shown in FIG. 2 and described above. For example, if the query-to-URL association table 446 indicates that search query Q is behaviorally associated with page P, and the URL-to-URL association table 448 indicates that behavior-deficient web page D has a content-based association with P, the extrapolation component 450 may create a new, extrapolated association between Q and D.
  • the weights are generated based on the corresponding behavior-based and content-based weights, as described below. These extrapolated association weights may be normalized with (on the same general scale as) the purely behavior-based weights.
  • the augmented query-to-URL table 446 may be used to provide users with “behaviorally integrated” search results that depend on the actions of past users. For example, when a user submits a search query to the search engine 432 , the search system 430 may access the query-to-URL association table 446 to rank the matching web pages for display. The search system may additionally or alternatively augment the list of matching web pages with additional web pages that do not “match” the search query, but which have actual or extrapolated behavioral associations with the search query.
  • FIGS. 5A-5C illustrate a simple example of how new query-to-item associations may be formed between the search query “ziggy stardust” and newly added web pages.
  • the search space initially contains three web pages that are potentially related to this search query: a page 501 about the Ziggy Stardust phase of David Bowie's career, a page 503 about Ziggy Marley, and a page 505 about David Bowie but with no occurrences of either “Ziggy” or “Stardust.”
  • a standard (non-behaviorally integrated) search for “Ziggy Stardust” would return the matching Ziggy Stardust page and possibly the partially matching Ziggy Marley page, but would not return the David Bowie page (despite its higher degree of relevance).
  • behavior-based associations exist between the search query “ziggy stardust” and pages 501 , 503 and 505 with strengths of twenty, one and ten, respectively, as depicted in FIG. 5A .
  • the behavior-based association with the non-matching David Bowie page 505 may have been created by detecting that users who searched for “ziggy stardust” often eventually accessed the David Bowie page 505 , even though this page did not show up in the search results.
  • a behaviorally integrated search query for “ziggy stardust” will desirably return all three pages, 501 , 503 and 505 . These pages may be displayed in the search results listing from highest to lowest behavior-based strength.
  • a new Ziggy Stardust page 502 , a new David Bowie page 504 , and a new David Bowie Discography page 506 which includes a timeline mentioning Bowie's Ziggy Stardust phase. Because these pages are new, very little or no behavioral data is associated with them (i.e., they are behavior deficient), at least for the search query “ziggy stardust.” As a result, the new Ziggy Stardust page 502 will likely appear at or near the bottom of the behaviorally integrated search results, and the David Bowie Discography page 506 might also appear at the bottom depending on how well the search engine's text-based parsing performs.
  • the new David Bowie page 504 will not appear in the search results, even though it is relevant to the search query. Further, due to the lack of exposure of these new pages in search results, they will likely rarely be selected, and thus will remain behaviorally deficient. Consequently, these new pages suffer from the cold-start problem.
  • the extrapolation methods described herein alleviate this problem, allowing these new pages 502 - 506 to be immediately displayed in relatively prominent positions in the search results.
  • FIG. 5B illustrates an example set of content-based associations between these newly added web pages 502 - 506 and the pre-existing pages.
  • the weight values are again on a scale of 0 to 1, with a weight of 1 representing the highest detectable level of content similarity (and thus substitutability).
  • the absence of an arrow between two pages means that they are not sufficiently similar in content to detect or form a content-based association, or equivalently, that they have a content-based association of zero.
  • the new Ziggy Stardust page 502 has a content-based association value of 0.8 with the preexisting Ziggy Stardust page 501 , 0.2 with the David Bowie Discography page 506 , and 0 for the rest of the pages.
  • the new David Bowie page 504 has a content-based association value of 0.8 with the preexisting David Bowie page 505 and the David Bowie Discography page 506 , and 0 for all other pages.
  • the David Bowie Discography page 506 has a content-based association with the new David Bowie and Ziggy Stardust pages as previously described and has associations with the preexisting David Bowie and Ziggy Stardust pages with values of 0.8 and 0.2, respectively.
  • the content-based associations between the web pages may be detected and quantified using well known text and linguistic analysis algorithms.
  • FIG. 5C illustrates the set of behavior-based associations (including inherited or “extrapolated” associations) that exist after the new web pages 502 and 504 inherit the behavior-based associations of their respective content-similar neighbors 501 and 505 .
  • the newly added pages 502 , 504 and 506 will desirably be displayed in more prominent positions in the search results than the Ziggy Marley page 503 since they have stronger (inherited) behavior-based associations with this query.
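  • The ranking claim can be checked with a small sketch that applies the sum-of-products rule to the weights stated for FIGS. 5A and 5B. It assumes each new page inherits only from pre-existing pages (new pages contribute nothing to one another); the values actually drawn in FIG. 5C are not given in the text, so the scores below are illustrative.

```python
# Behavior-based weights for the query "ziggy stardust" (FIG. 5A).
behavior = {"page_501": 20.0, "page_503": 1.0, "page_505": 10.0}

# Content-based weights of the new pages (FIG. 5B, as described in the text).
content = {
    "page_502": {"page_501": 0.8, "page_506": 0.2},
    "page_504": {"page_505": 0.8, "page_506": 0.8},
    "page_506": {"page_502": 0.2, "page_504": 0.8,
                 "page_505": 0.8, "page_501": 0.2},
}

scores = dict(behavior)
for new_page, neighbors in content.items():
    # Neighbors with no behavioral weight for this query contribute zero.
    scores[new_page] = sum(w * behavior.get(old, 0.0)
                           for old, w in neighbors.items())

# Pages ordered from strongest to weakest association with the query.
ranking = sorted(scores, key=scores.get, reverse=True)
```

  • Under these assumptions all three new pages score above the Ziggy Marley page 503, consistent with the outcome described above.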
  • the search-based embodiment shown in FIG. 4 may be used in the context of a catalog search engine to assist users in locating items, such as products, in an electronic catalog.
  • the content-based association mining component 442 of FIG. 4 may be replaced by, or used in combination with, a component that assesses item substitutability by analyzing session-specific item viewing histories, as described above.
  • search-based embodiment described above can be extended to include general user input instead of just search queries.
  • a search query is just one type of user input that can be associated with particular items.
  • Other forms of user input include keywords, tags, captions, and discussion items.
  • catalog-based and search-based embodiments described above can also be combined in various ways such that both item-to-item and query-to-item associations are extrapolated to behavior-deficient items.
  • inventive methods described herein can also be used to extrapolate other types of behavior-based associations to behavior-deficient items.
  • the system may detect behavior-based associations between particular ads and particular web pages. These associations may be based on ad click-through rates (e.g., ad A is associated with page P because a relatively large number of those who have viewed page P with ad A have clicked on ad A), and may be used by the ad server system to dynamically select ads for display.
  • the page when a new web page becomes available for purposes of displaying ads, the page may initially be matched to one or more other web pages (potentially of other web sites) based on content similarities.
  • the new (behavior-deficient) web page may then inherit the ad-to-page associations of these content-similar web pages, increasing the likelihood that particular ads will be selected for display on the new page.
  • behavior-based associations between particular ads and particular web sites may be extrapolated to new web sites.
  • U.S. application Ser. No. 10/766,368, filed Jan. 28, 2004, the disclosure of which is hereby incorporated by reference herein, discloses methods for detecting behavior-based associations between particular catalog items (e.g., products available for purchase) and particular web sites.
  • the disclosed extrapolation methods may be used to create new associations between particular catalog items and the new web site.
  • These newly created associations may, for example, be used to select catalog items to recommend to users who visit the new web site, and/or to suggest web sites to users who view or purchase particular products.
  • All of the methods and processes described above may be embodied in, and fully automated via, software code modules executed by one or more general purpose computers.
  • the code modules may be stored in any type of computer-readable medium or other computer storage device. Some or all of the methods may alternatively be embodied in specialized computer hardware.
  • the behavioral data and association tables may be stored in any type of computer data repository, such as relational databases and flat file systems that use magnetic disk storage and/or solid state RAM.

Abstract

An attribute of a first item is extrapolated to a second item that is not known to have that attribute. The extrapolation occurs as a result of a substitution association detected between the first and second items. The substitution association may be detected based on an analysis of the content of the first and second items. The extrapolated attribute may be a behavioral association with a third item, in which case an inference is drawn that the second and third items are behaviorally related. The items may, for example, be products represented in an electronic catalog.

Description

    PRIORITY CLAIM
  • This application is a division of U.S. application Ser. No. 11/424,730, filed Jun. 16, 2006, the disclosure of which is hereby incorporated herein by reference.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to data mining methods for discovering and quantifying associations between selectable items, and associations between search queries (or other forms of user input) and selectable items. The selectable items may, for example, be products represented in an electronic catalog, documents, web pages, web sites, media files, and/or other types of items for which behavioral associations can be detected.
  • 2. Description of the Related Art
  • A variety of methods are known for detecting behavior-based associations (i.e., associations based on user behaviors) between items stored or represented in a database. For example, the purchase histories or item viewing histories of users can be analyzed to detect behavior-based associations between particular items represented in an electronic catalog (e.g., items A and B are related because a relatively large number of those who purchased A also purchased B). See, e.g., U.S. Pat. No. 6,912,505. As another example, the web browsing histories of users can be analyzed to identify behavior-based associations between particular web sites and/or web pages. See, e.g., U.S. Pat. No. 6,691,163 and U.S. Pat. Pub. 2002/0198882.
  • The detected behavior-based associations are typically used to assist users in locating items of interest. For example, in the context of an electronic catalog, when a user accesses an item's detail page, the detail page may be supplemented with a list of related items. This list may, for example, be preceded with a descriptive message such as “people who bought this item also bought the following,” or “people who viewed this item also viewed the following.” The detected associations may also be used to generate personalized recommendations that are based on the target user's purchase history, item viewing history, or other item selections.
  • It is also known in the art to analyze the search behaviors of users to detect associations between particular search queries and particular items. The detected associations may be used to rank search result items for display, and/or to supplement a search result set with items that do not match the user's search query. For example, when a user conducts a search, the matching items having the strongest behavior-based associations with the submitted search query may be elevated to a more prominent position in the search results listing; in addition, one or more items that do not match the search query, but which have strong behavior-based associations with the search query, may be added to the search result listing. See, e.g., U.S. Pat. No. 6,185,558.
  • One problem with relying on behavior-based associations is that the quantity of behavioral data collected for a particular item may be insufficient to create behavior-based associations for that item. This may be the case when, for example, new items are added to an electronic catalog, or when new web pages or documents are added to a data repository. Unfortunately, the problem is self perpetuating because popular items (items with behavioral associations) typically remain popular due to their heightened exposure, while new and generally unknown items remain unpopular due to their lack of exposure. This problem is sometimes referred to as the “cold-start” problem.
  • One possible way to reduce the cold-start problem is to supplement the behavior-based associations with content-based associations between items. For example, a new item (one for which little or no behavioral data exists) can be associated with other items based on similarities between the attributes or other content of the items. These content-based associations may then be used to increase the new item's exposure in the same way behavior-based associations are used.
  • Unfortunately, content-based associations tend to be less reliable than behavior-based associations, especially if the item content is not highly consistent in format. In addition, content-based associations frequently are not a good predictor of the items users desire to purchase, view or otherwise select in combination, and thus tend to be less useful. As one example, suppose that an electronic catalog system displays lists of related products on product detail pages, with these lists generated automatically based on aggregate purchase histories. In such a system, the detail page for a particular product (e.g., a printer) may desirably list products that are very different from, but complementary of, that product, such as commonly purchased accessories for the product (e.g., an ink cartridge for the printer). If content-based associations were used in place of the behavior-based associations, however, these complementary products likely would not appear since their attributes would typically be dissimilar to those of the featured product.
  • SUMMARY
  • The present invention comprises computer-implemented systems and methods for extrapolating behavior-based associations to “behavior-deficient” items (generally items for which the collected user activity data of a particular type is insufficient to create meaningful or reliable behavior-based associations). The behavior-based associations are extrapolated based on “substitutability” associations between the behavior-deficient items and other items. These substitutability associations may be based on the attributes or content of the items, in which case they are referred to as content-based associations. The items may, for example, be products represented in an electronic catalog, web pages or other documents accessible on a network, or web sites. More generally, the items can be any type of item for which user behaviors (e.g., purchases, accesses, downloads, etc.) can be monitored and analyzed to detect behavior-based associations, and for which suitable substitutability associations may be detected.
  • In one embodiment, the behavior-based associations that are extrapolated are associations between selectable items. For example, suppose that item A is behaviorally associated with items B and C because, for example, users who select A also frequently select B and/or C. Suppose further that item A has a content-based association with item X (e.g., because many of the attributes of A and X are the same), and that item X is a behavior-deficient item (e.g., because it is new or unpopular). In accordance with the invention, item A's behavior-based associations with B and C may be extrapolated to, or “inherited by,” item X such that new associations are created between X and B and between X and C. Note that X may be dissimilar in content to both B and C in this example, such that no associations would be created between X and B and between X and C if the associations were based solely on item content.
  • The strengths of these newly created associations may be dependent upon both (a) the degree to which items A and X are similar in content, and (b) the strengths of the behavior-based associations between A and B and between A and C, respectively. The strengths of the new associations may also depend on whether X is similar in content to any other items that have a behavior-based association with B and/or C. The newly created associations may, but need not, be terminated or phased out as sufficient user activity data becomes available for creating behavior-based associations between X and other items.
  • In another embodiment, the behavior-based associations that are extrapolated to behavior-deficient items are associations between search queries and selectable items. These query-item associations are used to rank search results for display, and/or to supplement search results with additional items that do not match the search query. For example, suppose that search query Q is behaviorally associated with item A because, for example, users who submit Q frequently select item A from the search results listing. Suppose further that a new and thus behavior-deficient item, item B, is introduced into the search space, and that item B is similar in content to, and thus substitutable with, item A. In accordance with the invention, a new association may automatically be created between Q and item B. This new association may cause item B to be displayed at a more prominent position in the search results listing for Q, and if item B does not match Q, may cause item B to be added to the search result listing for Q.
  • The invention may also be used to extrapolate other types of associations to behavior-deficient items. For example, a strong behavior-based association may exist between a particular ad and a particular web page based on the relatively high click-through rate experienced when the ad is displayed on this page. When a new web page (potentially on an entirely different web site) becomes available for purposes of displaying ads, this new page may inherit the behavior-based association with the ad, causing the ad to be selected (or selected more frequently than otherwise) for display on the new page.
  • The invention also comprises a computer-implemented method of extrapolating item attributes. The method comprises: identifying a first item that has a first attribute, and a second item that is not known to have said first attribute; and determining a strength of a substitution association between the first and second items. The strength of the substitution association is based at least partly on an automated analysis of content of the first and second items. The method further comprises extrapolating the first attribute to the second item based on the strength of the substitution association.
  • Neither this summary nor the following detailed description purports to define the invention. The invention is defined by the claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates a web site system according to one embodiment of the invention.
  • FIG. 2 is a flow chart illustrating one embodiment of a process for creating new item associations using content-based and behavior-based associations between items.
  • FIG. 3A is a graph depicting behavior-based associations between four items in an electronic catalog.
  • FIG. 3B is a graph depicting example content-based associations between the items of FIG. 3A.
  • FIG. 3C illustrates how the behavioral and content-based associations of FIGS. 3A and 3B may be used in combination to create new associations between items.
  • FIG. 3D illustrates how the behavioral and content-based associations of FIGS. 3A and 3B may be used in combination to create new associations for a newly added pocketed red polo shirt.
  • FIG. 4 illustrates an embodiment in which the new associations are created between search queries and search results.
  • FIG. 5A is a graph depicting behavior-based associations between a search query and items (web pages) in a search space.
  • FIG. 5B is a graph depicting content-based associations between the items in FIG. 5A and three newly added items.
  • FIG. 5C illustrates how the behavioral and content-based associations of FIGS. 5A and 5B may be used in combination to create new associations between the search query and particular items.
  • DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
  • Specific embodiments of the invention will now be described with reference to the drawings. These embodiments are intended to illustrate, and not limit, the present invention. The invention is defined by the claims.
  • I. Electronic Catalog Embodiment
  • FIG. 1 illustrates an embodiment in which the invention is employed for purposes of detecting associations between items represented in a browsable electronic catalog of items. The detected associations between items may be used for various purposes, such as to supplement item detail pages with lists of related items, and/or to generate personalized recommendations for particular users. See, e.g., U.S. Pat. No. 6,912,505, the disclosure of which is hereby incorporated by reference.
  • As is common, the electronic catalog in this embodiment contains item content supplied by many different entities. For example, some of the item content may be supplied by a variety of different marketplace sellers, as described in U.S. Pub. 2003/0200156 A1, the disclosure of which is hereby incorporated by reference. As a result, the catalog data lacks a sufficient degree of uniformity or consistency to reliably detect content-based associations between items. Consequently, behavior-based associations (those based on collected user activity or “behavioral” data, such as users' purchase histories, rental histories, detail page viewing histories, download histories, etc.) are generally more reliable than content-based associations. Behavior-based associations may be preferred over content-based associations for other reasons as well, depending on how the detected associations are used.
  • In this type of system, the quantity of behavioral data collected for a given item may, in many cases, be insufficient to reliably detect behavior-based associations between that item and any other items. This may be the case where, for example, an item was only recently added to the electronic catalog, or is relatively unpopular. Rather than merely relying on content-based associations for such items, the present embodiment uses a combination of content mining and behavioral mining to create new associations for these items. This is accomplished by using content-based associations, or alternatively another type of “substitutability” association (i.e., an association that represents or is based on a degree to which particular items are substitutable with each other), to effectively extrapolate behavior-based associations to the new or unpopular items.
  • For example, suppose that a behavior-based association exists between items A and B, and that item C is a new item for which little or no behavioral data exists (i.e., it is a behavior-deficient item). Suppose further that items B and C are very similar in content, as determined, for example, by comparing their respective attributes (e.g., name, category, author, subject, description, manufacturer, price, etc.). In this scenario, the present embodiment effectively extrapolates or extends B's association with A to item C, such that C effectively inherits a behavior-based association with A. (If B has behavior-based associations with other items, C may inherit those as well.) This new association between A and C may be referred to as an extrapolated or inherited association.
  • The strength of this new association between items A and C depends upon both the strength of the A-B behavior-based association and the strength of the B-C content-based or other substitutability association. The strength of the A-C association also preferably depends on whether A and C are associated through any other “paths.” For instance, the association between A and C will be stronger if A also has a behavior-based association with D, and D has a content-based association with C. As behavioral data is collected over time for item C, the extrapolated relationships created between item C and other items may, but need not, be phased out or terminated in favor of pure behavior-based associations. There is a benefit to continuing to apply the process of extrapolating associations even when enough signal is present for pure behavioral relationships. Effectively, the extrapolated associations are generated by taking the “nearest-neighborhood” of substitutable items for any given item in aggregate. Common behavioral associations within the nearest neighborhood would be boosted due to this aggregated treatment. For example, there may be some telescopes that have accessories that have higher behavioral association weights than a star-guide map. However, the star-guide map may be common to all the telescopes, so the guide's weight would get boosted in the aggregate. This behavior has been empirically shown to help reduce the erroneous associations from noisy behavioral information.
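  • The aggregation effect described in the telescope example can be sketched as follows. This is only an illustration: the item names, the weights of 30 and 15, and the uniform similarity of 0.9 are invented for the sketch and do not come from the specification, and the helper `aggregate_over_neighborhood` is a hypothetical name.

```python
def aggregate_over_neighborhood(neighbors, behavior):
    """Sum similarity-weighted behavioral weights over all substitutable neighbors.

    neighbors: {substitute_item: similarity to the item of interest}
    behavior:  {item: {associated_item: behavioral weight}}
    """
    totals = {}
    for substitute, similarity in neighbors.items():
        for other, weight in behavior.get(substitute, {}).items():
            totals[other] = totals.get(other, 0.0) + similarity * weight
    return totals


# Hypothetical data: each telescope has its own strongly associated accessory,
# while the star map is weaker for each telescope but common to all of them.
behavior = {
    "telescope_1": {"eyepiece_kit": 30.0, "star_map": 15.0},
    "telescope_2": {"tripod": 30.0, "star_map": 15.0},
    "telescope_3": {"carry_case": 30.0, "star_map": 15.0},
}
neighbors = {"telescope_1": 0.9, "telescope_2": 0.9, "telescope_3": 0.9}

totals = aggregate_over_neighborhood(neighbors, behavior)
top_item = max(totals, key=totals.get)
print(top_item, round(totals[top_item], 1))
```

  • In this sketch each individual accessory contributes through a single neighbor, while the shared star map accumulates a contribution from every neighbor, so it ends up ranked first in the aggregate.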
  • As depicted in FIG. 1, a web server system 30 includes a web server 32 that generates and serves pages of a host web site to computing devices 34 of end users. Although depicted as desktop computers for purposes of illustration, the computing devices 34 may include a variety of other types of devices, such as cellular telephones and Personal Digital Assistants (PDAs). The web server 32 may be implemented as a single physical server or a collection of physical servers. The invention may alternatively be embodied in another type of multi-user interactive system, such as an interactive television system, an online services network, or a telephone-based system in which users select items to acquire via telephone keypad entries and/or voice.
  • The web server 32 provides user access to an electronic catalog of items represented within a database 36 or a collection of databases. The items represented in the database 36 may include or consist of items that may be purchased, rented, licensed, downloaded, or otherwise acquired via the web site (e.g., consumer electronics products; household appliances; book, music and video titles in physical and/or downloadable form; magazine subscriptions, computer programs, documents, etc.). The items may consist primarily or exclusively of physical products that are shipped to users, and/or of digital products that are delivered over a network. Many hundreds of millions of different items may be represented in the database 36. The catalog data stored for a given item in the database 36 typically includes a number of different attributes (e.g., name, manufacturer, author, category, subject, color, browse node, price, etc.), which may be represented as name-value pairs. Different catalog items may have different attributes. As is conventional, the items may be arranged within a hierarchy of browse categories to facilitate navigation of the catalog.
  • As will be recognized, the present invention is not limited to items that can be purchased or otherwise acquired from an electronic catalog. For example, the invention may also be employed to derive behavioral relationships between web sites, web pages, businesses represented in an online business directory, blogs, chat rooms, authors, brands, people (e.g., in the context of a social networking system), and documents stored on a company network. In general, the inventive methods described herein can be applied to any type (or types) of item for which both (a) the associated item attributes or content, or some other source of information, permits the detection of items that are highly substitutable, and (b) activity data of users, such as purchase histories, viewing histories, explicit ratings, etc., can be used to detect behavior-based associations.
  • As illustrated, the web server 32, which may include any number of physical servers, runs a page generator component 33 that dynamically generates web pages in response to requests from the user computing devices 34. The web pages are generated using a repository of web page templates 38, and using data retrieved from a set of services 35. The types of services 35 can vary widely, and may include, for example, a catalog service that returns catalog data for particular items, a search service that processes search queries submitted by users, a recommendation service that generates and returns personalized item recommendations for users, and a transaction processing service that processes purchases and/or other types of transactions.
  • In one embodiment, users of the web site can obtain detailed information about each item by accessing the item's detail page within the electronic catalog. Each item detail page may be located by, for example, conducting a search for the item via a search engine of the web site, or by selecting the item from a browse tree listing. Each item detail page may provide an option for the user to acquire the item from a retail entity and/or from another user of the system.
  • As illustrated in FIG. 1, the web server system 30 and/or the services 35 maintain item selection histories 40 for each user of the web site. The item selection history 40 of each user identifies catalog items selected by that user via the web site, preferably together with the associated dates and times of selection. Depending upon the nature and purpose of the web site (e.g., retail sales, user-to-user sales, movie rentals, customer reviews, music downloads, etc.), the item selection histories may, for example, include item purchase histories, item rental histories, item detail page viewing histories, item download histories, or any combination thereof. In some embodiments, the item selection histories 40 may include data obtained from external sources, such as the web site systems of business partners, browser toolbars of users, or customer credit card records. Item selection histories 40 of many hundreds of thousands or millions of unique users may be maintained and analyzed by the system 30. Each user account may be treated as a separate user for purposes of maintaining item selection histories; thus for example, if members of a household share a single account, they may be treated as a single user.
  • As further illustrated in FIG. 1, a behavior-based association mining component 44 collectively analyzes or “mines” the item selection histories of the users periodically (e.g., once per day) to detect and quantify behavior-based associations between particular catalog items. The methods described in U.S. Pat. No. 6,912,505, referenced above, may be used for this purpose. The behavior-based association mining component 44 generates a table 46 or other data structure that identifies pairs of items for which a behavior-based association has been detected. For each such pair of items, the table 46 also stores a behavioral association strength value or “weight” indicating the strength of the association. The associations may be based on any type or types of recorded user activity, such as purchases, rentals, viewing events, shopping cart adds, and/or downloads. In general, the strength of the association between two items depends on how many unique users who selected one item (for purchase, viewing, etc.) also selected the other. These counts are proportioned against the individual item selection counts. Using the proportions, significance tests or signal processing techniques may be performed to reduce the number of invalid associations due to noise in the data. Each entry in the table 46 may, for example, be in the form of a one-to-many mapping that maps a particular item to a list of the most closely related items, together with associated weights. Behavior-based associations that fall below a selected strength threshold may be excluded from the table 46.
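  • A minimal sketch of this kind of mining is shown below. It is not the method of U.S. Pat. No. 6,912,505; it simply counts the unique users who selected both items, proportions that count against the number of users who selected the source item, and drops weak or thinly supported pairs. The function name, the `min_users` and `min_strength` parameters, and the sample histories are all assumptions made for illustration, and the resulting weights fall between zero and one rather than on the open-ended scale used in the figures.

```python
from collections import defaultdict
from itertools import permutations


def mine_behavioral_associations(histories, min_users=2, min_strength=0.5):
    """Build a directional one-to-many table {item: {related_item: weight}}.

    histories: {user_id: set of item ids selected by that user}
    weight(a -> b) = users who selected both a and b / users who selected a
    """
    users_by_item = defaultdict(set)
    for user, items in histories.items():
        for item in items:
            users_by_item[item].add(user)

    table = defaultdict(dict)
    for a, b in permutations(users_by_item, 2):
        both = len(users_by_item[a] & users_by_item[b])
        if both < min_users:          # crude stand-in for a significance test
            continue
        weight = both / len(users_by_item[a])
        if weight >= min_strength:    # exclude associations below the threshold
            table[a][b] = weight
    return dict(table)


# Hypothetical purchase histories.
histories = {
    "u1": {"printer", "ink"},
    "u2": {"printer", "ink"},
    "u3": {"ink"},
    "u4": {"ink"},
    "u5": {"ink", "paper"},
}
table = mine_behavioral_associations(histories)
print(table)
```

  • Because each count is proportioned against the source item's own selection count, the printer maps to the ink cartridge in this sample while the ink cartridge does not map back to the printer.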
  • Also illustrated in FIG. 1 is a content-based association mining component 42 that periodically and collectively mines the electronic database of items 36 to detect and quantify content-based associations between particular catalog items. The content-based association mining component 42 generates a content-based associations table 48 that identifies pairs of items that share similar characteristics or content. For each such pair, the table 48 also stores a respective content-based association strength value or weight representing the strength of the content-based association. Each such weight value also generally represents the degree to which the corresponding items are substitutable or interchangeable with each other. Any of a variety of known methods for comparing item attributes may be used to detect and quantify the content-based associations. Techniques from natural language processing such as simple inter-document term frequency or more complicated algorithms such as latent semantic analysis may be used. Also, pattern recognition techniques such as neural networks or Bayesian belief networks operating over the content feature space may be used. Content-based associations that fall below a selected threshold (e.g., 80% similarity if the strengths are in a probabilistic domain) may be excluded from the table 48.
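  • As a rough illustration only, the sketch below compares items by the fraction of their name-value attribute pairs that match and keeps pairs at or above a threshold. This simple matching score is a stand-in chosen for brevity, not one of the techniques named above, and the function names, attribute names, and sample catalog are hypothetical.

```python
def attribute_similarity(a, b):
    """Fraction of attribute names (across both items) whose values match."""
    names = set(a) | set(b)
    if not names:
        return 0.0
    matching = sum(1 for n in names if n in a and n in b and a[n] == b[n])
    return matching / len(names)


def mine_content_associations(catalog, threshold=0.8):
    """Return a symmetric table {item: {similar_item: weight}}."""
    table = {}
    item_ids = sorted(catalog)
    for index, x in enumerate(item_ids):
        for y in item_ids[index + 1:]:
            weight = attribute_similarity(catalog[x], catalog[y])
            if weight >= threshold:   # drop associations below the threshold
                table.setdefault(x, {})[y] = weight
                table.setdefault(y, {})[x] = weight
    return table


# Hypothetical catalog entries stored as name-value pairs.
catalog = {
    "green_polo": {"category": "shirt", "style": "polo", "brand": "Acme",
                   "material": "cotton", "color": "green"},
    "red_polo": {"category": "shirt", "style": "polo", "brand": "Acme",
                 "material": "cotton", "color": "red"},
    "watch": {"category": "watch", "style": "chronometer", "brand": "Acme"},
}
content_table = mine_content_associations(catalog)
print(content_table)
```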
  • The content-based analysis may be limited to pairs of items in which one of the two items is a “behavior-deficient” item. For example, if item purchases are used to detect the behavior-based associations, an item may be treated as behavior deficient if it has been purchased fewer than ten times, or if the purchase behaviors of those who have purchased it are insufficiently reliable to associate it with any other item. An item may be behavior deficient if, for example, it has only recently been added to the electronic catalog, or if it is an obscure, high priced, or otherwise unpopular item.
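  • The restriction described above can be sketched as a simple filter on candidate pairs. The purchase-count test uses the ten-purchase example from the text; the function names and the sample counts are assumptions, and the reliability test mentioned in the text is not modeled here.

```python
from itertools import combinations


def is_behavior_deficient(item, purchase_counts, min_purchases=10):
    """Treat an item as behavior deficient if it has too few purchases."""
    return purchase_counts.get(item, 0) < min_purchases


def pairs_to_compare(items, purchase_counts, min_purchases=10):
    """Keep only pairs in which at least one item is behavior deficient."""
    return [
        (a, b)
        for a, b in combinations(sorted(items), 2)
        if is_behavior_deficient(a, purchase_counts, min_purchases)
        or is_behavior_deficient(b, purchase_counts, min_purchases)
    ]


# Hypothetical purchase counts.
purchase_counts = {"green_polo": 250, "cargo_shorts": 120, "red_polo": 3}
pairs = pairs_to_compare(purchase_counts.keys(), purchase_counts)
print(pairs)
```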
  • The behavior-based and content-based associations tables 46, 48 are periodically analyzed in combination by an extrapolation component 50 to selectively extrapolate or propagate behavior-based associations to unpopular items, as described above. For example, if the behavior-based table 46 indicates that B is behaviorally associated with C, D and E, and the content-based table 48 indicates that unpopular item U has a content-based association with B, the extrapolation component 50 may create associations between U and C, U and D, and U and E. The extrapolation component 50 thereby effectively augments the behavior-based associations table 46 with these extrapolated associations, particularly for “behavior-deficient” items. This augmented behavioral association data table is depicted in FIG. 1 as element 52, although the augmented table may actually be created by simply adding new entries to the behavior-based associations table 46. The task of creating extrapolated associations may, for example, be triggered by the generation of a new behavior-based associations table 46.
  • As with the behavior-based association table 46, the augmented behavioral association data table 52 includes an association weight value for each pair of associated items. For extrapolated associations, the weights are generated based on the corresponding behavior-based and content-based weights, as described below. These extrapolated association weights are preferably normalized with (on the same scale as) the purely behavior-based weights. If the augmented table is simply created by adding new entries to the behavior-based association table, the extrapolated association weights must be in the same domain as the true behavior-based weights.
  • The augmented behavioral association data table 52 may be used for a variety of purposes. For example, when a user accesses an item detail page of an item, the web server 32 may access the augmented behavioral association data table 52 to look up a list of related items, and may incorporate this list into the item detail page. If the item detail page is for an unpopular item, this list of related items will ordinarily be based exclusively on extrapolated associations created for the unpopular item. The augmented behavioral association data table 52, including the association weights stored therein, may also be used to generate personalized recommendations that are based on the item selections of the target user. The methods described in U.S. Pat. No. 6,912,505, referenced above, may be used for this purpose. As yet another example, the augmented behavioral association data table 52 may be used to augment a search results set with one or more items that are closely related to those that match the user's search query.
  • In the embodiment of FIG. 1, the item associations recorded in the behavior-based table 46 and the augmented behavioral association data table 52 are preferably “directional” associations. For instance, although item A may be mapped to item B, item B is not necessarily mapped to item A. Thus, for example, although item B may appear on item A's detail page (as a related item), item A may not appear on item B's detail page. In other embodiments, the associations may be non-directional.
  • FIG. 2 is a flow chart which illustrates one example of the steps that may be performed by the extrapolation component 50. As depicted by step 201, each item, i, in the catalog is analyzed in sequence. In step 202, the current item is checked to determine whether it is “popular,” which in the sample flow chart involves determining whether it has any behavior-based associations to any other items. (In the embodiment of FIG. 2, every item is treated as either popular or unpopular, with popularity being based on collected user activity data.) In other embodiments, any other standard may be used to classify whether or not an item is popular. For example, an item with fewer than a certain threshold number of behavioral associations may be considered unpopular.
  • Returning to the sample flow chart of FIG. 2, if the item i does have any such associations, i.e., if it is a popular item, then according to step 210 the process skips to the next item and returns to step 202. If however, item i is unpopular, then item i is selected for further analysis by obtaining each item s that has a content association value with item i greater than a threshold value. In other words, in step 203 an item s that has significant content-based similarity to item i is identified. This significance indicates the items' substitutability or interchangeability.
  • Next, in step 204, item s may be further analyzed to determine whether or not it is substitutable for i. This step allows for additional error-reducing mechanisms which may ensure that s is substitutable for i. For example, a red men's polo shirt may be highly content related (step 203) with a red women's polo shirt, but they may not be substitutable (step 204), since a man would not wear a woman's shirt, and vice versa. Step 204 thus reduces this possibility for error. As described in U.S. Pat. No. 6,912,505, one method that may be used to assess whether two items are substitutes for each other is to monitor how frequently they are selected for viewing within common browsing sessions. Item classifiers such as “men” and “women” may also be used to assess substitutability.
  • If s is not substitutable for i, then the process continues to search (step 209) until a substitutable item is found. Once a substitutable item s is found, then the process in step 205 searches to find each item b that has a behavior-based association with s by skipping over items without behavior-based associations (step 208). As an additional error-reducing mechanism, in step 206 each item b that does have a behavior-based association with s is checked to determine if that behavior-based association with s is valid. For example, a set of batteries may have a behavior-based association with an electronic device because those batteries are often purchased with the electronic device, but if those batteries are incompatible with the device (which may be determined, e.g., by examining return orders), then the behavior-based association between the batteries and the electronic device may be considered invalid.
  • Finally, after an item b that has a valid behavior-based association with s is found, where s is a validly substitutable item for i, then in step 207, the current behavior-based association weight from i to b is assigned the value of the old behavior-based association weight from i to b plus the product of the content-based association weight from i to s multiplied by the behavior-based association weight of s to b. Examples of this value assignment will be discussed below with reference to FIGS. 3A-3D. The use of multiplication to combine the i to s content-based and s to b behavior-based weights and the use of summation to combine this result with any previous i to b behavior-based weight is specific to this embodiment. Alternatives such as linear combination instead of multiplication or noisy-OR instead of summation may be used. After each item b that has a behavior-based association with each item s substitutable for each item i in the catalog is processed (steps 208-210), then the process ends. The system may thus allow for unpopular items to inherit behavioral association data from a single item as well as multiple items. In certain embodiments, a set limit can be placed on the number of relationships that are created during processing in order to address the large number of relationships that can be created in web space.
  • The graphs shown in FIGS. 3A-3D will be used to illustrate an example scenario. In these graphs, each node represents a respective item in the electronic catalog, and each edge (shown as an arrow) represents an association between two items. The numbers included in-line with the arrows represent corresponding association weights or strengths, with behavioral association weights (including those created via extrapolation) being on a scale of zero to infinity, and content-based weights being on a scale of zero to one.
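  • The step 207 assignment can be written compactly as shown below. The symbols are notation introduced here for convenience and do not appear in the specification: w denotes a behavior-based weight, c a content-based weight, and the prime marks the updated value. The noisy-OR line is the standard noisy-OR form and assumes, as an additional condition not stated in the text, that all weights have first been normalized to the range zero to one.

```latex
\begin{align*}
% Step 207 for one substitutable item s:
w'_{i \to b} &= w_{i \to b} + c_{i,s}\, w_{s \to b} \\
% Net effect after every substitutable s with a valid association to b is processed:
w'_{i \to b} &= w_{i \to b} + \sum_{s} c_{i,s}\, w_{s \to b} \\
% Noisy-OR in place of summation (weights assumed normalized to [0, 1]):
w'_{i \to b} &= 1 - \bigl(1 - w_{i \to b}\bigr)\bigl(1 - c_{i,s}\, w_{s \to b}\bigr)
\end{align*}
```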
  • FIG. 3A illustrates behavioral associations between four items, while FIG. 3B illustrates the content-based associations between these same items. For purposes of illustration, it may be assumed that these four items are the only items in the electronic catalog. FIG. 3A has few connections since not every item pair in a catalog may have behavioral data associated with it. Since the graphs of FIGS. 3A and 3B represent different domains, behavioral association and content association, respectively, the edges of each graph represent different kinds of relationships.
  • The behavioral association graph, FIG. 3A, shows some measure of intentional relationships. In FIG. 3A, a green polo shirt 301 is shown to have a behavioral association strength of forty to cargo shorts, and of ten to a chronometer watch. This does not necessarily mean that the chronometer watch 303 has a behavioral association strength of ten to the green polo shirt 301, since the arrow points from the green polo shirt 301 to the chronometer watch 303, and not vice versa. In other words, while the users who have selected the green polo shirt 301 have also selected the chronometer watch 303 to create a significant relationship for the shirt, it does not mean that these users create enough significance with respect to the chronometer watch's total selection base. The red polo shirt 302 has no edges because little or no behavioral data exists for the red polo shirt 302.
  • The content association graph, FIG. 3B, shows some measure of how innately similar items are. Since the content-based relationship is inherent to the pair of items, the directionality of the relationship between the pair of items is mutual, represented by bi-directional arrows. The content association strength or weight between the green polo shirt 301 and red polo shirt 302 has a value of 0.9. A value of 1.0 would mean that two items are identical. Consequently, the association strength value of 0.9 between the green polo shirt and the red polo shirt means that the two items are highly similar in content, which is understandable, since the only difference between the two items is their color.
  • FIG. 3C illustrates how the behavioral and content-based associations of FIGS. 3A and 3B may be used in combination to create extrapolated associations for the unpopular red polo shirt. Although this example illustrates an item inheriting behavioral association values from a single item, the sample equation given also works for an item inheriting behavioral association values from multiple items. As shown in FIG. 3A, there is no behavior-based association edge from the red polo shirt 302 to the cargo shorts 304. Consequently, the behavior-based association weight from the red polo shirt 302 to the cargo shorts 304 is zero. Running through the sample process of FIG. 2 would provide behavioral associations for the behavior-deficient red polo shirt 302 inherited from the behavioral associations of the green polo shirt 301. According to the sample equation given in FIG. 2 (step 207), the value of the new behavior-based association weight from the red polo shirt 302 to the cargo shorts 304 would equal the value of the old behavior-based association weight from the red polo shirt 302 to the cargo shorts 304 (which is zero, since no edge exists) plus the product of the content-based association weight from the red polo shirt 302 to the green polo shirt 301 (0.9 from FIG. 3B) multiplied by the behavior-based association weight of the green polo shirt 301 to the cargo shorts 304 (which is 40 from FIG. 3A). In other words, the value of the new behavior-based association weight from the red polo shirt 302 to the cargo shorts 304 is 0+(0.9*40)=36 (FIG. 3C). Similarly, the red polo shirt 302 would be associated with the watch 303 at a strength of 9=0+(0.9*10).
  • With these newly inherited (extrapolated) behavioral associations, when the detail page for the red polo shirt 302 is accessed in the electronic catalog, this detail page will display the cargo shorts 304 and chronometer watch 303 as related items, even though neither has a pure behavior-based association with the red polo shirt. (The decision of whether to display these related items on this detail page may depend on whether the strengths of the newly created associations exceed some threshold, and also on whether other related items exist that have stronger associations with the red polo shirt.) The user will thus desirably be exposed to related items that are behaviorally related to (e.g., commonly purchased in combination with) the red polo shirt. In some embodiments, the newly created associations may also cause the red polo shirt to show up on the detail pages for the cargo shorts and/or the chronometer watch.
  • Continuing this example, assume that a red pocketed polo shirt 305 (FIG. 3D) is now added to the catalog, and does not yet have any behavioral data (e.g., it has not yet been purchased). Assume further that the red pocketed polo shirt 305 has a content-based association strength of 0.9 with the red polo shirt 302, and a content-based association strength value of 0.8 with the green polo shirt 301, as depicted by the dashed lines in FIG. 3D. If the process of FIG. 2 were applied, the result would be extrapolated associations between the new pocketed red polo shirt 305 with the watch 303 and the shorts 304, as shown in FIG. 3D. The strengths of the newly created associations would be (0.9×9)+(0.8×10)=16.1 for the watch 303 and (0.9×36)+(0.8×40)=64.4 for the shorts 304. As illustrated by this example, the fact that the red pocketed polo shirt 305 has similar attributes to multiple items that are behaviorally related to the watch 303 serves to increase the strength of the newly created association.
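  • The display decision described in the parenthetical above might be sketched as a thresholded, ranked lookup in the augmented table. The function name, the threshold of 5.0, the limit of two items, and the belt entry are hypothetical values chosen for the sketch; the weights of 36 and 9 are those of FIG. 3C.

```python
def related_items(table, item, min_weight=5.0, limit=2):
    """Return up to `limit` related items for a detail page, strongest first."""
    row = table.get(item, {})
    ranked = sorted(row.items(), key=lambda entry: entry[1], reverse=True)
    return [other for other, weight in ranked if weight >= min_weight][:limit]


# Augmented table entry for the red polo shirt; the belt is a made-up weak entry.
augmented = {
    "red_polo_302": {"watch_303": 9.0, "cargo_shorts_304": 36.0, "belt": 2.0},
}
print(related_items(augmented, "red_polo_302"))
print(related_items(augmented, "cargo_shorts_304"))
```

  • Because the table in this sketch is directional, the cargo shorts have no entry of their own and their detail page would list nothing unless reverse associations were also recorded.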
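  • The process of FIG. 2, applied to this example, can be sketched as follows. This is a minimal sketch rather than a definitive implementation: the function and variable names are invented, the `substitutable` and `valid` hooks stand in for the checks of steps 204 and 206 and accept everything by default, the content threshold of 0.5 is an assumed value, and only the FIG. 3A edges stated in the text are included. The second pass treats the extrapolated weights of FIG. 3C as part of the behavioral table, as the arithmetic above implies.

```python
def extrapolate(catalog, behavior, content, threshold=0.5,
                substitutable=lambda i, s: True, valid=lambda s, b: True):
    """One pass of the FIG. 2 loop; returns a new augmented table."""
    augmented = {item: dict(row) for item, row in behavior.items()}
    for i in catalog:                                      # step 201
        if behavior.get(i):                                # step 202: popular, skip
            continue
        for s, c in content.get(i, {}).items():            # step 203
            if c <= threshold or not substitutable(i, s):  # step 204
                continue
            for b, w in behavior.get(s, {}).items():       # step 205
                if b == i or not valid(s, b):              # step 206
                    continue
                row = augmented.setdefault(i, {})
                row[b] = row.get(b, 0.0) + c * w           # step 207
    return augmented


GREEN, RED, WATCH, SHORTS, POCKETED = (
    "green_polo_301", "red_polo_302", "watch_303",
    "cargo_shorts_304", "red_pocketed_polo_305",
)

behavior_3a = {GREEN: {SHORTS: 40.0, WATCH: 10.0}}
content_3b = {RED: {GREEN: 0.9}, GREEN: {RED: 0.9}}

# FIG. 3C: the red polo shirt inherits from the green polo shirt.
fig_3c = extrapolate([GREEN, RED, WATCH, SHORTS], behavior_3a, content_3b)

# FIG. 3D: the pocketed shirt inherits through both the red and green shirts.
content_3d = dict(content_3b)
content_3d[POCKETED] = {RED: 0.9, GREEN: 0.8}
fig_3d = extrapolate([GREEN, RED, WATCH, SHORTS, POCKETED], fig_3c, content_3d)

print({b: round(w, 1) for b, w in fig_3c[RED].items()})
print({b: round(w, 1) for b, w in fig_3d[POCKETED].items()})
```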
  • Once the system collects sufficient behavioral data for the red polo shirt 302 and the red pocketed polo shirt 305 (e.g., as the result of purchases of these items), the extrapolated associations may optionally be terminated or phased out in favor of behavior-based associations. If a phase-out process is used, the strengths of the extrapolated associations may, for example, be decreased in proportion to the amount of behavioral data collected for the associated items. The extrapolated associations may alternatively be phased out over time regardless of the quantity of behavioral data, such that extrapolated associations are only used for relatively new or newly added items (e.g., those added in the last five days).
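  • One way to picture the phase-out is a linear decay of the extrapolated weight as behavioral data accumulates. The sketch below is an assumption for illustration: the text says only that strengths may be decreased "in proportion to" the amount of behavioral data, so the linear form, the function name `phased_out_weight`, and the `full_data_threshold` parameter are all hypothetical.

```python
def phased_out_weight(extrapolated_weight, observed_events, full_data_threshold):
    """Scale an extrapolated weight down linearly as behavioral data accumulates.

    observed_events:     count of behavioral events (e.g., purchases) collected
                         so far for the item.
    full_data_threshold: hypothetical event count at which the extrapolated
                         association is fully retired.
    """
    if full_data_threshold <= 0:
        raise ValueError("full_data_threshold must be positive")
    # Fraction of the extrapolated weight still in effect, clamped at zero.
    remaining = max(0.0, 1.0 - observed_events / full_data_threshold)
    return extrapolated_weight * remaining


print(phased_out_weight(36.0, 0, 100))    # no behavioral data yet
print(phased_out_weight(36.0, 50, 100))   # halfway to the threshold
print(phased_out_weight(36.0, 200, 100))  # past the threshold
```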
  • II. Embodiments Using Other Measures of Substitutability
  • In the embodiment described above, the content-based association weights generally represent the degree to which particular items are substitutable with each other. This is because items that have similar attributes or content (e.g., two camcorders with similar specifications) tend to be substitutes for each other. Content-based associations are thus one form of substitutability association.
  • Although item content (i.e., the content of, or descriptive of, a particular item) provides an effective mechanism for automatically measuring substitutability, the substitutability associations may be detected automatically using other sources of information. For example, as described in U.S. Pat. No. 6,912,505, substitutability associations can be detected by mining the session-specific item viewing histories of users, and particularly their session-specific item detail page viewing histories. This is because users tend to comparison shop for a particular type of item when they browse the catalog. Thus, for example, if a relatively large number of users who select item A for viewing also select item B for viewing during the same browsing session, items A and B are likely highly substitutable. In contrast, purchase-based associations tend to reveal items that are complementary to each other. Because catalog items tend to be viewed much more frequently than they are purchased (especially for high-priced items), viewing-history-based (substitutability) associations can often be detected for a particular item even though the item is behavior deficient in the purchase domain.
  • Thus, for example, suppose that a purchase-based behavioral association exists between items A and B. Suppose further that item C has not been purchased (and is thus behavior deficient), but co-occurs relatively frequently with item A in the session-specific item detail page viewing histories of users. In this scenario, the viewing-based (substitutability) association between A and C may be used to create a new (extrapolated) association between B and C.
  • In the context of FIG. 1, this variation can be implemented by replacing the content-based association mining component 42 with a component that analyzes the session-specific item detail page viewing histories of users. The table 48 would still store substitutability association information (including weight values), but the associations would no longer be based on item content. Alternatively, both content-based mining and item viewing history mining can be used in combination to detect the substitutability associations. The behavior-based association mining component 44 would use purchase histories of users to detect the behavior-based associations represented in table 46.
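  • The session-based variation described above can be sketched as a co-occurrence count over browsing sessions. This is a minimal illustration under stated assumptions: the function name `coview_counts` and the input layout (one list of viewed item IDs per session) are hypothetical, and it is not the algorithm of U.S. Pat. No. 6,912,505. Converting raw counts into normalized weights is left open here.

```python
from collections import Counter
from itertools import combinations


def coview_counts(sessions):
    """Count, for each unordered item pair, the sessions in which both were viewed."""
    counts = Counter()
    for viewed in sessions:
        # De-duplicate within a session and sort so each pair has one key.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[(a, b)] += 1
    return counts


# Item C has never been purchased but is frequently viewed alongside item A.
sessions = [["A", "C"], ["A", "C", "B"], ["C", "A"], ["B"]]
pairs = coview_counts(sessions)
print(pairs.most_common())
```

  • In this toy data, the pair (A, C) co-occurs most often, so a purchase-based association between A and B could be extrapolated to C as in the scenario above.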
  • III. Search Embodiment
  • FIG. 4 illustrates an embodiment in which the invention is employed for purposes of creating new associations between search queries and new or otherwise behavior-deficient items. In this particular example, the items are web pages in a search space, where the search space may, for example, be a particular web site, the Internet, or a corporate intranet. As will be apparent, the items could alternatively be products represented in an electronic catalog, blogs, podcasts, business listings in an online directory, other types of documents, or any other type of item for which keyword searches can be performed. The associations between search queries and items in this embodiment may be used for various purposes, such as to rank items in a search result listing, and/or to supplement the search result listing with additional items.
  • In this type of system, the quantity of behavioral data collected for a given page may, in many cases, be insufficient to reliably detect behavior-based associations between that page and particular search queries. This may be the case where, for example, the page is new, such that few users have had the opportunity to click through to it from a search results listing. As with the catalog embodiment described above, this lack of behavioral data tends to be self-perpetuating, as the lack of such data may cause the page to be displayed in a less prominent position in, or to be completely omitted from, search result listings.
  • As depicted in FIG. 4, the search system 430 in this embodiment includes a search engine 432 which responds to search queries (typically consisting of textual search strings) received over the Internet from users' computing devices 434. The search engine 432 may be implemented as software running on a single physical server or a collection of physical servers. The search engine 432 provides searchable access to a collection of web pages in a search space 436, with each web page identified by a unique uniform resource locator (URL). The pages represented in the search space 436 may include or consist of pages hosted by a single source or a wide variety of different sources. As illustrated, the search engine 432 may use a pre-generated search index 435 to identify web pages that match particular search queries.
  • As illustrated in FIG. 4, the search system 430 maintains a search activity log 440 containing activity data (behavioral data) descriptive of search activities of users. The stored activity data includes the submitted search queries, and includes identifiers, such as URLs, of the web pages selected by particular users. In some embodiments, the search activity log 440 may include data obtained from external sources, such as the search systems of business partners. Search histories of many hundreds of thousands or millions of unique users may be maintained and analyzed by the system 430. The search activity data may, for example, be stored in a chronological log file, or in a database of the type described in U.S. Pat. Pub. 2005/0033803 A1, the disclosure of which is hereby incorporated by reference.
  • As further illustrated in FIG. 4, a behavioral association mining component 444 collectively analyzes or “mines” the search activity data 440 periodically (e.g., once per day) to detect and quantify behavior-based associations between search queries and particular web pages. Examples of data mining algorithms that may be used for this purpose are described in U.S. Pat. No. 6,185,558, U.S. Patent Pub. 2005/0222987 A1 and U.S. patent application Ser. No. 11/276,079, filed Feb. 13, 2006, the disclosures of which are hereby incorporated by reference.
  • The behavioral association mining component 444 generates a table 446 or other data structure that identifies detected behavior-based associations between particular search queries and web page URLs. (The search queries represented in this table 446 may consist solely of search terms and phrases, or may, in some embodiments, also include other types of constraints such as search-field designations.) For each such pair of items, the table 446 also stores a behavioral association strength value or “weight” indicating the strength of the association. The associations may be based on any type or types of recorded user activity, such as search submissions, selections from search results, and/or previewing a search result. In general, the strength of the association between the pair will depend on how many users who submitted the search query thereafter selected the URL. Each entry in the table 446 may, for example, be in the form of a one-to-many mapping that maps a particular query to a list of the most closely related web pages, together with associated weights. Behavior-based associations that fall below a selected strength threshold may be excluded from the table 446.
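  • A table of this shape could be built from a click log roughly as follows. The sketch is an assumption for illustration and not one of the incorporated mining algorithms: the function name `build_query_to_url_table`, the `(user_id, query, clicked_url)` record layout, the query normalization, and the `min_weight` parameter are hypothetical. The weight used here is the number of distinct users who submitted the query and then selected the URL.

```python
from collections import defaultdict


def build_query_to_url_table(click_log, min_weight=1):
    """Map each query to a list of (url, weight) pairs, strongest first.

    click_log: iterable of (user_id, query, clicked_url) records.
    Associations with a weight below `min_weight` are excluded.
    """
    users = defaultdict(set)
    for user_id, query, url in click_log:
        # Lower-casing and trimming the query is an assumed normalization.
        users[(query.strip().lower(), url)].add(user_id)

    table = defaultdict(list)
    for (query, url), who in users.items():
        if len(who) >= min_weight:
            table[query].append((url, len(who)))
    for query in table:
        # Sort by descending weight, then by URL for a stable order.
        table[query].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(table)


click_log = [
    ("u1", "Ziggy Stardust", "/page501"),
    ("u2", "ziggy stardust", "/page501"),
    ("u2", "ziggy stardust", "/page505"),
    ("u1", "ziggy stardust", "/page501"),  # repeat click by the same user
]
query_table = build_query_to_url_table(click_log)
print(query_table)
```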
  • Also illustrated in FIG. 4 is a content-based association mining component 442 that periodically analyzes the web pages in the search space 436 to detect and quantify content-based associations between particular pages. The content-based association mining component 442 generates a URL-to-URL association table 448 that identifies pairs of web pages that share similar characteristics or content. For each such pair, the table 448 also stores a respective content-based association strength value or weight. These values generally represent the substitutability of particular pairs of web pages. The table 448 may be generated such that each URL pair consists of a URL of a behavior-deficient web page and a URL of a non-behavior-deficient web page. Content-based associations that fall below a selected threshold (e.g., 80% similarity) may be excluded from the table 448.
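  • As one illustration of how such a URL-to-URL table might be populated, the sketch below scores page pairs with cosine similarity over word counts. The choice of cosine similarity is an assumption made for this example; the text does not name a specific similarity measure. The function names and the `{url: text}` input layout are likewise hypothetical.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using simple word-count vectors."""
    a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_url_to_url_table(deficient_pages, established_pages, threshold=0.8):
    """Pair each behavior-deficient page with sufficiently similar established pages.

    Both arguments map URL -> page text. Pairs scoring below `threshold`
    are excluded, mirroring the 80% similarity example in the text.
    """
    table = {}
    for d_url, d_text in deficient_pages.items():
        for e_url, e_text in established_pages.items():
            weight = cosine_similarity(d_text, e_text)
            if weight >= threshold:
                table[(d_url, e_url)] = weight
    return table


url_table = build_url_to_url_table(
    {"/new": "david bowie albums"},
    {"/old-bowie": "david bowie albums", "/old-marley": "ziggy marley"},
)
print(url_table)
```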
  • The behavior-based and content-based association tables 446, 448 are periodically analyzed in combination by an extrapolation component 450 to selectively extrapolate or propagate behavior-based associations to new or otherwise behavior-deficient web pages, as described above. The extrapolation may be performed using substantially the same process shown in FIG. 2 and described above. For example, if the query-to-URL association table 446 indicates that search query Q is behaviorally associated with page P, and the URL-to-URL association table 448 indicates that behavior-deficient web page D has a content-based association with P, the extrapolation component 450 may create a new, extrapolated association between Q and D. For extrapolated associations added to the query-to-URL association table 446, the weights are generated based on the corresponding behavior-based and content-based weights, as described below. These extrapolated association weights may be normalized with (on the same general scale as) the purely behavior-based weights.
  • The augmented query-to-URL table 446 may be used to provide users with “behaviorally integrated” search results that depend on the actions of past users. For example, when a user submits a search query to the search engine 432, the search system 430 may access the query-to-URL association table 446 to rank the matching web pages for display. The search system may additionally or alternatively augment the list of matching web pages with additional web pages that do not “match” the search query, but which have actual or extrapolated behavioral associations with the search query.
  • FIGS. 5A-5C illustrate a simple example of how new query-to-item associations may be formed between the search query “ziggy stardust” and newly added web pages. As depicted in FIG. 5A, the search space initially contains three web pages that are potentially related to this search query: a page 501 about the Ziggy Stardust phase of David Bowie's career, a page 503 about Ziggy Marley, and a page 505 about David Bowie but with no occurrences of either “Ziggy” or “Stardust.” A standard (non-behaviorally integrated) search for “Ziggy Stardust” would return the matching Ziggy Stardust page and possibly the partially matching Ziggy Marley page, but would not return the David Bowie page (despite its higher degree of relevance).
  • Assume further that behavior-based associations exist between the search query “ziggy stardust” and pages 501, 503 and 505 with strengths of twenty, one and ten, respectively, as depicted in FIG. 5A. (Note that the behavior-based association with the non-matching David Bowie page 505 may have been created by detecting that users who searched for “ziggy stardust” often eventually accessed the David Bowie page 505, even though this page did not show up in the search results.) With these associations, a behaviorally integrated search query for “ziggy stardust” will desirably return all three pages, 501, 503 and 505. These pages may be displayed in the search results listing from highest to lowest behavior-based strength.
  • With reference to FIG. 5B, assume that three new pages are now added to the search space, a new Ziggy Stardust page 502, a new David Bowie page 504, and a new David Bowie Discography page 506 which includes a timeline mentioning Bowie's Ziggy Stardust phase. Because these pages are new, very little or no behavioral data is associated with them (i.e., they are behavior deficient), at least for the search query “ziggy stardust.” As a result, the new Ziggy Stardust page 502 will likely appear at or near the bottom of the behaviorally integrated search results, and the David Bowie Discography page 506 might also appear at the bottom depending on how well the search engine's text-based parsing performs. The new David Bowie page 504 will not appear in the search results, even though it is relevant to the search query. Further, due to the lack of exposure of these new pages in search results, they will likely rarely be selected, and thus will remain behaviorally deficient. Consequently, these new pages suffer from the cold-start problem. The extrapolation methods described herein alleviate this problem, allowing these new pages 502-506 to be immediately displayed in relatively prominent positions in the search results.
  • FIG. 5B illustrates an example set of content-based associations between these newly added web pages 502-506 and the pre-existing pages. The weight values are again on a scale of 0 to 1, with a weight of 1 representing the highest detectable level of content similarity (and thus substitutability). The absence of an arrow between two pages means that they are not sufficiently similar in content to detect or form a content-based association, or equivalently, that they have a content-based association of zero. As shown, the new Ziggy Stardust page 502 has a content-based association value of 0.8 with the preexisting Ziggy Stardust page 501, 0.2 with the David Bowie Discography page 506, and 0 for the rest of the pages. Similarly, the new David Bowie page 504 has a content-based association value of 0.8 with the preexisting David Bowie page 505 and the David Bowie Discography page 506, and 0 for all other pages. Finally, the David Bowie Discography page 506 has a content-based association with the new David Bowie and Ziggy Stardust pages as previously described and has associations with the preexisting David Bowie and Ziggy Stardust pages with values of 0.8 and 0.2, respectively. The content-based associations between the web pages may be detected and quantified using well known text and linguistic analysis algorithms.
  • FIG. 5C illustrates the set of behavior-based associations (including inherited or “extrapolated” associations) that exist after the new web pages 502 and 504 inherit the behavior-based associations of their respective content-similar neighbors 501 and 505. According to the example equation given in FIG. 2 (step 207), the strength of the new association between the search query and the new Ziggy Stardust page 502 is 0.8×20=16. Similarly, the strength of the new association between the search query and the new David Bowie page 504 is 0.8×10=8. Finally, the strength of the extrapolated association between the search query and the new David Bowie Discography page 506 is (0.8×10)+(0.2×20)=12. Now, when a search is conducted for “ziggy stardust,” the newly added pages 502, 504 and 506 will desirably be displayed in more prominent positions in the search results than the Ziggy Marley page 503 since they have stronger (inherited) behavior-based associations with this query.
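  • The FIG. 5 example can be checked with a few lines of code. The sketch below is illustrative only: the function name `extrapolate_query_weights` and the dictionary layout are assumptions, and the extrapolation is done in a single pass in which new pages inherit only from pages that already have behavioral data, which matches the numbers given in the text. Content-based links between two new pages therefore contribute nothing in this sketch.

```python
def extrapolate_query_weights(query_weights, content_links):
    """Add extrapolated query-to-page weights for behavior-deficient pages.

    query_weights: {page: behavior-based weight for one search query}
    content_links: {new page: {neighbor page: content-based weight}}
    """
    augmented = dict(query_weights)
    for new_page, neighbors in content_links.items():
        # Inherit only from pages with existing behavioral data (single pass).
        inherited = sum(
            weight * query_weights.get(neighbor, 0.0)
            for neighbor, weight in neighbors.items()
        )
        augmented[new_page] = augmented.get(new_page, 0.0) + inherited
    return augmented


# FIG. 5A: behavior-based weights for the query "ziggy stardust".
ziggy_weights = {"501": 20.0, "503": 1.0, "505": 10.0}

# FIG. 5B: content-based weights for the newly added pages.
ziggy_links = {
    "502": {"501": 0.8, "506": 0.2},
    "504": {"505": 0.8, "506": 0.8},
    "506": {"505": 0.8, "501": 0.2, "504": 0.8, "502": 0.2},
}

augmented_weights = extrapolate_query_weights(ziggy_weights, ziggy_links)
ranking = sorted(augmented_weights, key=augmented_weights.get, reverse=True)
print(ranking)
```

  • Sorting by the augmented weights places all three new pages ahead of the Ziggy Marley page 503, consistent with the outcome described above.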
  • As will be recognized, the search-based embodiment shown in FIG. 4 may be used in the context of a catalog search engine to assist users in locating items, such as products, in an electronic catalog. In such embodiments, the content-based association mining component 442 of FIG. 4 may be replaced by, or used in combination with, a component that assesses item substitutability by analyzing session-specific item viewing histories, as described above.
  • The search-based embodiment described above can be extended to include general user input instead of just search queries. In this regard, a search query is just one type of user input that can be associated with particular items. Other forms of user input include keywords, tags, captions, and discussion items.
  • The catalog-based and search-based embodiments described above can also be combined in various ways such that both item-to-item and query-to-item associations are extrapolated to behavior-deficient items.
  • IV. Extrapolations of Other Types of Associations
  • As will be apparent, the inventive methods described herein can also be used to extrapolate other types of behavior-based associations to behavior-deficient items. For example, in the context of online advertising systems that select ads to display on web pages (typically across a number of participating ad publishing sites), the system may detect behavior-based associations between particular ads and particular web pages. These associations may be based on ad click-through rates (e.g., ad A is associated with page P because a relatively large number of those who have viewed page P with ad A have clicked on ad A), and may be used by the ad server system to dynamically select ads for display. In such a system, when a new web page becomes available for purposes of displaying ads, the page may initially be matched to one or more other web pages (potentially of other web sites) based on content similarities. The new (behavior-deficient) web page may then inherit the ad-to-page associations of these content-similar web pages, increasing the likelihood that particular ads will be selected for display on the new page. As a variation of this embodiment, behavior-based associations between particular ads and particular web sites may be extrapolated to new web sites.
  • As another example, U.S. application Ser. No. 10/766,368, filed Jan. 28, 2004, the disclosure of which is hereby incorporated by reference herein, discloses methods for detecting behavior-based associations between particular catalog items (e.g., products available for purchase) and particular web sites. When a new web site becomes available, the disclosed extrapolation methods may be used to create new associations between particular catalog items and the new web site. These newly created associations may, for example, be used to select catalog items to recommend to users who visit the new web site, and/or to suggest web sites to users who view or purchase particular products.
  • V. Conclusion
  • All of the methods and processes described above may be embodied in, and fully automated via, software code modules executed by one or more general purpose computers. The code modules may be stored in any type of computer-readable medium or other computer storage device. Some or all of the methods may alternatively be embodied in specialized computer hardware. The behavioral data and association tables may be stored in any type of computer data repository, such as relational databases and flat file systems that use magnetic disk storage and/or solid state RAM.
  • Although this invention has been described in terms of certain embodiments and applications, other embodiments and applications that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the features and advantages set forth herein, are also within the scope of this invention. The scope of the present invention is intended to be defined only by reference to the following claims.

Claims (13)

1. A computer system for detecting associations between items, the computer system comprising:
an item data repository comprising a physical computer storage device, the item data repository configured to store item data representative of a plurality of items, the plurality of items comprising first items having an attribute and second items not known to have the attribute;
a content-based association mining component operative to detect substitution associations between the first and second items, at least in part, by analyzing content of the first and second items; and
an extrapolation component that comprises computer hardware, the extrapolation component operative to extrapolate the attribute from at least some of the first items to the second items based at least partly on a strength of the substitution associations between the first and second items.
2. The computer system of claim 1, wherein the extrapolation component is further configured to extrapolate the attribute by propagating the attribute through a directed graph, the directed graph comprising nodes, each node representing a selected one of the plurality of items.
3. The computer system of claim 1, wherein the extrapolation component is further configured to propagate the attribute through the directed graph based at least partly on strength of associations between the plurality of items.
4. The computer system of claim 1, wherein the attribute is a behavioral association.
5. The computer system of claim 1, further comprising a recommendations module operative to generate item recommendations for a user based at least in part on the attribute.
6. The computer system of claim 5, wherein the recommendations module is further operative to use the attribute to improve recommendations for behavior-deficient items.
7. The computer system of claim 5, wherein the content-based association mining component comprises computer hardware.
8. The computer system of claim 5, wherein the items are products represented in an electronic catalog, and the content-based association mining component uses catalog content of said products to detect said substitution associations.
9. A computer-implemented method, comprising:
identifying a first item that has a first attribute, and a second item that is not known to have said first attribute;
determining a strength of a substitution association between the first and second items, said strength determined based at least partly on an automated analysis of content of the first and second items; and
extrapolating the first attribute to the second item based on said strength of said substitution association;
said method performed in its entirety by one or more computers.
10. The method of claim 9, wherein extrapolating the first attribute comprises propagating the first attribute according to a directed graph that comprises a plurality of nodes, each node representing an item.
11. The method of claim 9, wherein the first attribute is a behavioral association with a third item.
12. The method of claim 9, further comprising treating the second item as having said first attribute for purposes of automatically determining whether to recommend the second item to a user.
13. The method of claim 9, wherein the first and second items are products represented in an electronic catalog, and the method comprises determining said strength based on an automated analysis of catalog content of the first and second items.
US12/835,125 2006-06-16 2010-07-13 Extrapolation of item attributes based on detected associations between the items Abandoned US20100299360A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/835,125 US20100299360A1 (en) 2006-06-16 2010-07-13 Extrapolation of item attributes based on detected associations between the items

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/424,730 US8032425B2 (en) 2006-06-16 2006-06-16 Extrapolation of behavior-based associations to behavior-deficient items
US12/835,125 US20100299360A1 (en) 2006-06-16 2010-07-13 Extrapolation of item attributes based on detected associations between the items

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/424,730 Division US8032425B2 (en) 2006-06-16 2006-06-16 Extrapolation of behavior-based associations to behavior-deficient items

Publications (1)

Publication Number Publication Date
US20100299360A1 true US20100299360A1 (en) 2010-11-25

Family

ID=38877872

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/424,730 Expired - Fee Related US8032425B2 (en) 2006-06-16 2006-06-16 Extrapolation of behavior-based associations to behavior-deficient items
US12/835,125 Abandoned US20100299360A1 (en) 2006-06-16 2010-07-13 Extrapolation of item attributes based on detected associations between the items
US13/092,439 Active US8090625B2 (en) 2006-06-16 2011-04-22 Extrapolation-based creation of associations between search queries and items

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/424,730 Expired - Fee Related US8032425B2 (en) 2006-06-16 2006-06-16 Extrapolation of behavior-based associations to behavior-deficient items

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/092,439 Active US8090625B2 (en) 2006-06-16 2011-04-22 Extrapolation-based creation of associations between search queries and items

Country Status (1)

Country Link
US (3) US8032425B2 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090182642A1 (en) * 2008-01-14 2009-07-16 Neelakantan Sundaresan Methods and systems to recommend an item
US20120330756A1 (en) * 2011-06-24 2012-12-27 At & T Intellectual Property I, Lp Method and apparatus for targeted advertising
US8386079B1 (en) 2011-10-28 2013-02-26 Google Inc. Systems and methods for determining semantic information associated with objects
US8719347B1 (en) 2010-12-18 2014-05-06 Google Inc. Scoring stream items with models based on user interests
US9706258B2 (en) 2008-02-26 2017-07-11 At&T Intellectual Property I, L.P. System and method for promoting marketable items
US9720983B1 (en) * 2014-07-07 2017-08-01 Google Inc. Extracting mobile application keywords
US10349147B2 (en) 2013-10-23 2019-07-09 At&T Intellectual Property I, L.P. Method and apparatus for promotional programming
US10423968B2 (en) 2011-06-30 2019-09-24 At&T Intellectual Property I, L.P. Method and apparatus for marketability assessment
US11068960B2 (en) * 2019-05-29 2021-07-20 Walmart Apollo, Llc Methods and apparatus for item substitution

Families Citing this family (86)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7668821B1 (en) 2005-11-17 2010-02-23 Amazon Technologies, Inc. Recommendations based on item tagging activities of users
US7835998B2 (en) 2006-03-06 2010-11-16 Veveo, Inc. Methods and systems for selecting and presenting content on a first system based on user preferences learned on a second system
US8316394B2 (en) 2006-03-24 2012-11-20 United Video Properties, Inc. Interactive media guidance application with intelligent navigation and display features
US20080097821A1 (en) * 2006-10-24 2008-04-24 Microsoft Corporation Recommendations utilizing meta-data based pair-wise lift predictions
KR100859216B1 (en) * 2007-04-05 2008-09-18 주식회사 제이포애드 System for providing advertisemnts and method thereof
US8751507B2 (en) * 2007-06-29 2014-06-10 Amazon Technologies, Inc. Recommendation system with multiple integrated recommenders
US8086620B2 (en) 2007-09-12 2011-12-27 Ebay Inc. Inference of query relationships
US8001003B1 (en) 2007-09-28 2011-08-16 Amazon Technologies, Inc. Methods and systems for searching for and identifying data repository deficits
US9513699B2 (en) * 2007-10-24 2016-12-06 Invention Science Fund I, LL Method of selecting a second content based on a user's reaction to a first content
US20090112849A1 (en) * 2007-10-24 2009-04-30 Searete Llc Selecting a second content based on a user's reaction to a first content of at least two instances of displayed content
US20090112694A1 (en) * 2007-10-24 2009-04-30 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Targeted-advertising based on a sensed physiological response by a person to a general advertisement
US8112407B2 (en) * 2007-10-24 2012-02-07 The Invention Science Fund I, Llc Selecting a second content based on a user's reaction to a first content
US8126867B2 (en) * 2007-10-24 2012-02-28 The Invention Science Fund I, Llc Returning a second content based on a user's reaction to a first content
US20090112696A1 (en) * 2007-10-24 2009-04-30 Jung Edward K Y Method of space-available advertising in a mobile device
US9582805B2 (en) * 2007-10-24 2017-02-28 Invention Science Fund I, Llc Returning a personalized advertisement
US20090112693A1 (en) * 2007-10-24 2009-04-30 Jung Edward K Y Providing personalized advertising
US8001108B2 (en) * 2007-10-24 2011-08-16 The Invention Science Fund I, Llc Returning a new content based on a person's reaction to at least two instances of previously displayed content
US8234262B2 (en) * 2007-10-24 2012-07-31 The Invention Science Fund I, Llc Method of selecting a second content based on a user's reaction to a first content of at least two instances of displayed content
US20090112697A1 (en) * 2007-10-30 2009-04-30 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Providing personalized advertising
US20090171754A1 (en) * 2007-12-28 2009-07-02 Kane Francis J Widget-assisted detection and exposure of cross-site behavioral associations
US20090172021A1 (en) * 2007-12-28 2009-07-02 Kane Francis J Recommendations based on actions performed on multiple remote servers
US8271878B2 (en) * 2007-12-28 2012-09-18 Amazon Technologies, Inc. Behavior-based selection of items to present on affiliate sites
US20090171968A1 (en) * 2007-12-28 2009-07-02 Kane Francis J Widget-assisted content personalization based on user behaviors tracked across multiple web sites
US20090171755A1 (en) * 2007-12-28 2009-07-02 Kane Francis J Behavior-based generation of site-to-site referrals
US7822742B2 (en) * 2008-01-02 2010-10-26 Microsoft Corporation Modifying relevance ranking of search result items
JP4896227B2 (en) * 2008-03-21 2012-03-14 株式会社電通 Advertisement medium determining apparatus and advertisement medium determining method
US8250454B2 (en) * 2008-04-03 2012-08-21 Microsoft Corporation Client-side composing/weighting of ads
US20090319940A1 (en) * 2008-06-20 2009-12-24 Microsoft Corporation Network of trust as married to multi-scale
US8682736B2 (en) * 2008-06-24 2014-03-25 Microsoft Corporation Collection represents combined intent
US8380583B1 (en) * 2008-12-23 2013-02-19 Amazon Technologies, Inc. System for extrapolating item characteristics
US20190026812A9 (en) * 2009-04-20 2019-01-24 4-Tell, Inc Further Improvements in Recommendation Systems
US8438052B1 (en) 2009-04-20 2013-05-07 Amazon Technologies, Inc. Automated selection of three of more items to recommend as a bundle
US10269021B2 (en) 2009-04-20 2019-04-23 4-Tell, Inc. More improvements in recommendation systems
US20100268661A1 (en) * 2009-04-20 2010-10-21 4-Tell, Inc Recommendation Systems
US9378519B1 (en) 2009-07-28 2016-06-28 Amazon Technologies, Inc. Collaborative electronic commerce
US8463769B1 (en) 2009-09-16 2013-06-11 Amazon Technologies, Inc. Identifying missing search phrases
US8495068B1 (en) 2009-10-21 2013-07-23 Amazon Technologies, Inc. Dynamic classifier for tax and tariff calculations
US8290818B1 (en) 2009-11-19 2012-10-16 Amazon Technologies, Inc. System for recommending item bundles
US8285602B1 (en) 2009-11-19 2012-10-09 Amazon Technologies, Inc. System for recommending item bundles
US8332395B2 (en) * 2010-02-25 2012-12-11 International Business Machines Corporation Graphically searching and displaying data
US8650172B2 (en) * 2010-03-01 2014-02-11 Microsoft Corporation Searchable web site discovery and recommendation
US8639686B1 (en) 2010-09-07 2014-01-28 Amazon Technologies, Inc. Item identification systems and methods
US8818880B1 (en) 2010-09-07 2014-08-26 Amazon Technologies, Inc. Systems and methods for source identification in item sourcing
US8447747B1 (en) 2010-09-14 2013-05-21 Amazon Technologies, Inc. System for generating behavior-based associations for multiple domain-specific applications
US8577754B1 (en) 2010-11-19 2013-11-05 Amazon Technologies, Inc. Identifying low utility item-to-item association mappings
US9471681B1 (en) * 2011-01-06 2016-10-18 A9.Com, Inc. Techniques for search optimization
US9736524B2 (en) 2011-01-06 2017-08-15 Veveo, Inc. Methods of and systems for content search based on environment sampling
US8977640B2 (en) * 2011-02-28 2015-03-10 Yahoo! Inc. System for processing complex queries
US8751487B2 (en) * 2011-02-28 2014-06-10 International Business Machines Corporation Generating a semantic graph relating information assets using feedback re-enforced search and navigation
US9646110B2 (en) * 2011-02-28 2017-05-09 International Business Machines Corporation Managing information assets using feedback re-enforced search and navigation
US8468164B1 (en) * 2011-03-09 2013-06-18 Amazon Technologies, Inc. Personalized recommendations based on related users
US10467677B2 (en) 2011-09-28 2019-11-05 Nara Logics, Inc. Systems and methods for providing recommendations based on collaborative and/or content-based nodal interrelationships
US11727249B2 (en) 2011-09-28 2023-08-15 Nara Logics, Inc. Methods for constructing and applying synaptic networks
US8732101B1 (en) 2013-03-15 2014-05-20 Nara Logics, Inc. Apparatus and method for providing harmonized recommendations based on an integrated user profile
US10789526B2 (en) 2012-03-09 2020-09-29 Nara Logics, Inc. Method, system, and non-transitory computer-readable medium for constructing and applying synaptic networks
US11151617B2 (en) 2012-03-09 2021-10-19 Nara Logics, Inc. Systems and methods for providing recommendations based on collaborative and/or content-based nodal interrelationships
US8170971B1 (en) 2011-09-28 2012-05-01 Ava, Inc. Systems and methods for providing recommendations based on collaborative and/or content-based nodal interrelationships
US20130085858A1 (en) * 2011-10-04 2013-04-04 Richard Bill Sim Targeting advertisements based on user interactions
US8484099B1 (en) 2012-02-06 2013-07-09 Amazon Technologies, Inc. Method, medium, and system for behavior-based recommendations of product upgrades
US9082143B1 (en) 2012-08-24 2015-07-14 Amazon Technologies, Inc. Merchant attribution for sales
US20140172704A1 (en) * 2012-12-13 2014-06-19 Firat S. Atagun Shared Pools for Common Transactions
US9361653B2 (en) * 2013-01-16 2016-06-07 Sap Se Social recommendations for business process platform
WO2015010086A2 (en) * 2013-07-19 2015-01-22 eyeQ Insights System for monitoring and analyzing behavior and uses thereof
US20150088700A1 (en) * 2013-09-20 2015-03-26 Ebay Inc. Recommendations for selling past purchases
US9959563B1 (en) 2013-12-19 2018-05-01 Amazon Technologies, Inc. Recommendation generation for infrequently accessed items
US10242351B1 (en) * 2014-05-07 2019-03-26 Square, Inc. Digital wallet for groups
US10026083B1 (en) 2014-05-11 2018-07-17 Square, Inc. Tab for a venue
US10108950B2 (en) * 2014-08-12 2018-10-23 Capital One Services, Llc System and method for providing a group account
US10510071B2 (en) * 2014-09-29 2019-12-17 The Toronto-Dominion Bank Systems and methods for generating and administering mobile applications using pre-loaded tokens
US10565601B2 (en) * 2015-02-27 2020-02-18 The Nielsen Company (Us), Llc Methods and apparatus to identify non-traditional asset-bundles for purchasing groups using social media
US10073892B1 (en) 2015-06-12 2018-09-11 Amazon Technologies, Inc. Item attribute based data mining system
US10803507B1 (en) * 2015-11-23 2020-10-13 Amazon Technologies, Inc. System for generating output comparing attributes of items
US10083521B1 (en) * 2015-12-04 2018-09-25 A9.Com, Inc. Content recommendation based on color match
US10699318B2 (en) 2016-01-29 2020-06-30 Walmart Apollo, Llc Systems and methods for item discoverability
US10055465B2 (en) 2016-09-09 2018-08-21 Facebook, Inc. Delivering a continuous feed of content items to a client device
US10204166B2 (en) 2016-09-09 2019-02-12 Facebook, Inc. Ranking content items based on session information
CN108009180B (en) * 2016-10-28 2021-09-21 哈尔滨工业大学深圳研究生院 High-quality mode item set mining method and device and data processing equipment
US10650432B1 (en) 2016-11-28 2020-05-12 Amazon Technologies, Inc. Recommendation system using improved neural network
US10089661B1 (en) * 2016-12-15 2018-10-02 Amazon Technologies, Inc. Identifying software products to test
CN107025281A (en) * 2017-03-31 2017-08-08 上海斐讯数据通信技术有限公司 A kind of file management method of Intelligent worn device, module and system
US20180315059A1 (en) * 2017-04-28 2018-11-01 Target Brands, Inc. Method and system of managing item assortment based on demand transfer
CN107547642A (en) * 2017-08-28 2018-01-05 深圳市盛路物联通讯技术有限公司 A kind of equipment recommendation method and controller based on Internet of Things
US11263222B2 (en) * 2017-10-25 2022-03-01 Walmart Apollo, Llc System for calculating competitive interrelationships in item-pairs
US10762157B2 (en) 2018-02-09 2020-09-01 Quantcast Corporation Balancing on-side engagement
US11171908B1 (en) * 2018-02-28 2021-11-09 Snap Inc. Ranking content for display
CN110619082B (en) * 2019-09-20 2022-05-17 苏州市职业大学 Project recommendation method based on repeated search mechanism

Citations (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5583763A (en) * 1993-09-09 1996-12-10 Mni Interactive Method and apparatus for recommending selections based on preferences in a multi-user system
US5909023A (en) * 1996-02-23 1999-06-01 Hitachi, Ltd. Online shopping support method and system for sales promotions based on the purchase history of users
US6185558B1 (en) * 1998-03-03 2001-02-06 Amazon.Com, Inc. Identifying the items most relevant to a current query based on items selected in connection with similar queries
US6321221B1 (en) * 1998-07-17 2001-11-20 Net Perceptions, Inc. System, method and article of manufacture for increasing the user value of recommendations
US20020019763A1 (en) * 1998-09-18 2002-02-14 Linden Gregory D. Use of product viewing histories of users to identify related products
US6356879B2 (en) * 1998-10-09 2002-03-12 International Business Machines Corporation Content based method for product-peer filtering
US20020052873A1 (en) * 2000-07-21 2002-05-02 Joaquin Delgado System and method for obtaining user preferences and providing user recommendations for unseen physical and information goods and services
US6438579B1 (en) * 1999-07-16 2002-08-20 Agent Arts, Inc. Automated content and collaboration-based system and methods for determining and providing content recommendations
US6460036B1 (en) * 1994-11-29 2002-10-01 Pinpoint Incorporated System and method for providing customized electronic newspapers and target advertisements
US20020198882A1 (en) * 2001-03-29 2002-12-26 Linden Gregory D. Content personalization based on actions performed during a current browsing session
US20030023499A1 (en) * 2001-07-25 2003-01-30 International Business Machines Corporation Apparatus, system and method for automatically making operational purchasing decisions
US20030023538A1 (en) * 2001-07-25 2003-01-30 International Business Machines Corporation Apparatus, system and method for automatically making operational selling decisions
US20030130975A1 (en) * 2000-01-27 2003-07-10 Carole Muller Decision-support system for system performance management
US6691163B1 (en) * 1999-12-23 2004-02-10 Alexa Internet Use of web usage trail data to identify related links
US6782370B1 (en) * 1997-09-04 2004-08-24 Cendant Publishing, Inc. System and method for providing recommendation of goods or services based on recorded purchasing history
US20040254911A1 (en) * 2000-12-22 2004-12-16 Xerox Corporation Recommender system and method
US20050189415A1 (en) * 2004-02-27 2005-09-01 Fano Andrew E. System for individualized customer interaction
US20050222987A1 (en) * 2004-04-02 2005-10-06 Vadon Eric R Automated detection of associations between search criteria and item categories based on collective analysis of user activity data
US20060041548A1 (en) * 2004-07-23 2006-02-23 Jeffrey Parsons System and method for estimating user ratings from user behavior and providing recommendations
US20060167757A1 (en) * 2005-01-21 2006-07-27 Holden Jeffrey A Method and system for automated comparison of items
US7092936B1 (en) * 2001-08-22 2006-08-15 Oracle International Corporation System and method for search and recommendation based on usage mining
US7152061B2 (en) * 2003-12-08 2006-12-19 Iac Search & Media, Inc. Methods and systems for providing a response to a query
US20070078849A1 (en) * 2005-08-19 2007-04-05 Slothouber Louis P System and method for recommending items of interest to a user
US20070118498A1 (en) * 2005-11-22 2007-05-24 Nec Laboratories America, Inc. Methods and systems for utilizing content, dynamic patterns, and/or relational information for data analysis
US7720720B1 (en) * 2004-08-05 2010-05-18 Versata Development Group, Inc. System and method for generating effective recommendations
US7912755B2 (en) * 2005-09-23 2011-03-22 Pronto, Inc. Method and system for identifying product-related information on a web page

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7953740B1 (en) * 2006-02-13 2011-05-31 Amazon Technologies, Inc. Detection of behavior-based associations between search strings and items

Patent Citations (33)

Publication number Priority date Publication date Assignee Title
US5583763A (en) * 1993-09-09 1996-12-10 Mni Interactive Method and apparatus for recommending selections based on preferences in a multi-user system
US6460036B1 (en) * 1994-11-29 2002-10-01 Pinpoint Incorporated System and method for providing customized electronic newspapers and target advertisements
US5909023A (en) * 1996-02-23 1999-06-01 Hitachi, Ltd. Online shopping support method and system for sales promotions based on the purchase history of users
US6782370B1 (en) * 1997-09-04 2004-08-24 Cendant Publishing, Inc. System and method for providing recommendation of goods or services based on recorded purchasing history
US6185558B1 (en) * 1998-03-03 2001-02-06 Amazon.Com, Inc. Identifying the items most relevant to a current query based on items selected in connection with similar queries
US6321221B1 (en) * 1998-07-17 2001-11-20 Net Perceptions, Inc. System, method and article of manufacture for increasing the user value of recommendations
US6912505B2 (en) * 1998-09-18 2005-06-28 Amazon.Com, Inc. Use of product viewing histories of users to identify related products
US20020019763A1 (en) * 1998-09-18 2002-02-14 Linden Gregory D. Use of product viewing histories of users to identify related products
US6356879B2 (en) * 1998-10-09 2002-03-12 International Business Machines Corporation Content based method for product-peer filtering
US6438579B1 (en) * 1999-07-16 2002-08-20 Agent Arts, Inc. Automated content and collaboration-based system and methods for determining and providing content recommendations
US6691163B1 (en) * 1999-12-23 2004-02-10 Alexa Internet Use of web usage trail data to identify related links
US20030130975A1 (en) * 2000-01-27 2003-07-10 Carole Muller Decision-support system for system performance management
US20020052873A1 (en) * 2000-07-21 2002-05-02 Joaquin Delgado System and method for obtaining user preferences and providing user recommendations for unseen physical and information goods and services
US20040254911A1 (en) * 2000-12-22 2004-12-16 Xerox Corporation Recommender system and method
US7440943B2 (en) * 2000-12-22 2008-10-21 Xerox Corporation Recommender system and method
US7386547B2 (en) * 2000-12-22 2008-06-10 Xerox Corporation Recommender system and method
US20020198882A1 (en) * 2001-03-29 2002-12-26 Linden Gregory D. Content personalization based on actions performed during a current browsing session
US20030023538A1 (en) * 2001-07-25 2003-01-30 International Business Machines Corporation Apparatus, system and method for automatically making operational selling decisions
US20030023499A1 (en) * 2001-07-25 2003-01-30 International Business Machines Corporation Apparatus, system and method for automatically making operational purchasing decisions
US7092936B1 (en) * 2001-08-22 2006-08-15 Oracle International Corporation System and method for search and recommendation based on usage mining
US7152061B2 (en) * 2003-12-08 2006-12-19 Iac Search & Media, Inc. Methods and systems for providing a response to a query
US20050189415A1 (en) * 2004-02-27 2005-09-01 Fano Andrew E. System for individualized customer interaction
US20050222987A1 (en) * 2004-04-02 2005-10-06 Vadon Eric R Automated detection of associations between search criteria and item categories based on collective analysis of user activity data
US20060041548A1 (en) * 2004-07-23 2006-02-23 Jeffrey Parsons System and method for estimating user ratings from user behavior and providing recommendations
US7756879B2 (en) * 2004-07-23 2010-07-13 Jeffrey Parsons System and method for estimating user ratings from user behavior and providing recommendations
US7720720B1 (en) * 2004-08-05 2010-05-18 Versata Development Group, Inc. System and method for generating effective recommendations
US20060212362A1 (en) * 2005-01-21 2006-09-21 Donsbach Aaron M Method and system for producing item comparisons
US20060167757A1 (en) * 2005-01-21 2006-07-27 Holden Jeffrey A Method and system for automated comparison of items
US7752077B2 (en) * 2005-01-21 2010-07-06 Amazon Technologies, Inc. Method and system for automated comparison of items
US20070078849A1 (en) * 2005-08-19 2007-04-05 Slothouber Louis P System and method for recommending items of interest to a user
US7912755B2 (en) * 2005-09-23 2011-03-22 Pronto, Inc. Method and system for identifying product-related information on a web page
US20070118498A1 (en) * 2005-11-22 2007-05-24 Nec Laboratories America, Inc. Methods and systems for utilizing content, dynamic patterns, and/or relational information for data analysis
US7853485B2 (en) * 2005-11-22 2010-12-14 Nec Laboratories America, Inc. Methods and systems for utilizing content, dynamic patterns, and/or relational information for data analysis

Cited By (24)

Publication number Priority date Publication date Assignee Title
US20090182642A1 (en) * 2008-01-14 2009-07-16 Neelakantan Sundaresan Methods and systems to recommend an item
US10587926B2 (en) 2008-02-26 2020-03-10 At&T Intellectual Property I, L.P. System and method for promoting marketable items
US9706258B2 (en) 2008-02-26 2017-07-11 At&T Intellectual Property I, L.P. System and method for promoting marketable items
US8990352B1 (en) 2010-12-18 2015-03-24 Google Inc. Stream of content for a channel
US9858275B1 (en) 2010-12-18 2018-01-02 Google Llc Scoring stream items in real time
US8984098B1 (en) 2010-12-18 2015-03-17 Google Inc. Organizing a stream of content
US8719347B1 (en) 2010-12-18 2014-05-06 Google Inc. Scoring stream items with models based on user interests
US9158775B1 (en) 2010-12-18 2015-10-13 Google Inc. Scoring stream items in real time
US9165305B1 (en) * 2010-12-18 2015-10-20 Google Inc. Generating models based on user behavior
US8732240B1 (en) 2010-12-18 2014-05-20 Google Inc. Scoring stream items with models based on user interests
US9712588B1 (en) 2010-12-18 2017-07-18 Google Inc. Generating a stream of content for a channel
US9979777B1 (en) 2010-12-18 2018-05-22 Google Llc Scoring stream items with models based on user interests
US9723044B1 (en) 2010-12-18 2017-08-01 Google Inc. Stream of content for a channel
US10832282B2 (en) * 2011-06-24 2020-11-10 At&T Intellectual Property I, L.P. Method and apparatus for targeted advertising
US10108980B2 (en) * 2011-06-24 2018-10-23 At&T Intellectual Property I, L.P. Method and apparatus for targeted advertising
US20120330756A1 (en) * 2011-06-24 2012-12-27 At & T Intellectual Property I, Lp Method and apparatus for targeted advertising
US10423968B2 (en) 2011-06-30 2019-09-24 At&T Intellectual Property I, L.P. Method and apparatus for marketability assessment
US11195186B2 (en) 2011-06-30 2021-12-07 At&T Intellectual Property I, L.P. Method and apparatus for marketability assessment
US8386079B1 (en) 2011-10-28 2013-02-26 Google Inc. Systems and methods for determining semantic information associated with objects
US10349147B2 (en) 2013-10-23 2019-07-09 At&T Intellectual Property I, L.P. Method and apparatus for promotional programming
US10951955B2 (en) 2013-10-23 2021-03-16 At&T Intellectual Property I, L.P. Method and apparatus for promotional programming
US9720983B1 (en) * 2014-07-07 2017-08-01 Google Inc. Extracting mobile application keywords
US11068960B2 (en) * 2019-05-29 2021-07-20 Walmart Apollo, Llc Methods and apparatus for item substitution
US11640635B2 (en) 2019-05-29 2023-05-02 Walmart Apollo, Llc Methods and apparatus for item substitution

Also Published As

Publication number Publication date
US20080004989A1 (en) 2008-01-03
US20110196895A1 (en) 2011-08-11
US8032425B2 (en) 2011-10-04
US8090625B2 (en) 2012-01-03

Similar Documents

Publication Publication Date Title
US8032425B2 (en) Extrapolation of behavior-based associations to behavior-deficient items
US9792332B2 (en) Mining of user event data to identify users with common interests
US7921071B2 (en) Processes for improving the utility of personalized recommendations generated by a recommendation engine
US8301623B2 (en) Probabilistic recommendation system
US11036795B2 (en) System and method for associating keywords with a web page
US8577880B1 (en) Recommendations based on item tagging activities of users
US8285602B1 (en) System for recommending item bundles
US8290818B1 (en) System for recommending item bundles
US8090621B1 (en) Method and system for associating feedback with recommendation rules
US8543584B2 (en) Detection of behavior-based associations between search strings and items
US7827186B2 (en) Duplicate item detection system and method
US8239287B1 (en) System for detecting probabilistic associations between items
US20090228353A1 (en) Query classification based on query click logs
US20080004884A1 (en) Employment of offline behavior to display online content
US20080005313A1 (en) Using offline activity to enhance online searching
US8751333B1 (en) System for extrapolating item characteristics
US20070185884A1 (en) Aggregating and presenting information on the web
Ashkan et al. Location- and Query-Aware Modeling of Browsing and Click Behavior in Sponsored Search
Henshaw et al. Restructuring ontologies through knowledge discovery
Lin et al. Incremental revision of recommendation rules for information services
Kywe IMPLEMENTATION OF WEB STORED SYSTEM BY BUILDING A WEB PAGE RECOMMENDER SYSTEM

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION