WO2008113425A1 - A feed aggregation system - Google Patents

Publication number
WO2008113425A1
Authority
WO
WIPO (PCT)
Prior art keywords
feed
story
topic
tagged
items
Application number
PCT/EP2008/000761
Other languages
French (fr)
Inventor
Peter Elger
Paul Watson
Barry Downes
Original Assignee
Waterford Institute Of Technology
Application filed by Waterford Institute Of Technology
Publication of WO2008113425A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • the present invention relates to a system and method for aggregating a web feed.
  • a web feed (sometimes called a "stream" or a "channel") is a data format used for providing updated content to a user on a network.
  • a website can create a web feed of their content, and post a link to the web feed on the website. Visitors to the website can then subscribe to the web feed using a feed reader, which checks the source of the web feed regularly to see if new content is available. If so, the feed reader will download the new content for display to the user.
  • the use of web feeds allows a user to automatically receive the most up-to-date content on the network, without having to actively seek out the content by visiting the websites relevant to that content.
  • An item of a web feed generally comprises a title, a description of the content of the item, and a link to the source HTML page.
  • the web feed itself is normally specified as a document in XML format having links to the individual items.
  • Two examples of web feed formats are RSS and Atom.
  • Feed readers can be in the form of a dedicated desktop application, or can be a dedicated web page having links to the different items of the feed.
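  • While it is possible for users to create a custom channel or feed that is comprised of the content from a plurality of selected feeds, it is not possible at present for a user to create a custom feed for a particular topic. Users of feeds must subscribe to distinct feeds and receive all or none of the items published to that feed by the feed originator. There is no mechanism to filter a feed on the basis of the topic of the feed item, e.g. a feed dedicated to "Cats and Dogs" could have feed items covering cats and separate feed items covering dogs, but a subscriber only interested in dogs has to receive and review both to determine their relevance to the topic of interest.
  • a method of providing a customised, user-moderated syndication feed for a topic, the method comprising the steps of: (a) receiving a feed item; (b) analysing the feed item for associations to previously-received feed items; (c) grouping associated feed items into a story; (d) analysing the story for specified keywords relevant to one or more of a number of topics, each topic comprising a pre-defined set of keywords; (e) tagging a story containing specified topic keywords as a story provisionally relevant to said topic; (f) presenting the tagged story to raters of said topic; (g) receiving an approval rating for a tagged story from a rater indicating the perceived relevance of said tagged story to said topic; and (h) creating a syndication feed for said topic, the syndication feed comprising a plurality of tagged stories based on the approval ratings of each tagged story.
  • the resultant syndication feed has the advantage of being compiled through both computational analysis and user moderation. This ensures that the items contained in the feed are up to date and relevant to the particular topic that the syndication feed relates to, and avoids the information overload that can sometimes occur with prior art systems.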
  • said feed items are one or more of RSS or Atom feed items.
  • the step of analysing the feed item for associations to previously-received feed items comprises analysing the feed item for any embedded links to previously-received feed items.
  • the step of analysing the feed item for associations to previously-received feed items comprises analysing the feed item for any textual links to previously-received feed items.
  • the topic is user-created.
  • the step of tagging a story comprises adding topic metadata to said story.
  • the step of presenting the tagged story to raters of said topic comprises publishing the tagged story as part of a website.
  • the raters of said topic comprise at least one user of said website.
  • the step of presenting the tagged story to raters of said topic comprises the step of sending the tagged story across a network to an interactive client application operated by a respective rater.
  • the method comprises the step of ordering tagged stories in the created syndication feed based on the approval ratings of each tagged story.
  • the method comprises the step of ordering tagged stories in the created syndication feed based on the chronological order of receipt of feed items in said story.
  • the method comprises the step of ordering tagged stories in the created syndication feed based on the number of links to the feed items from other feed items.
  • the method comprises the step of ordering feed items in a tagged story in the created syndication feed based on the chronological order of receipt of feed items in said story.
  • the method comprises the step of ordering a feed item in a tagged story in the created syndication feed based on the number of links to the feed item from other feed items.
  • said step of receiving an approval rating for a tagged story comprises: receiving an indication from a rater of the relevance of one or more of said tagged story or a feed item in a tagged story to said topic.
  • Fig. 1 shows a system according to a preferred embodiment of the invention as an endpoint on a Service-Orientated Architecture (SOA) bus;
  • Fig. 2 is an overview of the structure of the system of Fig. 1;
  • Fig. 3 is a flow diagram illustrating the operation of the system of Fig. 2;
  • Fig. 4 illustrates an example of the link analysis of the preferred embodiment;
  • Fig. 5 illustrates an example of a network of feed items referring to each other;
  • Fig. 6 illustrates an example of the topic group analysis of the preferred embodiment
  • Fig. 7 shows a sample topic group webpage
  • Fig. 8 illustrates an example of an implementation of the invention allowing the system of Fig. 2 to interact with a variety of clients.
  • Fig. 1 shows a Service-Orientated Architecture (SOA) configuration 10 coupled with a metafeed aggregator 12 according to a preferred embodiment of the present invention.
  • the metafeed aggregator 12 is coupled with a memory storage apparatus 14 for the storage of user profiles via the SOA 10.
  • the metafeed aggregator 12 is also coupled with a core feed aggregator 16, and the entire SOA 10 is linked to an outside network, for example the Internet 18, via web delivery and configuration apparatus 20.
  • the core aggregator 16 further comprises a web crawler 24 and feed reaper 26.
  • the core aggregator 16 is operable to regularly download feeds from the Internet 18, parse the feeds, and store them in an internal database 22.
  • An example of a currently available core aggregator 16 is the FeedHenry system (www.feedhenry.com).
  • the metafeed aggregator 12 is operable to receive notifications of the availability of new feed items from the core aggregator 16.
  • the metafeed aggregator 12 comprises a processor and memory, a text analysis module 28, a link analysis module 30, and a story distributor module 32.
  • the metafeed aggregator 12 further comprises a website publishing mechanism (not shown), operable to provide topic group webpages 34.
  • the system of the metafeed aggregator 12 operates as follows.
  • a system and method for providing a customised, user-moderated syndication feed for a topic is shown.
  • the core aggregator 16 via the SOA 10 bus crawls the Internet 18 to find new web feeds available for subscription.
  • the core aggregator 16 subscribes to the web feed, and will regularly download new feed items from the web feed and store them in internal database 22.
  • the FeedHenry system is operable to periodically crawl the Internet searching for newly-published syndication feeds. Once the crawler finds a new feed, the reaper module subscribes to and periodically downloads the new feed, storing the feed content locally for future reference.
  • a feed item generally comprises a title, a description of the content of the feed item, and a link to the source HTML page.
  • When a new feed item is downloaded (Item A), the core aggregator 16 is operable to provide (step 100) the feed item to the metafeed aggregator 12.
  • the link analysis module 30 of the metafeed aggregator 12 then scans (step 110) the title and the description of the feed item, as well as the source HTML page, for any links to previously received feed items.
  • In Fig. 4, an abridged version of sample source code of an XML-based feed item 40 with source HTML page 42 is shown.
  • the feed item 40 has an embedded HTML link to a further HTML page 44 in the description section of the feed item 40.
  • the scanning step 110 comprises analysing the source code of the feed item 40 itself as well as the source HTML page 42 for links to any external HTML pages.
  • the link analysis module 30 checks if the external link is to a previously received feed item, or to the source HTML page for a previously received feed item. If a link exists, the link analysis module 30 is operable to collect feed items that are linked together into a group. A group of linked feed items is called a story.
  • a feed item with no such link becomes the first instance of a particular story - essentially, the "Breaking News" item for that story.
  • the feed item 40 comprises a link to external HTML page 44.
  • the source HTML page 42 comprises a link to a further external HTML page (indicated by arrow 46).
  • the external HTML pages 44 and 46 are compared with the database of stored feed items.
  • the HTML pages linked to by the feed item are included in the story, up to a specified depth.
  • the HTML pages are only included to a depth of one link from the feed item, e.g. even though external HTML page 44 contains a further link to an external HTML page, this link is not analysed.
  • the link analysis 110 can rely on examining the HTML links present in the source code of feed items.
  • the body of text of the feed items may be analysed for textual links to previously received feed items.
  • content of the feed items themselves may be extracted for analysis, e.g. the contents of navigation bars, advertisements, sidebars, etc. that are present in HTML pages may be subject to a textual analysis to determine if there is any similarity with previously received feed items.
  • an initial feed item 52 without any links to previously received feed items is analysed by the link analysis module 30.
  • This may be in the form of, for example, a "Breaking News" item from a news website, being the first instance of a particular story.
  • a second iteration of feed items 54 are then received by the metafeed aggregator 12, each having a direct link to the initial feed item 52.
  • a third iteration of feed items 56 are then received, feed items 56 not having a direct link to the initial feed item 52, but linking to feed items from the second iteration 54.
  • a sub-group of feed items 58 do not link towards the initial feed item 52, but may be somehow related, due to a link existing from one of the feed items from the third iteration 56.
  • the structure of the story 50 is stored within the memory of the metafeed aggregator 12. As further feed items that link to the story are received by the metafeed aggregator 12, links to the new feed items are added to the memory entry for the story 50.
  • the structure of the story 50 may be ordered according to certain conditions.
  • Basic ordering may be achieved through alphabetical ordering, or through the time of receipt of the individual feed items.
  • the main ordering mechanism is preferably performed by more complicated graphing of the story 50.
  • a story may be composed of a number of related items (related by links or textual references, determined by the link analysis module 30).
  • the metafeed aggregator system is operable to order the story based on two factors - the time flow of feed items, and the popularity of a feed item. Each feed item can be analysed to measure how many of the other feed items in the story link to that particular feed item, with the eventual ranking of the feed item within the story based on a combination of the measured factors.
  • Consider the initial feed item, i.e. the feed item that "broke" the story, for example a CNN report on a new medical breakthrough.
  • Other web feeds would follow in their reporting, e.g. the BBC, MSNBC, Blog A, Blog B, Blog C, and finally the New York Times, each providing a feed item that is linked (whether by links or textual references) into the story created by the receipt of the initial feed item.
  • Each of the feed items has an individual timestamp.
  • An ordered graph can be constructed based on the received timestamp, which would result in a story having the CNN feed item first and the New York Times feed item at the end.
  • a more qualitative feed ordering system is preferably utilised in conjunction with the timestamp values. This can be based on the popularity of the received feed items.
  • While the CNN feed item is ranked first due to its timestamp value, suppose that the three Blog feed items link to the MSNBC item but not to the CNN item.
  • in that case, a greater weight is given to the MSNBC item, and correspondingly it can be pushed higher up the rankings for the story.
  • the MSNBC item may be promoted above the BBC item and even the CNN item, even though it has a later timestamp. Use of this mechanism allows for a greater weight to be given to items that may have more substance or analysis, as opposed to a ranking based purely on the speed of publication.
  • the metafeed aggregator 12 is operable to perform a topic analysis (step 120) using the text analysis module 28.
  • a topic is a user-created customised search filter that defines what is to be searched for in the analysed story groups.
  • a topic comprises a title (e.g. "Cats"), and a list of keywords and/or search rules (e.g. cat; feline; kitten) associated with the subject matter of the topic.
  • the text analysis module 28 examines 120 the contents of a story group (i.e. the feed items) for any reference to the keywords of the topic. Standard textual analysis techniques may be employed to determine if a particular story is relevant to a topic. For example, if the number of mentions of a topic keyword in a story group exceeds a pre-defined threshold number, then the story as a whole is regarded as being potentially relevant to that topic.
  • the story is tagged as such. This is accomplished by amending the stored story within the memory of the metafeed aggregator 12 to include metadata referencing the particular topic.
  • each story group is then presented to a topic group.
  • a topic group is a selected group of moderators for a particular topic that determines the actual relevance of each story to the topic.
  • the story distributor module 32 is operable to examine (step 130) the topic metadata present for each individual story group stored in the memory of the metafeed aggregator 12. Based on the topics listed in the metadata, the story distributor module 32 sends the story in question to the relevant topic groups for further analysis.
  • once the stories are distributed to the different topic groups, they are presented (step 140) to the topic groups for approval.
  • the stories are presented in the form of a webpage 34 generated by the metafeed aggregator 12.
  • Each topic group has a specific webpage dedicated to that particular topic.
  • the members 72 of the topic group 70 can view the stories 74 that the computational analysis of the metafeed aggregator 12 has decided are potentially relevant to that topic.
  • the members 72 of the topic group 70 can then vote for or against the story 74 based on the actual relevance of that story to the topic (as interpreted by the individual topic group member).
  • the webpage 80 comprises a header title section 82, indicating for example the origin of the feed aggregation system i.e. the supplier of the service, banner advertising, and/or navigation information; a topic title 84 indicating the current topic; a plurality of story sections 86, showing the stories currently being voted on; and a number of extra options for a user of the webpage 80.
  • the extra options may comprise choices 88 to alter the arrangement of the current topic being displayed (e.g. options to order the stories by a ranking of most popular stories, most recent stories, or stories that may have had most activity without being most popular - i.e. most divisive stories); options to structure 90 the stories displayed by source (e.g. display stories from CNN, BBC News, RTE News, etc.); options to subscribe 92 to the web feed output for the various topics; or links 94 to webpages for other topics.
  • voting options 98a-c, 96 are displayed for the individual stories 86. These story voting options 98a-c, 96 allow a user of the webpage 80 to vote for or against an individual story by voting for or against the individual feed items 98a-c, 96 within each story. It will be understood that, while in the example of Figure 7 individual feed items are voted on by the users of the webpage, alternatively or in addition individual stories can also be rated to allow quicker aggregated ratings of stories. The resultant rating for the story can thus be a function of the ratings received for the story itself, and/or the ratings received for the individual feed items contained in the story. So, for example, if it were found that the rating for an item within an otherwise highly rated story was bringing down the story's relevance to a topic by more than a threshold amount, the item could even be removed from the story.
  • individually rating items within a story allows items to be ranked and displayed not only by other criteria such as described above, but also according to their relative ratings within a story.
  • feed items 98a-c, 96 can be presented with additional information regarding the title, source, date, and possibly, extracts from the body of the item.
  • the template of Fig. 7 allows the users of the topic group webpage 80 to efficiently and easily promote the feed items and stories that they believe are of most relevance to the particular topic, and also to down vote any stories or feed items that are deemed not relevant, or of lesser relevance.
  • the voting schemes in the topic groups can be a simple "Yes/No" selection, or perhaps a sliding scale rating, e.g. 1-10.
  • the members of the topic group can be chosen to be a limited number of selected moderators, the moderators being the only people able to access the relevant topic group website 34, or the topic group website 34 may be available to the public, allowing any Internet user to vote on the relevance of the presented stories.
  • the step 140 of providing the stories to the topic group members may be accomplished not only by providing the topic group members with access to the website 34 but also by exposing an API (Application Programming Interface) 66 for the metafeed system 12 to other platforms.
  • a sample system is shown wherein the metafeed aggregator system 12 is operable to receive user ratings for stories via client applications running on users' desktops 60, mobile clients 62 (e.g. PDAs, cellphones), or from applications running on third party websites 64, as well as from the topic group website 34.
  • IPTV systems, email systems, or text-to-speech systems may also utilise the system 12 via the API 66.
  • the desktop application 60 receives the selected stories from the metafeed aggregator 12 via the Internet, the application 60 being operable to present a voting interface to its users which then communicates to the metafeed system 12 via the API 66.
  • the API can be such that it enables various levels of security to be specified so allowing users of platforms 60, 62, 64 customised levels of access to the system 12.
  • the ratings that each story receives are recorded by the metafeed aggregator 12, and the results used to determine which stories are most relevant to particular topics.
  • An output topic web feed 38 is then produced for that particular topic, the contents of the web feed 38 being determined by the ratings received by each story, and whether those ratings exceed a topic threshold or not.
  • the topic threshold is a largely self-setting, floating value, whereby stories with ratings exceeding this threshold are deemed appropriate for the generated feed.
  • a fixed threshold could end up flooding an output topic web feed 38 or it could lead to a drought of feed items.
  • certain topics can be very active (e.g. a topic dedicated to general sport news), and the threshold would go up to maintain a usable flow-rate of stories; other topics are less active (e.g. a topic dedicated to a particular football team), and the threshold will drop accordingly to maintain the flow-rate.
  • This threshold level can also be adjusted by administrators to ensure that low activity topic groups don't maintain flow-rate of feed items at the expense of quality.
  • the ordering of feed items within an output topic feed 38 depends on a number of factors, including the relevance of the particular story, the timestamp of its particular feed items, the results of the link analysis and topic analysis stages, etc.
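The floating topic threshold described above can be pictured as a cut-off derived from a target flow rate rather than a fixed number. The following is a minimal sketch under stated assumptions, not the implementation disclosed here: the names `floating_threshold` and `select_stories`, the `target_count` parameter (how many stories the output feed should carry per period), and the administrator-set `floor` are all hypothetical. The sketch also admits stories whose rating equals the cut-off, a simplification of "exceeding".

```python
from typing import Optional


def floating_threshold(ratings: list[float], target_count: int,
                       floor: Optional[float] = None) -> float:
    """Return a rating cut-off that admits roughly `target_count` stories.

    Hypothetical helper: busy topics yield a high cut-off, quiet topics a
    low one, and `floor` models an administrator-imposed minimum quality.
    """
    if target_count < 1:
        raise ValueError("target_count must be at least 1")
    if not ratings:
        return floor if floor is not None else 0.0
    ranked = sorted(ratings, reverse=True)
    # Rating of the last story that still fits within the target flow.
    cutoff = ranked[min(target_count, len(ranked)) - 1]
    if floor is not None:
        cutoff = max(cutoff, floor)
    return cutoff


def select_stories(rated: dict[str, float], target_count: int,
                   floor: Optional[float] = None) -> list[str]:
    """Pick story ids at or above the floating cut-off, best rated first."""
    threshold = floating_threshold(list(rated.values()), target_count, floor)
    chosen = [(story, r) for story, r in rated.items() if r >= threshold]
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [story for story, _ in chosen]
```

With a floor in place, a low-activity topic publishes fewer stories instead of lowering its standard, which corresponds to the administrator adjustment mentioned above.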

Abstract

A feed aggregation system is disclosed, the feed aggregation system operable to provide a customised, user-moderated syndication feed for a particular topic. This is done through the creation of ordered stories comprising feed items related to a topic. When a new feed item is received, the feed item is both analysed computationally and rated by a collaborative network of users. A syndication feed is generated based on the ratings and the result of the analysis, the items of the syndication feed being judged to be relevant to the particular topic.

Description

A Feed Aggregation System
The present invention relates to a system and method for aggregating a web feed.
Background to the Invention
A web feed (sometimes called a "stream" or a "channel") is a data format used for providing updated content to a user on a network. A website can create a web feed of their content, and post a link to the web feed on the website. Visitors to the website can then subscribe to the web feed using a feed reader, which checks the source of the web feed regularly to see if new content is available. If so, the feed reader will download the new content for display to the user. The use of web feeds allows a user to automatically receive the most up-to-date content on the network, without having to actively seek out the content by visiting the websites relevant to that content.
An item of a web feed generally comprises a title, a description of the content of the item, and a link to the source HTML page. The web feed itself is normally specified as a document in XML format having links to the individual items. Two examples of web feed formats are RSS and Atom.
A web feed is accessed using a feed reader. Feed readers (sometimes referred to as feed aggregators) can be in the form of a dedicated desktop application, or can be a dedicated web page having links to the different items of the feed.
Unfortunately, a user can often become overwhelmed with the sheer amount of information that is provided through normal web feeds. Users can find it difficult to find a particular web feed that provides them information on topics that are of interest to the user. Also, there is only very limited screening of the items present in a feed, meaning that the relevance of the items to the web feed is seldom taken into account.
One of the growing trends in modern Internet usage is that of social software, or social computing, and in particular collaborative software. This is where, aside from normal computational methods implemented in standard software packages, user actions and feedback are used to refine the eventual output, leading to a more accurate representation of content that is more applicable to the eventual users of the software.
Efforts have been made to implement social computing techniques in feed readers, for example Rojo™ from Rojo Networks, Inc. (www.rojo.com), which implements a simple voting scheme for stories, and Threz (www.threz.com), which allows users to see what articles their friends are reading, and to create custom channels for the user and their friends to subscribe to, having selected feeds incorporated into the custom channel. Other examples of prior art systems are Techmeme (www.techmeme.com) and Megite (www.megite.com).
Also, while it is possible for users to create a custom channel or feed that is comprised of the content from a plurality of selected feeds, it is not possible at present for a user to create a custom feed for a particular topic. Users of feeds must subscribe to distinct feeds and receive all or none of the items published to that feed by the feed originator. There is no mechanism to filter a feed on the basis of the topic of the feed item, e.g. a feed dedicated to "Cats and Dogs" could have feed items covering cats and separate feed items covering dogs, but a subscriber only interested in dogs has to receive and review both to determine their relevance to the topic of interest.
To cover a particular topic well a user must find and subscribe to many feeds on that particular topic. This could either be a handful of feeds or many hundreds, depending on the topic. The user must then monitor each of these subscriptions, reading each published item to judge the relevance and importance. Also, a user must remain somewhat active in finding new feeds on a given topic, as they become available for subscription. There is no mechanism that automatically subscribes a user to new feeds on a topic.
There is currently no feed reader available that easily and efficiently implements user customisation and social collaboration, with the effect of producing a customised, user-moderated syndication feed for a particular topic.
It is an object of this invention to provide a system that can overcome these problems.
Summary of the Invention
Accordingly, there is provided a method of providing a customised, user-moderated syndication feed for a topic, the method comprising the steps of:
(a) receiving a feed item;
(b) analysing the feed item for associations to previously-received feed items;
(c) grouping associated feed items into a story;
(d) analysing the story for specified keywords relevant to one or more of a number of topics, each topic comprising a pre-defined set of keywords;
(e) tagging a story containing specified topic keywords as a story provisionally relevant to said topic;
(f) presenting the tagged story to raters of said topic;
(g) receiving an approval rating for a tagged story from a rater indicating the perceived relevance of said tagged story to said topic; and
(h) creating a syndication feed for said topic, the syndication feed comprising a plurality of tagged stories based on the approval ratings of each tagged story.
When this method is employed, the resultant syndication feed has the advantage of being compiled through both computational analysis and user moderation. This ensures that the items contained in the feed are up to date and relevant to the particular topic that the syndication feed relates to, and avoids the information overload that can sometimes occur with prior art systems.
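Steps (d) and (e) of the method can be illustrated as a keyword count over the text of a story's feed items, with the story tagged when the count exceeds a pre-defined number. The sketch below is illustrative only: the `Topic` and `Story` classes, the `mention_threshold` parameter, and the exact-word matching rule are assumptions made for the example, and a real system might add stemming or other standard text analysis.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str                  # e.g. "Cats"
    keywords: set[str]          # lower-case keywords, e.g. {"cat", "feline"}


@dataclass
class Story:
    item_texts: list[str]       # title and description text of each feed item
    topics: set[str] = field(default_factory=set)   # topic metadata (tags)


def count_mentions(story: Story, topic: Topic) -> int:
    """Count words in the story that exactly match a topic keyword."""
    words = re.findall(r"[a-z0-9']+", " ".join(story.item_texts).lower())
    return sum(1 for word in words if word in topic.keywords)


def tag_story(story: Story, topics: list[Topic],
              mention_threshold: int = 2) -> Story:
    """Tag the story as provisionally relevant to each matching topic."""
    for topic in topics:
        # Step (e): the tag is added only when mentions exceed the threshold.
        if count_mentions(story, topic) > mention_threshold:
            story.topics.add(topic.title)
    return story
```

The tag is only provisional: under steps (f) to (h) the raters of the topic still decide whether the story reaches the syndication feed.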
Preferably, said feed items are one or more of RSS or Atom feed items.
Preferably, the step of analysing the feed item for associations to previously-received feed items comprises analysing the feed item for any embedded links to previously-received feed items.
Preferably, the step of analysing the feed item for associations to previously-received feed items comprises analysing the feed item for any textual links to previously-received feed items.
Preferably, the topic is user-created.
Preferably, the step of tagging a story comprises adding topic metadata to said story.
Preferably, the step of presenting the tagged story to raters of said topic comprises publishing the tagged story as part of a website.
Preferably, the raters of said topic comprise at least one user of said website.
Alternatively, the step of presenting the tagged story to raters of said topic comprises the step of sending the tagged story across a network to an interactive client application operated by a respective rater.
Preferably, the method comprises the step of ordering tagged stories in the created syndication feed based on the approval ratings of each tagged story.
Preferably, the method comprises the step of ordering tagged stories in the created syndication feed based on the chronological order of receipt of feed items in said story.
Preferably, the method comprises the step of ordering tagged stories in the created syndication feed based on the number of links to the feed items from other feed items.
Preferably, the method comprises the step of ordering feed items in a tagged story in the created syndication feed based on the chronological order of receipt of feed items in said story.
Preferably, the method comprises the step of ordering a feed item in a tagged story in the created syndication feed based on the number of links to the feed item from other feed items.
Preferably, said step of receiving an approval rating for a tagged story comprises: receiving an indication from a rater of the relevance of one or more of said tagged story or a feed item in a tagged story to said topic.
In further aspects of the invention there is provided a corresponding system for providing a customised, user-moderated syndication feed for a topic and a computer program product executable to perform the invention.
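The ordering options listed in this summary (order of receipt, and the number of links to a feed item from other feed items) can be combined in more than one way. As a hedged illustration, the sketch below ranks items within a story by inbound link count and uses the earlier time of receipt as a tie-breaker. The `FeedItem` class, its field names, and this particular combination rule are assumptions made for the example and are not prescribed by the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedItem:
    item_id: str
    received: float                    # time of receipt, e.g. a Unix timestamp
    links_to: tuple[str, ...] = ()     # ids of other items this item links to


def inbound_link_counts(items: list[FeedItem]) -> dict[str, int]:
    """Count, for each item, how many other items in the story link to it."""
    counts = {item.item_id: 0 for item in items}
    for item in items:
        for target in set(item.links_to):          # ignore repeated links
            if target in counts and target != item.item_id:
                counts[target] += 1
    return counts


def order_items(items: list[FeedItem]) -> list[FeedItem]:
    """Most-linked items first; earlier receipt breaks ties (an assumption)."""
    counts = inbound_link_counts(items)
    return sorted(items, key=lambda it: (-counts[it.item_id], it.received))
```

Under this rule an item that is received later but referenced more often can outrank the item that was received first.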
Detailed Description of the Invention
An embodiment of the invention will now be described, by way of example, with reference to the following drawings, in which:
Fig. 1 shows a system according to a preferred embodiment of the invention as an endpoint on a Service-Orientated Architecture (SOA) bus;
Fig. 2 is an overview of the structure of the system of Fig. 1;
Fig. 3 is a flow diagram illustrating the operation of the system of Fig. 2;
Fig. 4 illustrates an example of the link analysis of the preferred embodiment;
Fig. 5 illustrates an example of a network of feed items referring to each other;
Fig. 6 illustrates an example of the topic group analysis of the preferred embodiment;
Fig. 7 shows a sample topic group webpage; and
Fig. 8 illustrates an example of an implementation of the invention allowing the system of Fig. 2 to interact with a variety of clients.
Fig. 1 shows a Service-Orientated Architecture (SOA) configuration 10 coupled with a metafeed aggregator 12 according to a preferred embodiment of the present invention.
The metafeed aggregator 12 is coupled with a memory storage apparatus 14 for the storage of user profiles via the SOA 10. The metafeed aggregator 12 is also coupled with a core feed aggregator 16, and the entire SOA 10 is linked to an outside network, for example the Internet 18, via web delivery and configuration apparatus 20.
The core aggregator 16 further comprises a web crawler 24 and feed reaper 26. The core aggregator 16 is operable to regularly download feeds from the Internet 18, parse the feeds, and store them in an internal database 22. An example of a currently available core aggregator 16 is the FeedHenry system (www.feedhenry.com). The metafeed aggregator 12 is operable to receive notifications of the availability of new feed items from the core aggregator 16.
The metafeed aggregator 12 comprises a processor and memory, a text analysis module 28, a link analysis module 30, and a story distributor module 32. The metafeed aggregator 12 further comprises a website publishing mechanism (not shown), operable to provide topic group webpages 34. The system of the metafeed aggregator 12 operates as follows.
Referring to Figs. 2 and 3, a system and method for providing a customised, user-moderated syndication feed for a topic is shown. The core aggregator 16 via the SOA 10 bus crawls the Internet 18 to find new web feeds available for subscription. When a new web feed is found, the core aggregator 16 subscribes to the web feed, and will regularly download new feed items from the web feed and store them in internal database 22. For example, the FeedHenry system is operable to periodically crawl the Internet searching for newly-published syndication feeds. Once the crawler finds a new feed, the reaper module subscribes to and periodically downloads the new feed, storing the feed content locally for future reference.
As mentioned above, a feed item generally comprises a title, a description of the content of the feed item, and a link to the source HTML page.
When a new feed item is downloaded (Item A), the core aggregator 16 is operable to provide (step 100) the feed item to the metafeed aggregator 12. The link analysis module 30 of the metafeed aggregator 12 then scans (step 110) the title and the description of the feed item, as well as the source HTML page for any links to previously received feed items.
With reference to Fig. 4, an abridged version of sample source code of an XML-based feed item 40 with source HTML page 42 is shown. The feed item 40 has an embedded HTML link to a further HTML page 44 in the description section of the feed item 40. The scanning step 110 comprises analysing the source code of the feed item 40 itself as well as the source HTML page 42 for links to any external HTML pages. When a further link is found, the link analysis module 30 checks if the external link is to a previously received feed item, or to the source HTML page for a previously received feed item. If a link exists, the link analysis module 30 is operable to collect feed items that are linked together into a group. A group of linked feed items is called a story.
If no links or other associations to previously received feed items are found, then the feed item becomes the first instance for a particular story - essentially, the "Breaking News" item for that story.
In the example shown in Fig. 4, the feed item 40 comprises a link to external HTML page 44, and the source HTML page 42 comprises a link to a further external HTML page (indicated by arrow 46). In this case, it is the external HTML pages 44 and 46 that are compared with the database of stored feed items.
The HTML pages linked to by the feed item are included in the story, up to a specified depth. In the example of Fig. 4, the HTML pages are only included to a depth of one link from the feed item, e.g. even though external HTML page 44 contains a further link to an external HTML page, this link is not analysed.
As shown in Fig. 4, the link analysis 110 can rely on examining the HTML links present in the source code of feed items. However, it will also be understood that the body of text of the feed items may be analysed for textual links to previously received feed items. Also, content of the feed items themselves may be extracted for analysis, e.g. the contents of navigation bars, advertisements, sidebars, etc. that are present in HTML pages may be subject to a textual analysis to determine if there is any similarity with previously received feed items.
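The link analysis and grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the link analysis module 30: the names `LinkExtractor`, `StoryIndex` and `add_item` are hypothetical, stories are held in a plain in-memory dictionary, and a new item simply joins the first story it links to (merging of stories is not handled). Links are followed to a depth of one only, as in the example of Fig. 4.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in a fragment of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every link target found in the given HTML text."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


class StoryIndex:
    """Hypothetical in-memory store mapping source page URLs to story ids."""

    def __init__(self):
        self.story_of_url = {}  # source page URL -> story id
        self.stories = {}       # story id -> list of source page URLs
        self._next_id = 1

    def add_item(self, source_url, description_html, source_page_html):
        # Depth of one: only the item itself and its own source page are scanned.
        links = extract_links(description_html) + extract_links(source_page_html)
        for link in links:
            story_id = self.story_of_url.get(link)
            if story_id is not None:
                break  # the item links to a previously received item
        else:
            # No link to a known item: this item starts a new story.
            story_id = self._next_id
            self._next_id += 1
            self.stories[story_id] = []
        self.story_of_url[source_url] = story_id
        self.stories[story_id].append(source_url)
        return story_id
```

An item whose description or source page links to the source page of an earlier item is placed in that item's story; any other item opens a new story of its own.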
With reference to Fig. 5, a structure of a sample story 50 for a group of feed items 52, 54, 56, 58 is shown. Firstly, an initial feed item 52 without any links to previously received feed items is analysed by the link analysis module 30. This may be in the form of, for example, a "Breaking News" item from a news website, being the first instance of a particular story. A second iteration of feed items 54 are then received by the metafeed aggregator 12, each having a direct link to the initial feed item 52. A third iteration of feed items 56 are then received, feed items 56 not having a direct link to the initial feed item 52, but linking to feed items from the second iteration 54. Finally, a sub-group of feed items 58 do not link towards the initial feed item 52, but may be somehow related, due to a link existing from one of the feed items from the third iteration 56.
The structure of the story 50 is stored within the memory of the metafeed aggregator 12. As further feed items that link to the story are received by the metafeed aggregator 12, links to the new feed items are added to the memory entry for the story 50.
The structure of the story 50 may be ordered according to certain conditions. Basic ordering may be achieved through alphabetical ordering, or through the time of receipt of the individual feed items. However, the main ordering mechanism is preferably performed by more complicated graphing of the story 50.
For example, a story may be composed of a number of related items (related by links or textual references, determined by the link analysis module 30). The metafeed aggregator system is operable to order the story based on two factors - the time flow of feed items, and the popularity of a feed item. Each feed item can be analysed to measure how many of the other feed items in the story link to that particular feed item, with the eventual ranking of the feed item within the story based on a combination of the measured factors.
Among the linked feed items there is the initial feed item, i.e. the feed item that "broke" the story, for example a CNN report on a new medical breakthrough. Other web feeds would follow in their reporting, e.g. the BBC, MSNBC, Blog A, Blog B, Blog C, and finally the New York Times, each providing a feed item that is linked (whether by links or textual references) into the story created by the receipt of the initial feed item.
Each of the feed items has an individual timestamp. An ordered graph can be constructed based on the received timestamp, which would result in a story having the CNN feed item first and the New York Times feed item at the end. However, a more qualitative feed ordering system is preferably utilised in conjunction with the timestamp values. This can be based on the popularity of the received feed items.
While the CNN feed item is ranked first due to its timestamp value, say that the three Blog feed items comprise links to the MSNBC story, while not linking to the CNN item. Thus a greater weight is given to the MSNBC item, and correspondingly it can be pushed higher up the rankings for the story. Depending on the weightings employed, the MSNBC item may be promoted above the BBC item and even the CNN item, even though it has a later timestamp. Use of this mechanism allows for a greater weight to be given to items that may have more substance or analysis, as opposed to a ranking based purely on the speed of publication.
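By way of illustration, the combination of timestamp and popularity could be sketched as below. The scoring formula, the `link_weight` parameter and the names `FeedItem` and `rank_story` are assumptions made for this example only; the description does not specify particular weightings. Each source is assumed to contribute one item per story, so the source name is used as the item key.

```python
from dataclasses import dataclass, field


@dataclass
class FeedItem:
    source: str
    timestamp: float  # time of receipt, e.g. seconds since the epoch
    links_to: list = field(default_factory=list)  # sources this item links to


def rank_story(items, link_weight=1.5):
    """Order items so that earlier receipt and more inbound links rank higher."""
    # Count how many other items in the story link to each item.
    inbound = {item.source: 0 for item in items}
    for item in items:
        for target in item.links_to:
            if target in inbound and target != item.source:
                inbound[target] += 1

    by_time = sorted(items, key=lambda i: i.timestamp)
    n = len(by_time)
    scores = {}
    for position, item in enumerate(by_time):
        time_score = n - position  # the earliest item gets the highest time score
        scores[item.source] = time_score + link_weight * inbound[item.source]

    # Python's sort is stable, so equal scores keep chronological order.
    return sorted(by_time, key=lambda i: scores[i.source], reverse=True)


story = [
    FeedItem("CNN", 1),
    FeedItem("BBC", 2, ["CNN"]),
    FeedItem("MSNBC", 3, ["CNN"]),
    FeedItem("Blog A", 4, ["MSNBC"]),
    FeedItem("Blog B", 5, ["MSNBC"]),
    FeedItem("Blog C", 6, ["MSNBC"]),
    FeedItem("New York Times", 7),
]
```

With the assumed default weighting, the MSNBC item is promoted above the BBC item but stays below the CNN item; with a heavier weighting such as `link_weight=3.0`, it overtakes the CNN item as well.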
Once the link analysis step 110 is completed, and a story group (Story A) is formed, the metafeed aggregator 12 is operable to perform a topic analysis (step 120) using the text analysis module 28.
A topic is a user-created customised search filter that defines what is to be searched for in the analysed story groups. A topic comprises a title (e.g. "Cats"), and a list of keywords and/or search rules (e.g. cat; feline; kitten) associated with the subject matter of the topic. Based on the topic parameters, the text analysis module 28 examines 120 the contents of a story group (i.e. the feed items) for any reference to the keywords of the topic. Standard textual analysis techniques may be employed to determine if a particular story is relevant to a topic. For example, if the number of mentions of a topic keyword in a story group exceeds a pre-defined threshold number, then the story as a whole is regarded as being potentially relevant to that topic.
Once a story is determined as being potentially relevant to a topic, the story is tagged as such. This is accomplished by amending the stored story within the memory of the metafeed aggregator 12 to include metadata referencing the particular topic.
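The keyword-count test and the tagging step could, for instance, look like the following sketch. The `Topic` and `Story` classes, the whole-word matching and the default threshold of 3 are assumptions for illustration; the description leaves the choice of textual analysis technique open.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    keywords: list


@dataclass
class Story:
    item_texts: list                          # text of each feed item in the story
    topics: set = field(default_factory=set)  # topic metadata added by tagging


def count_mentions(story, topic):
    """Count whole-word, case-insensitive mentions of the topic keywords."""
    text = " ".join(story.item_texts).lower()
    total = 0
    for keyword in topic.keywords:
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        total += len(re.findall(pattern, text))
    return total


def tag_story(story, topics, threshold=3):
    """Tag the story with every topic whose mention count exceeds the threshold."""
    for topic in topics:
        if count_mentions(story, topic) > threshold:
            story.topics.add(topic.title)
    return story.topics


cats = Topic("Cats", ["cat", "feline", "kitten"])
dogs = Topic("Dogs", ["dog", "puppy"])
story = Story([
    "A cat and a kitten played.",
    "The feline chased another cat.",
    "Cat videos are popular.",
])
```

Here the "Cats" keywords are mentioned five times across the three feed items, so the story is tagged as potentially relevant to "Cats" and not to "Dogs".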
At this point, the feed items have been collected into story groups, and the story groups clustered into different topics. Up to this point, the structure of the material has been determined through computational analysis alone. However, one of the advantages of the system of the present invention is that social computing methods are also implemented in order to improve the quality of the information that is output.
Once the topic analysis 120 has been performed (i.e. when the story groups have been textually analysed for topics), each story group is then presented to a topic group, as shown in Fig. 3. A topic group is a selected group of moderators for a particular topic that determines the actual relevance of each story to the topic.
The story distributor module 32 is operable to examine (step 130) the topic metadata present for each individual story group stored in the memory of the metafeed aggregator 12. Based on the topics listed in the metadata, the story distributor module 32 sends the story in question to the relevant topic groups for further analysis.
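A minimal sketch of this distribution step is given below, assuming that the topic metadata of each story is held as a set of topic titles and that each topic group is identified by its topic title. The function name `distribute` and the data shapes are hypothetical.

```python
from collections import defaultdict


def distribute(stories, topic_groups):
    """Return a mapping of topic title -> ids of stories queued for that topic group.

    stories: mapping of story id -> set of topic titles from the story metadata.
    topic_groups: set of topic titles for which a topic group exists.
    """
    queues = defaultdict(list)
    for story_id, topics in stories.items():
        for topic in sorted(topics):
            if topic in topic_groups:
                queues[topic].append(story_id)
    return dict(queues)
```

A story tagged with several topics is queued once for each corresponding topic group, and a story with no topic metadata is not sent anywhere.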
As the stories are distributed to the different topic groups, they are then presented (step 140) to the topic groups for approval. In the system of Fig. 1, the stories are presented in the form of a webpage 34 generated by the metafeed aggregator 12. Each topic group has a specific webpage dedicated to that particular topic. As seen in Fig. 6, the members 72 of the topic group 70 can view the stories 74 that the computational analysis of the metafeed aggregator 12 has decided are potentially relevant to that topic. The members 72 of the topic group 70 can then vote for or against the story 74 based on the actual relevance of that story to the topic (as interpreted by the individual topic group member).
Turning to Fig. 7, an exemplary template for a topic group webpage is indicated generally at 80. The webpage 80 comprises a header title section 82, indicating for example the origin of the feed aggregation system i.e. the supplier of the service, banner advertising, and/or navigation information; a topic title 84 indicating the current topic; a plurality of story sections 86, showing the stories currently being voted on; and a number of extra options for a user of the webpage 80. The extra options may comprise choices 88 to alter the arrangement of the current topic being displayed (e.g. options to order the stories by a ranking of most popular stories, most recent stories, or stories that may have had most activity without being most popular - i.e. most divisive stories); options to structure 90 the stories displayed by source (e.g. display stories from CNN, BBC News, RTE News, etc.); options to subscribe 92 to the web feed output for the various topics; or links 94 to webpages for other topics.
In the displayed webpage 80, voting options 98a-c, 96 are displayed for the individual stories 86. These story voting options 98a-c, 96 allow a user of the webpage 80 to vote for or against an individual story by allowing the users to vote for or against the individual feed items 98a-c, 96 within each story. It will be understood that, while in the example of Figure 7 individual feed items are voted on by the users of the webpage, alternatively or in addition individual stories can also be rated to allow quicker aggregated ratings of stories. The resultant rating for the story can thus be a function of the ratings received for the story itself, and/or the ratings received for the individual feed items contained in the story. So for example, if it were found that the rating for an item within an otherwise highly rated story was bringing down the story's relevance to a topic by more than a threshold amount, the item could even be removed from the story.
Also, individually rating items within a story allows items to be ranked and displayed not only by other criteria such as those described above, but also according to their relative ratings within a story.
So, individual stories could be displayed with the most popular feed item 98a for that story at the start of that story section, with the remaining feed items 98b, 98c nested beneath the most popular feed item 98a. The feed items 98a-c, 96 can be presented with additional information regarding the title, source, date, and possibly, extracts from the body of the item.
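One way the story rating might be derived from both story-level and item-level ratings, and an underperforming item removed, is sketched below. The equal blending weight, the `max_drop` value and the function names are assumptions for this example; ratings are taken to be on a 1-10 sliding scale.

```python
from statistics import mean


def story_rating(story_votes, item_votes, story_weight=0.5):
    """Blend direct story ratings with the average of the per-item average ratings.

    story_votes: list of ratings given to the story as a whole.
    item_votes: mapping of item id -> list of ratings given to that item.
    Returns None when no ratings of either kind have been received.
    """
    item_means = [mean(votes) for votes in item_votes.values() if votes]
    story_part = mean(story_votes) if story_votes else None
    item_part = mean(item_means) if item_means else None
    if story_part is None:
        return item_part
    if item_part is None:
        return story_part
    return story_weight * story_part + (1.0 - story_weight) * item_part


def prune_items(item_votes, max_drop=1.0):
    """Drop items whose presence lowers the average item rating by more than max_drop."""
    rated = {item: mean(votes) for item, votes in item_votes.items() if votes}
    if not rated:
        return dict(item_votes)
    overall = mean(list(rated.values()))
    kept = dict(item_votes)
    for item in rated:
        others = [r for other, r in rated.items() if other != item]
        if others and mean(others) - overall > max_drop:
            del kept[item]
    return kept
```

For item ratings of `{"a": [9, 9], "b": [8], "c": [1, 2]}`, item `c` pulls the average down by more than the assumed threshold and is removed, while `a` and `b` are kept.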
The template of Fig. 7 allows the users of the topic group webpage 80 to efficiently and easily promote the feed items and stories that they believe are of most relevance to the particular topic, and also to down vote any stories or feed items that are deemed not relevant, or of lesser relevance.
It will be understood that the voting schemes in the topic groups can be a simple "Yes/No" selection, or perhaps a sliding scale rating, e.g. 1-10. The members of the topic group can be chosen to be a limited number of selected moderators, the moderators being the only people able to access the relevant topic group website 34, or the topic group website 34 may be available to the public, allowing any Internet user to vote on the relevance of the presented stories.
It will also be understood that the step 140 of providing the stories to the topic group members may be accomplished not only by providing the topic group members with access to the website 34 but also by exposing an API (Application Programming Interface) 66 for the metafeed system 12 to other platforms. With reference to Fig. 8, a sample system is shown wherein the metafeed aggregator system 12 is operable to receive user ratings for stories via client applications running on users' desktops 60, mobile clients 62 (e.g. PDAs, cellphones), or from applications running on third party websites 64, as well as from the topic group website 34. It will be understood that IPTV systems, email systems, or text-to-speech systems may also utilise the system 12 via the API 66.
For example, the desktop application 60 receives the selected stories from the metafeed aggregator 12 via the Internet, the application 60 being operable to present a voting interface to its users which then communicates to the metafeed system 12 via the API 66.
Indeed, some of the systems 60, 62, 64 may not expose the voting functionality to their users; rather, they may simply expose feeds for the various topics generated by the metafeed system 12 to their users.
Similarly, the API can be such that it enables various levels of security to be specified so allowing users of platforms 60, 62, 64 customised levels of access to the system 12.
The ratings that each story receives are recorded by the metafeed aggregator 12, and the results used to determine which stories are most relevant to particular topics. An output topic web feed 38 is then produced for that particular topic, the contents of the web feed 38 being determined by the ratings received by each story, and whether those ratings exceed a topic threshold or not.
The topic threshold is a largely self-setting, floating value, whereby stories with ratings exceeding this threshold are deemed appropriate for the generated feed. A fixed threshold could end up flooding an output topic web feed 38 or it could lead to a drought of feed items. While certain topics can be very active (e.g. a topic dedicated to general sport news), and the threshold would go up to maintain a usable flow-rate of stories, other topics are less active (e.g. a topic dedicated to a particular football team) and the threshold will drop accordingly to maintain the flow-rate. This threshold level can also be adjusted by administrators to ensure that low activity topic groups don't maintain flow-rate of feed items at the expense of quality. The ordering of feed items within an output topic feed 38 depends on a number of factors, including the relevance of the particular story, the timestamp of its particular feed items, the results of the link analysis and topic analysis stages, etc.
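A floating threshold of this kind could be sketched as follows. The approach shown, which sets the threshold from the ratings themselves so that roughly a target number of stories pass, together with the names `floating_threshold`, `select_for_feed`, `target_count` and `floor`, is an assumption for illustration; the description does not state how the threshold is computed. The `floor` parameter stands in for the administrator adjustment.

```python
def floating_threshold(ratings, target_count, floor=0.0):
    """Return a threshold that lets roughly target_count of the ratings through."""
    if target_count <= 0:
        raise ValueError("target_count must be positive")
    ranked = sorted(ratings, reverse=True)
    if len(ranked) <= target_count:
        # Quiet topic: the threshold drops to the administrator-set floor.
        return floor
    # Busy topic: stories must exceed the rating of the first story that does not fit.
    return max(floor, ranked[target_count])


def select_for_feed(ratings_by_story, target_count, floor=0.0):
    """Return the ids of stories whose rating exceeds the floating threshold."""
    threshold = floating_threshold(ratings_by_story.values(), target_count, floor)
    ordered = sorted(ratings_by_story.items(), key=lambda kv: kv[1], reverse=True)
    return [story_id for story_id, rating in ordered if rating > threshold]


busy = {"s1": 9, "s2": 8, "s3": 7, "s4": 6, "s5": 5}
quiet = {"q1": 3, "q2": 2}
```

For the busy topic the threshold rises to 6 and only the three best-rated stories are output; for the quiet topic the threshold falls to the floor of 2.5, so a poorly rated story is still excluded. Tied ratings may cause fewer than `target_count` stories to pass.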
The invention is not limited to the embodiment described herein but can be amended or modified without departing from the scope of the present invention.

Claims

1. A method of providing a customised, user-moderated syndication feed for a topic, the method comprising the steps of:
(a) receiving a feed item;
(b) analysing the feed item for associations to previously-received feed items;
(c) grouping associated feed items into a story;
(d) analysing the story for specified keywords relevant to one or more of a number of topics, each topic comprising a pre-defined set of keywords;
(e) tagging a story containing specified topic keywords as a story provisionally relevant to said topic;
(f) presenting the tagged story to raters of said topic;
(g) receiving an approval rating for a tagged story from a rater indicating the perceived relevance of said tagged story to said topic; and
(h) creating a syndication feed for said topic, the syndication feed comprising a plurality of tagged stories based on the approval ratings of each tagged story.
2. A method as claimed in claim 1, wherein said feed items are one or more of RSS or Atom feed items.
3. A method as claimed in claim 1, wherein the step of analysing the feed item for associations to previously-received feed items comprises analysing the feed item for any embedded links to previously-received feed items.
4. A method as claimed in claim 1, wherein the step of analysing the feed item for associations to previously-received feed items comprises analysing the feed item for any textual links to previously-received feed items.
5. A method as claimed in claim 1, wherein the topic is user-created.
6. A method as claimed in claim 1, wherein the step of tagging a story comprises adding topic meta-data to said story.
7. A method as claimed in claim 1, wherein the step of presenting the tagged story to raters of said topic comprises publishing the tagged story as part of a website.
8. A method as claimed in claim 7, wherein the raters of said topic comprise at least one user of said website.
9. A method as claimed in claim 1, wherein the step of presenting the tagged story to raters of said topic comprises the step of sending the tagged story across a network to an interactive client application operated by a respective rater.
10. A method as claimed in claim 1, wherein the method comprises the step of ordering tagged stories in the created syndication feed based on the approval ratings of each tagged story.
11. A method as claimed in claim 1, wherein the method comprises the step of ordering tagged stories in the created syndication feed based on the chronological order of receipt of feed items in said story.
12. A method as claimed in claim 1, wherein the method comprises the step of ordering tagged stories in the created syndication feed based on the number of links to the feed items from other feed items.
13. A method as claimed in claim 1, wherein the method comprises the step of ordering feed items in a tagged story in the created syndication feed based on the chronological order of receipt of feed items in said story.
14. A method as claimed in claim 1, wherein the method comprises the step of ordering a feed item in a tagged story in the created syndication feed based on the number of links to the feed item from other feed items.
15. A method as claimed in claim 1, wherein said step of receiving an approval rating for a tagged story comprises: receiving an indication from a rater of the relevance of one or more of said tagged story or a feed item in a tagged story to said topic.
16. A system for providing a customised, user-moderated syndication feed for a topic, comprising:
(a) means for receiving a feed item;
(b) means for analysing the feed item for associations to previously-received feed items;
(c) means for grouping associated feed items into a story;
(d) means for analysing the story for specified keywords relevant to one or more of a number of topics, each topic comprising a pre-defined set of keywords;
(e) means for tagging a story containing specified topic keywords as a story provisionally relevant to said topic;
(f) means for presenting the tagged story to raters of said topic;
(g) means for receiving an approval rating for a tagged story from a rater indicating the perceived relevance of said tagged story to said topic; and
(h) means for creating a syndication feed for said topic, the syndication feed comprising a plurality of tagged stories based on the approval ratings of each tagged story.
17. A computer program product comprising computer readable code for providing a customised, user-moderated syndication feed for a topic which when executed in a computer system is arranged to perform the steps of any of claims 1 to 15.
PCT/EP2008/000761 2007-03-22 2008-01-31 A feed aggregation system WO2008113425A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IES2007/0194 2007-03-22
IE20070194 2007-03-22

Publications (1)

Publication Number Publication Date
WO2008113425A1 (en) 2008-09-25

Family

ID=39267891

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/000761 WO2008113425A1 (en) 2007-03-22 2008-01-31 A feed aggregation system

Country Status (1)

Country Link
WO (1) WO2008113425A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100199184A1 (en) * 2009-01-30 2010-08-05 Yahoo! Inc. Prioritizing vitality events in a social networking system
US20110113359A1 (en) * 2009-11-10 2011-05-12 Microsoft Corporation Model versioning using web feeds
EP2526521A2 (en) * 2010-01-21 2012-11-28 Microsoft Corporation Scalable topical aggregation of data feeds
US8626768B2 (en) 2010-01-06 2014-01-07 Microsoft Corporation Automated discovery aggregation and organization of subject area discussions
US9195771B2 (en) 2011-08-09 2015-11-24 Christian George STRIKE System for creating and method for providing a news feed website and application
US20180330078A1 (en) 2017-05-11 2018-11-15 Microsoft Technology Licensing, Llc Enclave pool shared key
US20180332011A1 (en) 2017-05-11 2018-11-15 Microsoft Technology Licensing, Llc Secure cryptlet tunnel
US10238288B2 (en) 2017-06-15 2019-03-26 Microsoft Technology Licensing, Llc Direct frequency modulating radio-frequency sensors
US10540620B2 (en) 2016-10-31 2020-01-21 Microsoft Technology Licensing, Llc Personalized aggregated project team activity feed
US10635733B2 (en) 2017-05-05 2020-04-28 Microsoft Technology Licensing, Llc Personalized user-categorized recommendations
US10664591B2 (en) 2017-05-11 2020-05-26 Microsoft Technology Licensing, Llc Enclave pools
US10740455B2 (en) 2017-05-11 2020-08-11 Microsoft Technology Licensing, Llc Encave pool management
US10747905B2 (en) 2017-05-11 2020-08-18 Microsoft Technology Licensing, Llc Enclave ring and pair topologies
US11477302B2 (en) 2016-07-06 2022-10-18 Palo Alto Research Center Incorporated Computer-implemented system and method for distributed activity detection

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6148330A (en) * 1997-11-17 2000-11-14 Netscape Communications Corp. System and method for automatically generating content for a network channel
US20060112076A1 (en) * 2004-11-19 2006-05-25 International Business Machines Corporation Method, system, and storage medium for providing web information processing services
US20070061393A1 (en) * 2005-02-01 2007-03-15 Moore James F Management of health care data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
D.GRUHL ET AL.: "The web beyond popularity: a really simple system for web scale RSS", PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 23 May 2006 (2006-05-23) - 26 May 2006 (2006-05-26), Edinburgh, Scotland, pages 183 - 192, XP002476432, ISBN: 1-59593-323-9, Retrieved from the Internet <URL:http://doi.acm.org/10.1145/1135777.1135809> [retrieved on 20080415] *
KILHONG JOO ET AL: "An Incremental Document Clustering Algorithm Based on a Hierarchical Agglomerative Approach", DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY LECTURE NOTES IN COMPUTER SCIENCE;;LNCS, SPRINGER-VERLAG, BE, vol. 3816, 2005, pages 321 - 332, XP019026646, ISBN: 3-540-30999-3 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100199184A1 (en) * 2009-01-30 2010-08-05 Yahoo! Inc. Prioritizing vitality events in a social networking system
US20110113359A1 (en) * 2009-11-10 2011-05-12 Microsoft Corporation Model versioning using web feeds
US8601440B2 (en) * 2009-11-10 2013-12-03 Microsoft Corporation Using web model feeds to version models which are defined in modeling languages
US8626768B2 (en) 2010-01-06 2014-01-07 Microsoft Corporation Automated discovery aggregation and organization of subject area discussions
EP2526521A2 (en) * 2010-01-21 2012-11-28 Microsoft Corporation Scalable topical aggregation of data feeds
JP2013518322A (en) * 2010-01-21 2013-05-20 マイクロソフト コーポレーション Data feed total that can be adjusted based on topic
EP2526521A4 (en) * 2010-01-21 2014-03-19 Microsoft Corp Scalable topical aggregation of data feeds
US9195771B2 (en) 2011-08-09 2015-11-24 Christian George STRIKE System for creating and method for providing a news feed website and application
US11477302B2 (en) 2016-07-06 2022-10-18 Palo Alto Research Center Incorporated Computer-implemented system and method for distributed activity detection
US10540620B2 (en) 2016-10-31 2020-01-21 Microsoft Technology Licensing, Llc Personalized aggregated project team activity feed
US10635733B2 (en) 2017-05-05 2020-04-28 Microsoft Technology Licensing, Llc Personalized user-categorized recommendations
US10528722B2 (en) 2017-05-11 2020-01-07 Microsoft Technology Licensing, Llc Enclave pool shared key
US20180332011A1 (en) 2017-05-11 2018-11-15 Microsoft Technology Licensing, Llc Secure cryptlet tunnel
US10664591B2 (en) 2017-05-11 2020-05-26 Microsoft Technology Licensing, Llc Enclave pools
US10740455B2 (en) 2017-05-11 2020-08-11 Microsoft Technology Licensing, Llc Encave pool management
US10747905B2 (en) 2017-05-11 2020-08-18 Microsoft Technology Licensing, Llc Enclave ring and pair topologies
US10833858B2 (en) 2017-05-11 2020-11-10 Microsoft Technology Licensing, Llc Secure cryptlet tunnel
US20180330078A1 (en) 2017-05-11 2018-11-15 Microsoft Technology Licensing, Llc Enclave pool shared key
US10238288B2 (en) 2017-06-15 2019-03-26 Microsoft Technology Licensing, Llc Direct frequency modulating radio-frequency sensors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08707450

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08707450

Country of ref document: EP

Kind code of ref document: A1