Wednesday, April 21, 2010

Short and Tweet: Experiments on Recommending Content from Information Streams (Specifically, Twitter)

Information streams have recently emerged as a popular means of information awareness. By information streams, we mean the general set of Web 2.0 feeds, such as status updates on Twitter and Facebook, and news and entertainment items in Google Reader and other RSS readers. More and more web users keep up with the newest information through information streams. At the CHI 2010 conference, we presented a new system called Zerozero88.com that recommends content (specifically, URLs that have been posted on Twitter) to users based on their Twitter profiles. Through this recommender, we hope to better direct users' attention to the most interesting URLs being posted on Twitter.

As a domain for recommendation, information streams have three interesting properties that distinguish them from other well-studied domains:
  1. Recency of content: Content in the stream is often considered interesting only within a short time of first being published. As a result, the recommender may perpetually be in a “cold start” situation, i.e., there is not enough data about new items to generate good recommendations.
  2. Explicit interaction among users: Unlike other domains, where users interact with the system as isolated individuals, in information streams users interact explicitly, by subscribing to others’ streams or by sharing items.
  3. User-generated content: Users are not passive consumers of content in information streams; they are often content producers as well.

In a modular approach, we explored three separate dimensions in designing such a recommender: content sources, topic interest models for users, and social voting:
  1. Content Sources: Given limited access to tweets and limited processing capacity, our first design question is how to select the most promising candidate set of URLs to consider for recommendation. We chose two strategies. First, we select candidate URLs posted within the user's local neighborhood: her followees and her followees' followees (the “FoF” set referred to below). Sarwar et al. [1] have shown that by considering only a small neighborhood of people around the end user, we can reduce the set of items to consider and at the same time expect recommendations of similar or higher quality.

    Second, we also considered a popularity-based URL selection scheme. URLs that are posted all over Twitter are probably more interesting than those rarely mentioned by anyone.

  2. Topic Modeling: Using topic relevance is an established approach to computing recommendations. The topic interest of a user is modeled from text content the user has interacted with before, and candidate items are ranked by how well they match the user's topic interest profile. We built this profile in two ways: from the user's own tweets (Self), or from the tweets of the people she follows (Followee). A minimal sketch of this ranking appears after this list.
  3. Social Voting: Assuming the user has stable interests and follows people according to those interests, the people in her neighborhood should be like-minded enough that voting over the neighborhood can function effectively. However, the “one person, one vote” basis of this approach may not be the best design choice on Twitter, because some people may be more trustworthy than others as information sources. Andersen et al. discussed several key insights in their theory of trust-based recommender systems [2], one of which is trust propagation. Intuitively, trust propagation means that my trust in Alice increases when the people I trust also show trust in Alice. Following this argument, a person who is followed by many of a user's followees is more trustworthy as an information source, and thus should be granted more weight in the voting process (also sketched after this list).
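To make the topic-modeling dimension concrete, here is a minimal sketch of one way such ranking could work: build a bag-of-words profile from a set of tweets (the user's own, or her followees'), then rank candidate URLs by the cosine similarity between the profile and the text of the tweets mentioning each URL. The tokenization and scoring details below are illustrative assumptions, not the system's exact implementation.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "rt"}

def tokenize(text):
    """Lowercased word tokens, with URLs, @mentions, and stopwords removed."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOPWORDS]

def build_profile(tweets):
    """Topic-interest profile: a term-frequency vector over a set of tweets.
    Pass the user's own tweets (Self) or her followees' tweets (Followee)."""
    return Counter(w for t in tweets for w in tokenize(t))

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(c * v[w] for w, c in u.items() if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(profile, candidates):
    """candidates: {url: text of tweets mentioning it}. Best match first."""
    scored = [(cosine(profile, build_profile([text])), url)
              for url, text in candidates.items()]
    return [url for _, url in sorted(scored, reverse=True)]
```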
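The weighted voting can be sketched in the same spirit: each URL is voted on by the neighborhood members who posted it, and a voter's weight grows with the number of the user's followees who also follow that voter, a simple stand-in for the trust-propagation intuition above. Note that the same FoF neighborhood doubles as the candidate URL source from strategy 1. The specific weighting function here is an assumption for illustration.

```python
from collections import defaultdict

def voter_weight(voter, my_followees, follows):
    """Trust-propagation heuristic: a voter followed by more of my own
    followees earns a larger vote. follows: {person: set of their followees}."""
    endorsements = sum(1 for f in my_followees if voter in follows.get(f, set()))
    return 1.0 + endorsements  # baseline of one vote, boosted by endorsements

def vote_scores(posts, my_followees, follows):
    """posts: iterable of (person, url) pairs gathered from the FoF
    neighborhood. Returns each URL's total weighted vote."""
    scores = defaultdict(float)
    for person, url in posts:
        scores[url] += voter_weight(person, my_followees, follows)
    return scores
```

A ranker in the FoF-Self-Vote configuration discussed below would then combine a URL's topic-relevance score with its weighted vote; the exact combination is a design choice.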


The figure below shows the overall design of the system. The URL source selectors at the lower left supply the candidate content items that feed into the system to be ranked. The modules on the left side perform the topic modeling, based either on the user's own tweets or on her followees' tweets. The modules on the right implement the social voting model.



We implemented 12 recommendation engines in the design space formulated above and deployed them as a recommender service on the web to gather feedback from real Twitter users. The best-performing algorithm raised the proportion of recommended content rated interesting from a baseline of 33% to 72%.

Overall, we found that:
  1. The social voting process seems to contribute the most to recommender accuracy.
  2. The topic models also contribute to accuracy, and modeling from the user's own tweets (Self) is more accurate (with the caveat that the user must actually tweet, not merely listen by following others).
  3. Selecting URLs from the FoF neighborhood seems to work better than selecting globally popular URLs, though the difference is not yet statistically significant.
  4. The best-performing algorithm is FoF-Self-Vote: the FoF neighborhood as the URL source, the user's own tweets for topic modeling, and social voting for ranking.



You can try out the beta system at http://zerozero88.com, but since it is still in beta, we can probably only enable accounts for a limited number of the people who sign up.

You can also read more about our results in the published paper [3].

Update 2010-08-23: Slides available here.

References

[1] Sarwar, B.M., Karypis, G., Konstan, J.A., and Riedl, J. 2002. Recommender systems for large-scale e-commerce: scalable neighborhood formation using clustering. In Proceedings of the International Conference on Computer and Information Technology (ICCIT 2002).

[2] Andersen, R., Borgs, C., Chayes, J., Feige, U., Flaxman, A., Kalai, A., Mirrokni, V., and Tennenholtz, M. 2008. Trust-based recommendation systems: an axiomatic approach. In Proceedings of the 17th International Conference on World Wide Web (WWW '08). ACM, New York, NY.

[3] Chen, J., Nairn, R., Nelson, L., Bernstein, M., and Chi, E. 2010. Short and tweet: experiments on recommending content from information streams. In Proceedings of the 28th international Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA, April 10 - 15, 2010). CHI '10. ACM, New York, NY, 1185-1194. DOI= http://doi.acm.org/10.1145/1753326.1753503

Monday, April 12, 2010

Information Stream Overload

Information overload is a growing threat to the productivity of today’s knowledge workers, who need to keep track of multiple streams of information from various sources. RSS feed readers are a popular choice for syndicating information streams, but current tools tend to contribute to the overload problem instead of solving it.  Ironic, isn't it?

A significant portion of the ASC team is here in Atlanta to present work related to this information overload problem, and I will blog about it in the next week or so.

Tomorrow, we will be presenting a paper on FeedWinnower, an enhanced feed aggregator that helps readers filter feed items by four facets (topic, people, source, and time), thus facilitating feed triage. The four facets correspond to the What, Who, Where, and When questions that govern much information architecture design. Combining the four facets gives users a powerful way to slice and dice their personal feeds.

First, a topic panel allows the user to drill down into the specific topics she might be interested in:


Second, a people panel allows filtering by the person who created the information item in the stream:


Third, a source panel allows filtering by the type of information stream the item came from:


And finally, a time panel allows filtering to a particular time period of interest within the information stream:



Usage Scenarios
By combining the four facets, users can examine and navigate their feeds, deciding what items to skip and what to read. Here we give two illustrative real-world scenarios.

Scenario 1: At the end of a workday, Mary opens FeedWinnower to get a sense of what has been happening around her. Using the time facet, she finds that 507 items came into her account earlier in the day. Glancing at the topic facet, she sees “iphone” and a few other topics being talked about. When she clicks on “iphone”, the panel on the right shows only the 7 matching items. In the people facet, she sees that these 7 items came from 4 of her friends and decides to read them in detail.

Scenario 2: John wants to find out what his friends have been chatting about on Twitter lately. He selects “Twitter” in the source facet and chooses “yesterday” in the time facet. This yields 425 items. In the people facet, he then excludes those creators that he wants to ignore, filtering down to 324 items. Looking at the topic facet, he sees “betacup” and wonders what it is about. After clicking on “betacup” and reading the remaining 7 items, he now has a fair understanding about the term “betacup”.

In these two scenarios, we see how the four facets enable users to construct simple queries to meet their needs. We also see how the topic facet is essential for obtaining an overview of topical trends in the feeds and for helping users decide what is worth reading in depth.
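To make the facet mechanics concrete, here is a minimal sketch of how such conjunctive facet filtering might work. The FeedItem fields and helper names are assumptions for illustration; the paper does not prescribe a particular data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FeedItem:
    topics: set        # What:  e.g., {"iphone", "betacup"}
    person: str        # Who:   the item's creator
    source: str        # Where: e.g., "Twitter", "RSS"
    time: datetime     # When

def filter_items(items, topic=None, source=None, start=None, end=None,
                 people=None, exclude_people=()):
    """Conjunctive filtering over the four facets; None means 'any'."""
    return [it for it in items
            if (topic is None or topic in it.topics)
            and (source is None or it.source == source)
            and (people is None or it.person in people)
            and it.person not in exclude_people
            and (start is None or it.time >= start)
            and (end is None or it.time < end)]

def topic_counts(items):
    """The counts shown in the topic panel, for spotting trends like 'betacup'."""
    return Counter(t for it in items for t in it.topics)

# Scenario 2 in miniature (items, yesterday, today, ignored are assumed given):
# tweets = filter_items(items, source="Twitter", start=yesterday, end=today,
#                       exclude_people=ignored)
# print(topic_counts(tweets).most_common(10))        # "betacup" stands out
# betacup_items = filter_items(tweets, topic="betacup")
```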

The paper reference is:
Hong, L., Convertino, G., Suh, B., Chi, E. H., and Kairam, S. 2010. FeedWinnower: layering structures over collections of information streams. In Proceedings of the 28th international Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA, April 10 - 15, 2010). CHI '10. ACM, New York, NY, 947-950. DOI= http://doi.acm.org/10.1145/1753326.1753466