As a domain for recommendation, information streams have three interesting properties that distinguish them from other well-studied domains:
- Recency of content: Content in the stream is often considered interesting only within a short time of first being published. As a result, the recommender may always be in a “cold start” situation, i.e. there is not enough data to generate a good recommendation.
- Explicit interaction among users: Unlike other domains where users interact with the system as isolated individuals, users of information streams interact with each other explicitly by subscribing to others’ streams or by sharing items.
- User-generated content: Users are not passive consumers of content in information streams. People are often content producers as well as consumers.
In a modular approach, we explored three separate dimensions in designing such a recommender: content sources, topic interest models for users, and social voting:
- Content Sources: Given limited access to tweets and limited processing capabilities, our first design question is how to select the most promising candidate set of URLs to consider for recommendations. We chose two strategies. First, Sarwar et al. [1] have shown that by considering only a small neighborhood of people around the end user, we can reduce the set of items to consider and still expect recommendations of similar or higher quality.
Second, we also considered a popularity-based URL selection scheme. URLs that are posted all over Twitter are probably more interesting than those rarely mentioned by anyone.
- Topic Modeling: Using topic relevance is an established approach to compute recommendations. The topic interest of a user is modeled from text content the user has interacted with before, and candidate items are ranked by how well they match the topic interest profile of the user. Another way to model the user's interest is by modeling the topics of the tweets made by the people she follows.
- Social Voting: Assuming the user has a stable interest and follows people according to that interest, people in the neighborhood should be like-minded enough that voting within the neighborhood can function effectively. However, the “one person, one vote” basis of this approach may not be the best design choice in Twitter, because some people may be more trustworthy than others as information sources. Andersen et al. discussed several key insights in their theory of trust-based recommender systems [2], one of which is trust propagation. Intuitively, trust propagation means my trust in Alice will increase when the people whom I trust also show trust in Alice. Following this argument, a person who is followed by many of a user’s followees is more trustworthy as an information source, and thus should be granted more power in the voting process.
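As a rough illustration of the neighborhood-based content source and the trust-weighted vote described above, here is a minimal Python sketch. The data structures (`follows`, `posts`), the function names, and the specific weighting (one unit of voting power per followee who follows the voter, with a floor of one vote) are assumptions made for illustration, not the implementation of the deployed system.

```python
from collections import Counter, defaultdict


def neighborhood_of(user, follows):
    """Return the user's followees plus the people those followees follow."""
    direct = follows.get(user, set())
    neighborhood = set(direct)
    for followee in direct:
        neighborhood |= follows.get(followee, set())
    neighborhood.discard(user)
    return neighborhood


def vote_power(user, follows):
    """Count how many of the user's followees follow each person.

    Assumed stand-in for trust propagation: someone followed by more of
    the user's followees gets more voting power.
    """
    power = Counter()
    for followee in follows.get(user, set()):
        for person in follows.get(followee, set()):
            if person != user:
                power[person] += 1
    return power


def rank_urls(user, follows, posts):
    """Rank URLs posted in the neighborhood by trust-weighted votes."""
    power = vote_power(user, follows)
    scores = defaultdict(float)
    for person in neighborhood_of(user, follows):
        weight = max(power[person], 1)  # assumed floor: everyone gets at least one vote
        for url in posts.get(person, set()):
            scores[url] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: who follows whom, and which URLs each person posted.
follows = {
    "me": {"alice", "bob"},
    "alice": {"carol", "dave"},
    "bob": {"carol"},
}
posts = {"alice": {"u1"}, "carol": {"u2"}, "dave": {"u3"}}

ranking = rank_urls("me", follows, posts)
print(ranking)  # [('u2', 2.0), ('u1', 1.0), ('u3', 1.0)]
```

With "one person, one vote" the three URLs would tie. Here `u2` comes first because both of the user's followees follow carol, who posted it.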
The figure below describes the overall design of the system. The URL source selectors at the lower left supply the candidate content items that feed into the system to be ranked. The left side of the system does the topic modeling, which can draw on either the user's own tweets or the followees' tweets. The social voting model is implemented by the modules on the right.
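The two topic-modeling options mentioned above (the user's own tweets versus the followees' tweets) could be sketched roughly as follows. This assumes a plain bag-of-words profile and cosine similarity; the tokenizer, the function names, and the sample tweets are hypothetical, and the real system may weight terms differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_profile(texts):
    """Build a term-frequency profile from a list of texts."""
    profile = Counter()
    for text in texts:
        profile.update(tokenize(text))
    return profile


def cosine(a, b):
    """Cosine similarity between two term-frequency profiles."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_score(profile, candidate_text):
    """Score a candidate item's text against a topic interest profile."""
    return cosine(profile, build_profile([candidate_text]))


# Hypothetical inputs: the same scoring function is used with either profile.
self_profile = build_profile(["new python release notes", "python packaging tips"])
followee_profile = build_profile(["marathon training plan", "trail running shoes"])
candidate = "python tips for packaging a release"

self_score = topic_score(self_profile, candidate)
followee_score = topic_score(followee_profile, candidate)
```

Only the input to `build_profile` changes between the two variants, which is what makes the topic model swappable in a modular design.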
We implemented 12 recommendation engines in the design space we formulated above, and deployed them to a recommender service on the web to gather feedback from real Twitter users. The best performing algorithm improved the percentage of interesting content to 72% from a baseline of 33%.
Overall, we found that:
- The social voting process seems to contribute the most to the recommender accuracy.
- The topic models also contribute to the accuracy, but modeling from the user's own tweets is more accurate than modeling from followees' tweets (with the caveat that the user actually tweets, rather than merely listening by following people).
- Selecting URLs based on the neighborhood seems to work better than globally popular URLs, but the results are not yet statistically significant.
- The best performing algorithm is FoF-Self-Vote (that is, using the neighborhood for URL content sources, self-tweets for topic modeling, and social voting).
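To make the naming concrete, the design space can be enumerated as a cross product of the three dimensions. This sketch assumes the twelve engines are the full combination of two content sources, three topic settings (including switching topic relevance off), and voting on or off. Only the labels `FoF`, `Self`, and `Vote` come from the name above; the others are hypothetical placeholders.

```python
from itertools import product

# Labels other than "FoF", "Self", and "Vote" are assumed for illustration.
CONTENT_SOURCES = ("FoF", "Popular")
TOPIC_MODELS = ("None", "Self", "Followee")
SOCIAL_VOTING = ("Vote", "NoVote")

engines = [
    "-".join(parts)
    for parts in product(CONTENT_SOURCES, TOPIC_MODELS, SOCIAL_VOTING)
]

print(len(engines))  # 12
```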
You can try out the beta system at http://zerozero88.com, but since it is still in beta, we can probably only enable the accounts of a limited number of people who sign up.
You can also read more about our results in the published paper [3].
Update 2010-08-23: Slides available here.
References
[1] Sarwar, B.M., Karypis, G., Konstan, J.A., Riedl, J. 2002. Recommender systems for large-scale ECommerce: Scalable neighborhood formation using clustering. In Proc of ICCIT 2002.
[2] Andersen, R., Borgs, C., Chayes, J., Feige, U., Flaxman, A., Kalai, A., Mirrokni, V., and Tennenholtz, M. 2008. Trust-based recommendation systems: an axiomatic approach. In Proc of WWW '08.
[3] Chen, J., Nairn, R., Nelson, L., Bernstein, M., and Chi, E. 2010. Short and tweet: experiments on recommending content from information streams. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA, April 10-15, 2010). CHI '10. ACM, New York, NY, 1185-1194. DOI: http://doi.acm.org/10.1145/1753326.1753503