Thursday, June 21, 2007

Lada Adamic visits PARC and the ASC Project


Today, Lada Adamic came to PARC and gave a talk on identifying expertise networks in discussion forums. Her talk provoked a lot of discussion and ideas about future research in this area.

Her abstract and title information are below:

Expertise Networks in Online Communities: Structure and Algorithms

Web-based communities have become an important place for people to seek and share expertise. We find that networks in these communities typically differ in their topology from other online networks such as the World Wide Web. Systems targeted to augment web-based communities by automatically identifying users with expertise, for example, need to adapt to the underlying interaction dynamics. In this study, we analyze the Java Forum, a large online help-seeking community, using social network analysis methods. We test a set of network-based ranking algorithms, including PageRank and HITS, on this large size social network in order to identify users with high expertise. We then use simulations to identify a small number of simple rules governing the question-answer dynamic in the network. These simple rules not only replicate the structural characteristics and algorithm performance on the empirically observed Java Forum, but also allow us to evaluate how other algorithms may perform in communities with different characteristics. We believe this approach will be fruitful for practical algorithm design and implementation for online expertise-sharing communities.

This is joint work with Jun Zhang and Mark Ackerman at the School of Information at the University of Michigan.

In her talk, I found a quote that's worth keeping around. Referring to Yahoo! Answers, Eckart Walther said:
[it is] the next generation of search ... [it] is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.
- Eckart Walther (Yahoo research)

Of course, this connects closely to Wikipedia and the answers it provides, so these kinds of ideas are at the center of several research projects here at PARC, including our characterization studies of Wikipedia (see previous blog entries).

Lada's work here, in a nutshell, uses some simple methods to identify the expertise level of users in discussion forums by looking at the social network formed by the question/answer pairs. It turns out that simple algorithms relying on simple measures, such as the number of answers a user has provided, work nearly as well as sophisticated algorithms such as PageRank or HITS. She and her co-workers measured this using data from the Java Forum.
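
To make the comparison concrete, here is a minimal Python sketch of the two kinds of ranking. It is our own illustration, not code from the talk or the paper: the toy reply data, the function names, and the choice to draw an edge from each asker to the person who answered are all assumptions, and the PageRank is a plain power iteration rather than whatever variant the authors actually tested.

```python
from collections import defaultdict

# Hypothetical toy data: one (asker, answerer) pair per reply.
REPLIES = [
    ("ann", "bob"),
    ("cat", "bob"),
    ("dan", "bob"),
    ("bob", "eve"),
    ("ann", "eve"),
    ("dan", "cat"),
]


def answer_counts(replies):
    """Simple measure: how many replies each user has posted."""
    counts = {}
    for asker, answerer in replies:
        counts.setdefault(asker, 0)  # askers who never answer get 0
        counts[answerer] = counts.get(answerer, 0) + 1
    return counts


def pagerank(replies, damping=0.85, iterations=50):
    """Power-iteration PageRank on the asker -> answerer graph."""
    out_links = defaultdict(list)
    users = set()
    for asker, answerer in replies:
        out_links[asker].append(answerer)
        users.update((asker, answerer))
    users = sorted(users)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Users who never asked anything have no outgoing edges;
        # their rank is spread evenly over everyone.
        dangling = sum(rank[u] for u in users if u not in out_links)
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in users}
        for asker, answerers in out_links.items():
            share = damping * rank[asker] / len(answerers)
            for answerer in answerers:
                new_rank[answerer] += share
        rank = new_rank
    return rank


def ranking(scores):
    """Users sorted from highest to lowest score, ties broken by name."""
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    print("by answer count:", ranking(answer_counts(REPLIES)))
    print("by PageRank:    ", ranking(pagerank(REPLIES)))
```

On data like this the two rankings can disagree: counting answers rewards volume, while PageRank also rewards answering the questions of users who are themselves highly ranked.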

Some of the most interesting discussion revolved around understanding the micro-economics of behavior. If users in the community know that the number of answers or replies will earn them a high rank, they might game the system by replying with minimal, irrelevant content. We have seen this kind of behavior in Wikipedia as well. However we align the incentives, users are likely to game the system along those incentives. How do we design social systems, then, knowing the user behaviors that might follow from certain micro-economic predictions?

On a side note, she recently won the vote on for being a sexy geek!

Thursday, June 14, 2007

social networking websites as platforms

This article in the WSJ finally points to the fact that social networking sites are not just for connecting friends and keeping track of who is dating whom; they can also be developed into platforms for delivering content and ads. To sum it up quickly, "It is a channel, stupid."

WSJ article on social networking sites as delivery channels

The WSJ seems to be paying great attention to this new platform, perhaps because having social networks and social computing built directly into the content delivery channel can make a huge difference in business. Their recent article on how social computing is making inroads into research universities is a good example of how the trend toward Augmented Social Cognition research appears to be unstoppable at this point.

WSJ article on social computing in research universities

Social tag spamming in

Last summer, we looked at spamming behaviors in social tagging networks. Recently, some interesting anecdotes of spamming in music social tagging data have surfaced:

The blog post above discusses how vandalism is affecting a particular tag. It seems that the basic human desire to act in anti-social ways surfaces in many social Web 2.0 systems. Of course, data mining experts and others have worked tirelessly to come up with algorithms that filter out this 'noise', but I can't help but wonder if this 'noise' is just as valuable as 'real data' in understanding human behavior. Moreover, these outliers seem to point to real data that we could extract and potentially use.