Tuesday, May 13, 2008

Yahoo! Answers vs. Google+Wikipedia vs. Powerset

One of the great things about the Web is that all this socially constructed and co-created knowledge can be easily searched. The PageRank algorithm (based loosely on a collective voting and averaging mechanism around links) is probably responsible for a huge productivity gain across the entire world, and it also satisfies a lot of curiosities (e.g., is 'watermelon' a melon?). It is no surprise, therefore, that Web2.0 systems would try to build upon this success to see how knowledge sharing and information foraging can be improved.
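To make the "voting and averaging" idea concrete, here is a minimal sketch of a PageRank-style computation on a toy link graph. It is a textbook simplification, not Google's production algorithm: the three-page graph, the function name, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated averaging (power iteration).

    links maps each page to the list of pages it links to.
    The damping and iterations values are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page holding an equal share of the total rank.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web, used only for illustration.
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(toy_web)
```

In this toy graph, "home" ends up with the highest score because both other pages link to it, which is the "collective voting" intuition in miniature.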

An old trick, tried in Web1.0 days, is to use human-powered answers. The poster child in this area these days appears to be Yahoo! Answers. A more recent technique is socially constructed collections and encyclopedias, notably represented by Wikipedia (though older systems like about.com and the Open Directory Project are still around). The newest of the bunch is semantic-powered search engines like Powerset. Each one has its own properties that make it interesting as a solution. [Disclaimer: Powerset spun out with PARC technology.]

Powerset, with its meaning-based approach, tries to solve the AI-hard problem of interpreting the question and coming up with the best possible answer, but it is currently plagued by coverage and scalability issues. For example, I asked it about the "worst dictators in history" and got less-than-satisfactory answers, because it hasn't crawled the whole web and searches only Wikipedia at the moment.

There is no guarantee that your question is covered by the content in Wikipedia, but traditional search techniques have the advantage of letting you know whether the information exists at all inside the knowledge base (assuming you know how to formulate the query). I used Google to search within Wikipedia (because Wikipedia's own search doesn't work all that well) for the same dictator question above, and found rather good answers. However, this required knowing how to use the "site:" advanced search option, something that regular users might not know how to do. BTW, interestingly, Wikipedia's "Dictator" page pointed to this parade.com page with a list of dictators. So it appears that socially constructed knowledge sources at least get close to the answer. The current difference between Google+Wikipedia and Powerset appears to be Powerset's claim to make query formulation a problem of the past.
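For readers who haven't used the "site:" option mentioned above, the sketch below shows one way such a query could be assembled into a search URL. The helper name, the choice of en.wikipedia.org as the domain, and the google.com/search URL pattern with a `q` parameter are assumptions for illustration; typing the same query string straight into the search box works just as well.

```python
from urllib.parse import urlencode


def site_search_url(question, site="en.wikipedia.org"):
    """Build a search URL that restricts results to a single site.

    The default site and the URL pattern are illustrative assumptions.
    """
    # The "site:" operator is simply appended to the ordinary query text.
    query = f"{question} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# The same dictator question, restricted to Wikipedia.
url = site_search_url("worst dictators in history")
print(url)
```

The point is that the restriction is just extra text in the query, which is exactly the kind of formulation knowledge that regular users may lack.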

Yahoo! Answers gave me a set of answers that were sometimes more entertaining than informative. Some users apparently think of George W. Bush as a dictator, an interesting and controversial perspective. In any case, users were engaged in a kind of debate.

Each solution probably has its place in the future. While Yahoo! Answers has obvious problems with accuracy (as discussed in this Slate.com article), its sociability makes it entertaining, and we know that sometimes users care more about getting attention for their questions than about getting good answers.

Ackerman’s Answer Garden papers tell us what is wrong with Yahoo! Answers: a garden of answers doesn’t really get built up over time. True knowledge aggregation doesn't really happen on Yahoo! Answers, and this appears not to have been its main design goal. We also know from Ackerman’s work that askers really care about two things: getting answers to their questions (1) quickly and (2) accurately. Perhaps Yahoo! Answers gets to (1) but not (2). But it does get to a third thing: (3) social entertainment.

What I find interesting is how each of these environments performs on the different dimensions of coverage, accuracy, and sociability. Powerset still has to prove itself on coverage, while Wikipedia is still expanding and its community is still improving its accuracy metrics and procedures. Might they converge into a single all-powerful knowledge tool in the future? Google's Knol and Universal Search are a tacit nod to this convergence in the near future.

Monday, May 5, 2008

Announcing a new release of WikiDashboard with updated dataset

Reputation systems are deeply important to social websites. For example, many users use Facebook or bookmarking systems to insert themselves in the middle of information flow, thus gaining positions as information brokers.

A recent Scientific American article highlighted new research on the effects of reputation in the brain. The fMRI studies cited showed that "money and social values are processed in the same brain region". Thanks go to Bob Vasaly for pointing this research out to me.

Indeed, one of the intended uses of WikiDashboard was to let readers and editors alike assess the reputation and behaviors of editors in the system. For example, we can take a look at the actual behavior of a controversial editor named Griot, who was at the center of a controversy in the SF Weekly, and make decisions on our own about the actual patterns of edits depicted there. Or take as another example Jonathan Schilling, who "protects Hillary's online self from the public's hatred. He estimates that he spends up to 15 hours per week editing Wikipedia under the name 'Wasted Time R'--much of it, these days, standing watch over Hillary's page."

Our goal here is not to make decisions for you, but to make the social and editing patterns available to the community so that you can make decisions on your own. In an effort to do that, and in preparation for the CHI2008 conference, Bongwon recently updated the Wikipedia database, and we now have fresh data to share with the community. The new database consists of nearly 3.5 terabytes of raw revision data that we process.

The new interface also has a connection to reddit.com so that users can submit WikiDashboard views that they have found interesting.

Let us know what you all think!

