Tuesday, May 13, 2008

Yahoo! Answers vs. Google+Wikipedia vs. Powerset

One of the great things about the Web is that all this socially constructed, co-created knowledge can be easily searched. The PageRank algorithm (based loosely on a collective voting and averaging mechanism around links) is probably responsible for a huge productivity gain across the entire world, and it also satisfies a lot of curiosities (e.g., is a 'watermelon' a melon?). It is no surprise, therefore, that Web2.0 systems would try to build on this success to see how knowledge sharing and information foraging can be improved.
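To make the "voting and averaging" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is a simplification for illustration, not Google's production system; the toy link graph, the function name, and the iteration count are my own assumptions, and 0.85 is the commonly cited damping factor.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph the page that collects the most incoming votes ends up ranked highest, and a page nobody links to keeps only the baseline share.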

An old trick, tried in Web1.0 days, is to use human-powered answers. The poster child in this area these days appears to be Yahoo! Answers. A more recent technique is socially constructed collections and encyclopedias, notably represented by Wikipedia (though older systems like about.com and the Open Directory Project are still around). The newest of the bunch is semantic-powered search engines like Powerset. Each has its own properties that make it interesting as a solution. [Disclaimer: Powerset spun out with PARC technology.]

Powerset, with its meaning-based approach, tries to solve the AI-hard problem of interpreting the question and coming up with the best possible answer, but it is currently plagued by coverage and scalability issues. For example, I asked it about the "worst dictators in history" and got less-than-satisfactory answers, because it hasn't crawled the whole web and searches only Wikipedia at the moment.

There is no guarantee that your question is covered by the content in Wikipedia, but traditional search techniques have the advantage of letting you know whether the information exists at all inside the knowledge base (assuming you know how to formulate the query). I used Google to search within Wikipedia (because Wikipedia's own search doesn't work all that well) for the same dictator question above, and found rather good answers. However, this required me to know how to use the "site:" advanced search option, something that regular users might not know how to do. BTW, interestingly, Wikipedia's "Dictator" page pointed to this parade.com page on a list of dictators. So it appears that socially constructed knowledge sources at least get close to the answer. The current difference between Google+Wikipedia and Powerset appears to be Powerset's claim to make query formulation a problem of the past.
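For readers who haven't used the "site:" option, here is a small sketch of what such a query looks like when built by hand. The helper name and the choice of en.wikipedia.org as the domain are my own assumptions for illustration; the only things it relies on are Google's "site:" operator and the "q" parameter of its search URL.

```python
from urllib.parse import urlencode


def site_search_url(query, site):
    """Build a Google search URL that restricts results to one site."""
    # The "site:" operator is simply appended to the ordinary query text.
    restricted_query = f"{query} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": restricted_query})


# Hypothetical example: the dictator question, restricted to English Wikipedia.
url = site_search_url("worst dictators in history", "en.wikipedia.org")
print(url)
```

Typing "worst dictators in history site:en.wikipedia.org" directly into the search box has the same effect; the point is that the user has to know the operator exists.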

Yahoo! Answers gave me a set of answers that were sometimes more entertaining than informative. Some users apparently think of George W. Bush as a dictator, an interesting and controversial perspective. Either way, users were engaged in a kind of debate.

Each solution probably has its place in the future. While Yahoo! Answers has obvious problems with accuracy (as discussed in this Slate.com article), its sociability makes it entertaining, and we know that users sometimes care more about getting attention for their questions than about getting good answers.

Ackerman’s Answer Garden papers tell us that what is wrong with Yahoo! Answers is that a garden of answers doesn’t really get built up over time. True knowledge aggregation doesn't really happen on Yahoo! Answers, and it appears not to have been a main design goal. We also know from Ackerman’s work that askers really care about two things: getting answers to their questions (1) quickly and (2) accurately. Perhaps Yahoo! Answers gets to (1) but not (2). But it does get to a third thing: (3) social entertainment.

What I find interesting is how each of these environments performs on the different dimensions of coverage, accuracy, and sociability. Powerset still has to prove itself on coverage, and Wikipedia is still expanding while its community continues to improve its accuracy metrics and procedures. Might they converge into a single all-powerful knowledge tool in the future? Google's Knol and Universal Search are a tacit nod to this convergence in the near future.


Stephen Smoliar said...

Ed, I suspect that any social perspective on answering questions needs to begin by distinguishing questions of opinion from more "objective" questions. Ranking dictators will always be a matter of opinion! I took a more objective approach in trying to put Powerset through its paces. This led to three posts on my own blog. I have also been struck by the variation in Wikipedia content with respect to knowledge domain. My blog has a "Wikipedia" label, where I occasionally try to document these thoughts.

Ed H. Chi said...

I understand that ranking dictators will be a matter of opinion, but I wasn't looking for the "best" list; I was simply looking for _a_ list. Semantic search engines should get me to a list.

Ed H. Chi said...


The snippet in the third result from Yahoo's search engine told us that Mozart wrote 41 symphonies. Interestingly enough, it was an answer from Yahoo Answers!:


Tim Finin said...

I tried the questions on askwiki:

Q: Who were the worst dictators in history

A: The term "dictator" did not originally possess the odious connotations that it later acquired (compare the change of meaning of the ancient Greek concept of the tyrant, or that of the Roman military title of Imperator).

Q: how many symphonies did mozart write

A: Later significant Viennese composers of symphonies include Johann Baptist Vanhal, Karl Ditters von Dittersdorf and Leopold Hoffmann. The most important symphonists of the latter part of the 18th century are Joseph Haydn, who wrote at least 108 symphonies over the course of 36 years (Webster and Feder 2001), and Wolfgang Amadeus Mozart, who wrote at least 56 symphonies in 24 years (Eisen and Sadie 2001).