Monday, September 10, 2007

WikiDashboard: Providing social transparency to Wikipedia


WikiDashboard Tool (alpha-release)

We are pleased to announce the release of our first research prototype of a social dynamic analysis tool for Wikipedia, called WikiDashboard. This post is a quick guide to the tool.

Motivation

The idea is that if we provide social transparency and enable attribution of work to individual workers in Wikipedia, then this will eventually result in increased credibility and trust in the page content, and therefore higher levels of trust in Wikipedia.

You might ask "Why would increasing social transparency result in higher quality articles and increase trust?"

Indeed, the quality of the articles in Wikipedia has been debated heavily in the press [here, here, here, here, and let's not forget the Nature magazine debacle].

Wikipedia itself keeps track of these studies and openly discusses them here, which is a form of social transparency itself. However, even Wales himself has been quoted as saying that "while Wikipedia is useful for many things, he would like to make it known that he does not recommend it to college students for serious research." Indeed, the standard complaint I often hear about Wikipedia is that because of its editorial policy (anyone can edit anything), it is an unreliable source of information.

The opposite point of view, however, has not been debated or expressed nearly as much: precisely because anyone can edit anything, and anyone can examine the edit history and see who made each edit, Wikipedia will become (or has already become) a reliable source of information. I think Michael Scott, the character on the popular TV show "The Office", puts it succinctly: "Wikipedia is the best thing ever. Anyone in the world, can write anything they want about any subject. So you know you are getting the best possible information."

While tongue-in-cheek, it brings up a valid point. Because the information is out there for anyone to examine and to question, incorrect information can be fixed and two disputed points of view can be examined side-by-side. In fact, this is precisely the academic process for ascertaining the truth. Scholars publish papers so that theories can be put forth and debated, facts can be examined, and ideas challenged. Without publication and without social transparency of attribution of ideas and facts to individual researchers, there would be no scientific progress. Therefore, it seems somewhat ironic that the History Department at Middlebury College has banned its students from citing Wikipedia as a source.

Related Work

Just recently, WikiScanner brought the idea of social transparency to the forefront. It helps people find out which organizations anonymous edits in Wikipedia are coming from. A week or two later, WikiRage followed; it helps identify the hottest trends in Wikipedia.

On the academic side, we have seen interesting work from IBM called History Flow, which visualizes the edits to article pages in Wikipedia, and the UCSC Wiki Trust Coloring Demo, which demonstrates how trust could be visualized line-by-line. These are all examples of how being able to understand editing history and editing patterns at a glance could dramatically help users uncover problems and judge the trustworthiness of content on Wikipedia.

These tools and other discussions [NYTimes, blogs, and slashdot discussion] are noticing that accountability and transparency appear to be at the heart of the process that helps generate quality articles.

Guide to our tool

The tool can be used just as if you're on the Wikipedia site itself. All of the functions (such as the article search function, and the edit and history tabs) work just as before. The site provides the dashboard for each page in Wikipedia, while proxying the rest of the content from Wikipedia.

Note that we currently have edit data only up to 2007/07/16, so more recent edits are not included in the charts. We're working to fix this.

See our guide for help on understanding the visualizations in the WikiDashboard.

Some Interesting Examples

We will use the 2008 presidential election as an example. In the figure below, we see that the activity on this page has been heating up lately:
2008 US Presidential election
http://wikidashboard.parc.com/wiki/2008_presidential_election



Here are some notable Democratic Party candidates:
Hillary Clinton
http://wikidashboard.parc.com/wiki/Hillary_Rodham_Clinton



John Edwards
http://wikidashboard.parc.com/wiki/John_Edwards



Barack Obama
http://wikidashboard.parc.com/wiki/Barack_Obama



Here are some notable Republican candidates:

Rudy Giuliani
http://wikidashboard.parc.com/wiki/Rudy_Giuliani



John McCain
http://wikidashboard.parc.com/wiki/John_McCain



Ron Paul
http://wikidashboard.parc.com/wiki/Ron_Paul



Summary

We're curious about how the Web community will use this tool to surface social dynamics and editing patterns that might otherwise be difficult to find and analyze in Wikipedia. We are also interested in applying this tool to Enterprise Wikis. Please let us know about patterns you find, or ask us questions, by leaving a comment on this blog post. Alternatively, if you wish to contact us in private, email us at:
wikidashboard [at] parc [dot] com

Thanks,

Bongwon Suh
Ed H. Chi

Palo Alto Research Center

(joint work with our ex-colleagues Bryan Pendleton and Niki Kittur, both now at CMU)

38 comments:

Ed H. Chi said...

We asked Wikipedia about the potential traffic impact, and they have disabled the ability to directly edit the Wikipedia pages from our proxy server. Users of our system will have to click on the 'Original Document' link to login to Wikipedia to edit the articles.

Ed H. Chi said...

First Blog entry we have noticed at
techpresident

Ed H. Chi said...

Another news source has picked up our tool at The Chronicle of Higher Education.

fred said...

Hi, I have written about WikiDashboard at Unit Structures. I've included some feedback for you, as well.

I'm the same person who blogged about you at techPresident, but I think your tool is really cool and I'm trying to get the word out :)

Keep up the great work!

-Fred

Ed H. Chi said...

Chinese language blog coverage of WikiDashboard!

Kevin Gamble said...

This is very nice! We run a mediawiki install for collaborative work across 75 universities. Is this a tool that could be installed/configured to work on other sites? We would be very interested in using it.

Thank you,

Kevin

Ed H. Chi said...

Kevin: Yes, it is a tool that should be installable on other MediaWikis. If you're interested, please contact our business guy at: lawrence [dot] lee [at] parc [dot] com

Anonymous said...

One question that many people have been asking for quite a while now is — How much of Wikipedia's better quality content has been contributed by editors who are now banned or indefinitely blocked from editing articles on the site? Go tuit!

Anonymous said...

I would like to remark on the following statement:

"Because the information is out there for anyone to examine and to question, incorrect information can be fixed and two disputed points of view can be listed side-by-side. In fact, this is precisely the academic process for ascertaining the truth. Scholars publish papers so that theories can be put forth and debated, facts can be examined, and ideas challenged. Without publication and without social transparency of attribution of ideas and facts to individual researchers, there would be no scientific progress."

Though I realize this statement may have been tongue in cheek (TIC), it behooves me to flag its laughability against the mere chance that anyone might take it seriously. That laughability is evident to anyone who has participated with a clear head in Wikipedia long enough to see the difference between its "espoused values" and its "enacted values" (Argyris & Schon) and to recognize how it works in practice, as opposed to its image in the belief system of its True Believers.

Disclosure. Jon Awbrey and Jonny Cache are two authorships of the same person.

Ed H. Chi said...

jonny and jon:

Reflecting on your comments made me realize that perhaps that is the reason why we don't see academic research published without credentials attached to the papers.

Interestingly, however, some conferences in academic circles do require papers to be submitted anonymously, and only after acceptance of the paper are the identities of the authors revealed. This is done to guard against publication by reputation (and reputation alone).

In either case, a paper is never published without a verifiable identity. In fact, it is now considered unethical to publish without verifiable data. So no matter what, the part of our blog that is not tongue-in-cheek is that transparency is necessary to ensure data and input can be traced and verified. Knowing an article is written mostly by anonymous contributions or by a single author without verifiable authority ought to raise some flags about the accuracy of the article.

Anyway, thanks for the comments.

Anonymous said...

Dear Ed et al.,

Please feel free to join the discussion of your project on this thread at The Wikipedia Review.

Ed H. Chi said...

Mentioned in Washington Post and China Times.

Anonymous said...

Great stuff...

It would be better if the totals are separated between Talk edits and main article edits. As it stands now, both are conflated into one list.

-- Jossi

Anonymous said...

This is really neat. I got here via a mention at the Wikipedia Signpost.

I'll second Jossi above about disaggregating talk edits. I'm listed as one of the top editors to the Main Page, solely due to the posts to Talk:Main Page, which is an entirely different animal. Besides weird exceptions like this, some editors do a lot more discussing than editing, so the result is skewed by adding them together.

BanyanTree

Ed H. Chi said...

Jossi, Banyan Tree: I agree, after some reflection on your arguments, that the design could be improved. We should have disambiguated those two edit counts instead of lumping them together. We will try to have an updated version in the near future. Thanks for your comments!

nojhan said...

Interesting tool. Do you plan to make it available for other languages or to suggest its inclusion in MediaWiki?

Anonymous said...

Ed,

Let me first apologize for the excessive brevity of my initial remark; I assure you, that seldom happens!

By way of further explanation, then, let me give a concrete example of what I mean.

If we look at the Wikidashboard display for the Wikipedia article on Charles Sanders Peirce, we see that 33.9% of the edits are by the editor Jon Awbrey, and a check of the corresponding user page shows that the editor in question is now banned or indefinitely blocked, depending on your definitions and who you ask.

There are many similar questions that might be asked about different subsets of the editor population, whose answers could be visualized by highlighting the rows of the Dashboard display with distinctive tinctures of various sorts.

Ed H. Chi said...

nojhan: we do have plans to get the tool working on the other language Wikipedias, so stay tuned. (It isn't trivial, since we have to download the data from the foundation and then run our various counting algorithms on the data, and the data can be huge!)

Jonny Cache: Yes, we thought of similar ideas, such as labeling the number of reverts a user has received or made, whether the person is an admin, whether the user is banned, etc., but we just haven't gotten to it. Thanks for the suggestion, and we'll try to roll it out in the next version.

BTW, the WashingtonPost article that mentioned us briefly is now being syndicated at other news outlets, such as the SJ Mercury.

Petter said...

Very interesting project! BTW, had this blog been a wiki, I would have corrected "However, even Wales himself have been quoted as saying that" to "...Wales himself has been quoted.."
Cheers!

Ed H. Chi said...

Oslo IRC: Many thanks for the grammar check there. I've fixed the post with a strike-thru.

Ed H. Chi said...

Liam from the Wikipedia Weekly podcast writes to us:

The episode is now published and downloadable in MP3 and .ogg format. Thanks very much for your time and your thoughts.

Wikipedia Weekly episode32

http://www.wikipediaweekly.com

I look forward to hearing from you in the future,
best,
Liam

Ed H. Chi said...

ResearchBuzz covered us

Gregory Kohs said...

Just wanted to add my two cents. This is a great add-on to Wikipedia for those of us who come at it with an ABF attitude -- "assume bad faith". It would be fantastic to see the User:XYZ pages have a "see more pages" feature. Only seeing the top 10 contributions of, for example, User:JzG doesn't allow us to see if his logged-in activity shows the same proclivity for large-busted women as his not-so-anonymous IP address editing has shown to be very much the case. I want the top 100 pages Guy Chapman has edited!

Ed H. Chi said...

Gregory: You're right that our choice of 5, 10, or 20 results per dashboard list is pretty arbitrary. We're working on enhancements to the tool so that you can see more in the near future. Stay tuned.

Regarding your comment about "ABF" (assume bad faith), part of the idea behind WikiDashboard is not to place any judgement about the appropriateness of the edits. Rather, the idea is to make the activity more transparent so that people can make value judgements on their own.

Anonymous said...

The dark side of the blossoming free speech on the web is this:

http://www.delawareonline.com/apps/pbcs.dll/article?AID=/20070423/BUSINESS/704230309/1003%0A

A growing number of employees get fired for their online activities. Freedom of speech doesn't protect people much in the corporate world.

Interesting experiments, like RateMyBossCafe.com, are starting to challenge that.

But sadly the tightened corporate policies may suffocate such experiments. Balance and clarity are badly needed to protect freedom of speech on the web.

Ed H. Chi said...

http://www.lkozma.net/wpv/index.html

This is a pretty simple mashup between the recent changes data on Wikipedia and Google Maps, but it is just very cool and interesting to watch.

Unknown said...

Are you sure this wikidashboard is accurate? I found cases where the numbers from wikidashboard are not matching with the wannabe_kate tool.

Ed H. Chi said...

Conner, we only have data up to July 2007, so we don't have the latest counts. However, we just recently obtained a toolserver account on Wikipedia, and hope to have a live version with live numbers up real soon.

Anonymous said...

You need to distinguish between quality and quantity of edits. Allow us to list people's edits by the amount of changes they did. I generally try to fit a lot of changes in single edits.

In order to reward good Wikipedia edits, you have to separate the copyeditors from the people who contribute content.

Ed H. Chi said...

OptimistBen:
You are absolutely correct. We haven't yet found an easy method to identify the quality of an edit computationally across all 100+ million revisions recorded by Wikipedia. If we can come up with one, that'd definitely be the next thing we try.

Nicholas Barry said...

I have only two words for you: Firefox Add-on! You need a firefox add-on. I'm not really interested in cruising to your site every time I end up on wikipedia - google searches take me there all the time, and I'd like to be able to see your excellent work any time at all. It would be different if I went to Wikipedia with intent, and stayed there for a period of time, but that isn't how I browse.

With a firefox extension, lots of people could view your visualization of the edit history on any page, any time. The community that uses firefox extensions is very forward-thinking, and would probably adopt en masse. You've already shown that lots of people are interested in the idea, given all the media attention you've earned, and this would launch you yet further. Please!

Ed H. Chi said...

We actually had a plugin for firefox at one point in time, and we might be able to resurrect that piece of code.

In the meantime, you might be interested in using our wikidashboard bookmarklet:
http://asc-parc.blogspot.com/2007/12/bookmarklet-for-wikidashboard.html

Nicholas Barry said...

Thanks, that bookmarklet does nearly everything I'd expect in an add-on, and is much easier for you folks to create. Awesome!

Ed H. Chi said...

We added this as a link to our home page on WikiDashboard, and will start thinking about revamping that page to make it look a little more professional and feature the bookmarklet a little more prominently. Thanks for reminding us to put that link up.

Unknown said...

I've been using this greasemonkey script and it recently stopped working for me. Any idea what's wrong?

Ed H. Chi said...

@ Disavian
Sorry about the problem! The database server we rely on (called toolserver in Europe) is being updated and upgraded. This should be finished in a few days, and everything hopefully will be back to normal.

It's great to know you're using our system.

Unknown said...

You're welcome! I've got close to 30k edits, and I find it's useful to identify the article's primary editors when things happen (FAC nom, etc).

It's also helpful to see which articles I'm the primary contributor to :)

Did you know about that greasemonkey script? I don't see any mention of it on your site, but I didn't look particularly hard.

Ed H. Chi said...

yes, we knew about the greasemonkey script, but you're right, we have not put a link up on our site to it. We should.... I'll let Bongwon know.