Monday, January 26, 2009

Governing and authorship models at Wikipedia and Britannica

Elsewhere, we have spoken about the complex and interesting governance and authorship model of Wikipedia. How counter-intuitive is it that a model like "anyone can edit anything they want" could produce a useful information resource?!

We have characterized the social dynamics within this community and tracked how they have changed over time. Interestingly, in the last few days, both Wikipedia and Britannica have been in the news for debates over their authorship and editorial models.

First, on Jan 24th, we learned from the BBC that the president of Britannica had written a blog entry outlining a new plan to let readers, as well as more experts and editors, help expand and maintain Britannica's articles. While it did not mention Wikipedia by name, it was a clear nod toward a more collaborative relationship between Britannica and its readers. Specifically, in the blog entry, Jorge Cauz says that "We believe that the creation and documentation of knowledge is a collaborative process but not a democratic one." Most would agree that Britannica's collaborative process was much more restrictive in the past, and that it now seems to have decided to open the door wider to include more people in the editorial process.


Then, today, we learned, also from the BBC, that Jimmy Wales has caused a huge stir at Wikipedia by suggesting a more restrictive approach to the editing process. He now believes that Wikipedia should follow a model in which edits from anonymous users must be vetted by one of the site's editors before going live.

Apparently, the heated debate is now spreading: after being written up by AFP, it has become a big news item on the Yahoo! front page. So here we have a system that has been extremely liberal with its editorial policy moving toward a more restrictive authorship model.

So what gives? Is there a right way or wrong way to construct and compile knowledge resources? As designers of social systems, what should be the governance model for these systems?

For one thing, we still know awfully little about the social dynamics in these large social systems. We have been quoted in the past as saying that our characterization models of editors show that the top 1% of editors in Wikipedia generate 50% of the edits. While that is true, the other 50% is generated by the remaining 99% of editors, and that other 50% is just as important as the first!
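For readers who want to check this kind of concentration figure on their own data, here is a minimal sketch of the "top 1% of editors produce X% of edits" calculation. It assumes the edit log is simply a list of editor names, one entry per edit; the function name `top_share` is hypothetical and this is not the code behind our published numbers.

```python
from collections import Counter


def top_share(edit_log, top_fraction=0.01):
    """Fraction of all edits made by the most active `top_fraction` of editors.

    Assumes `edit_log` is an iterable of editor names, one entry per edit.
    """
    counts = Counter(edit_log)
    # Edit counts per editor, most active editor first.
    ranked = sorted(counts.values(), reverse=True)
    # Number of editors in the top group; always keep at least one editor.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)
```

For example, with 100 editors where one editor makes 99 edits and each of the other 99 makes a single edit, `top_share` returns 0.5: the top 1% and the other 99% each account for half of the edits.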

We have recently been conducting additional research to understand class structures in Wikipedia. We already know that the distribution of edit frequency across Wikipedia editors follows a classic power law. To understand editors throughout this distribution, we first ranked editors by their edit frequency and then, following this ranking, divided all of the edits into four quarters.

For one month's worth of edit data, about 220 editors sit at the very top of the pyramid. These top (most frequent) editors produce the first quarter (25%) of the edits. The next 25% of the edits come from about 1000 editors, the third quarter from about 4000 editors, and the last quarter from about 15000 editors.
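The ranking-and-splitting step above can be sketched as follows. This is an illustrative reconstruction, not our actual analysis code: it assumes a flat list of editor names (one per edit), and the hypothetical function `edit_quarter_classes` places an editor who straddles a quarter boundary entirely in the quarter where that editor's first edit falls.

```python
from collections import Counter


def edit_quarter_classes(edit_log):
    """Assign each editor to one of four classes by cumulative share of edits.

    Editors are ranked by edit count (most active first). Class 0 holds the
    editors who together produce the first 25% of all edits, class 1 the next
    25%, and so on. Returns a dict mapping editor -> class index (0-3).
    """
    counts = Counter(edit_log)
    total = sum(counts.values())
    classes = {}
    cumulative = 0  # edits made by all higher-ranked editors so far
    for editor, n in counts.most_common():
        # The class is chosen by where this editor's first edit falls in the
        # cumulative distribution; an editor is never split across classes.
        classes[editor] = min(3, 4 * cumulative // total)
        cumulative += n
    return classes
```

Counting how many editors land in each class reproduces the pyramid shape described above: a handful of editors in class 0 and many thousands in class 3 when run on a power-law-shaped log.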

So now the research question is whether you want to design your editing policy to favor the upper class (top editors and administrators), the middle class (the 5000-6000 editors who contribute the middle 50% of all edits), or the lower class (the 15000 editors who contribute the last 25%).

One way to think about this problem is to study the amount of resistance each of these four classes of editors experiences on Wikipedia. One metric we used is the reverts-to-edits ratio: on average, what percentage of edits were reverted, as experienced by each of these four classes of editors? It turns out that the reverts-to-edits ratios for the four classes were 1.3%, 1.4%, 1.5%, and 4.7%, respectively. The lower class of editors clearly experiences greater resistance: on average, roughly 1 out of every 20 edits they contribute is reverted. Moreover, the resistance they experience has generally increased over time (from about 3% in early 2006 to 5-6% in 2007-2008, and back down to around 5% in late 2008).
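A minimal sketch of the per-class reverts-to-edits ratio follows. It assumes each edit has already been labeled as reverted or not, and that a mapping from editor to class index (0 = most active quarter) is available; how reverts are detected is left out. It computes a pooled ratio (total reverted edits divided by total edits within each class); averaging a separate ratio per editor would generally give different numbers. The function name is hypothetical.

```python
from collections import defaultdict


def revert_ratio_by_class(edits, classes):
    """Pooled reverts-to-edits ratio for each editor class.

    Assumes `edits` is an iterable of (editor, was_reverted) pairs and
    `classes` maps each editor to a class index (0 = most active quarter).
    """
    edit_counts = defaultdict(int)
    revert_counts = defaultdict(int)
    for editor, was_reverted in edits:
        cls = classes[editor]
        edit_counts[cls] += 1
        if was_reverted:
            revert_counts[cls] += 1
    # Only classes that actually made edits appear in the result.
    return {cls: revert_counts[cls] / edit_counts[cls] for cls in sorted(edit_counts)}
```

A ratio of about 0.05 for class 3 would correspond to the "1 out of every 20 edits reverted" figure quoted above.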

So, even without a "flagged revision" mechanism such as the one suggested by Jimmy Wales, it has already been getting harder for the lowest class of occasional editors to make edits that remain as lasting contributions to Wikipedia.

The AFP article points out that the debate over the policy arose because of vandalism on Ted Kennedy's page, which falsely suggested that he had died after collapsing at a luncheon during Obama's inauguration. But apparently this was corrected within minutes, suggesting that the current system still corrects most mistakes quite rapidly. Moreover, after some sleuthing in the editing history, it appears to me that the original vandalism edit was made by a registered user named "Gfdjklsdgiojksdkf", not by an anonymous user.
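This kind of history sleuthing (who made an edit, and how long it survived) can be partly automated. The sketch below uses one common heuristic, the "identity revert": a revision whose text exactly matches an earlier revision is treated as undoing every revision in between. It assumes a chronological list of (timestamp, editor, text) tuples for a single article, with timestamps that support subtraction (for example `datetime` objects). The function name is hypothetical, and this simple version does not de-duplicate nested reverts, so an edit can occasionally be counted twice.

```python
import hashlib


def identity_reverts(revisions):
    """Find edits undone by an identity revert.

    Assumes `revisions` is a chronological list of (timestamp, editor, text)
    tuples for one article. Returns (reverted_editor, reverting_editor,
    time_until_revert) tuples.
    """
    last_seen = {}  # text digest -> index of the latest revision with that text
    results = []
    for i, (ts, editor, text) in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None:
            # Revisions j+1 .. i-1 were undone by revision i (none if j == i-1).
            for k in range(j + 1, i):
                reverted_ts, reverted_editor, _ = revisions[k]
                results.append((reverted_editor, editor, ts - reverted_ts))
        last_seen[digest] = i
    return results
```

With timestamps in minutes, a history of `(0, "Alice", "v1")`, `(1, "Vandal", "junk")`, `(4, "Bob", "v1")` yields `[("Vandal", "Bob", 3)]`: the vandal's edit was undone by Bob three minutes later.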

So, it is unclear to me that the current system is not working. Are we fixing something that isn't broken (at least not yet)?

6 comments:

Jon Awbrey said...

A pilot study of Wikipedia Vandalism on the bios of the 100 U.S. Senators can be found here:

Wikipedia Vandalism Study

Obviously, many more such studies need to be conducted.

Jon Awbrey

Ed H. Chi said...

Jon: Absolutely fascinating study. I've enjoyed reading it. I wonder about your comment: "why not extend semi-protection to all biographies of living persons?" The statement led me to think about a policy of semi-protection for any article that crosses some attention threshold. This can be done automatically and easily via both view and edit statistics.
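A rough sketch of what such an automatic rule might look like is below. Everything in it is hypothetical: the `ArticleStats` record, the per-day window, and especially the threshold values, which are placeholders that would have to be tuned against real view and edit data. It simply flags an article once either its views or its edits cross a threshold.

```python
from dataclasses import dataclass


@dataclass
class ArticleStats:
    """Hypothetical per-article activity counts over a recent window."""
    title: str
    daily_views: int
    daily_edits: int


# Placeholder cut-offs, not values derived from any real analysis.
VIEW_THRESHOLD = 50_000
EDIT_THRESHOLD = 20


def needs_semi_protection(stats, view_threshold=VIEW_THRESHOLD, edit_threshold=EDIT_THRESHOLD):
    """Flag an article once either its views or its edits cross a threshold."""
    return stats.daily_views >= view_threshold or stats.daily_edits >= edit_threshold


def articles_to_protect(all_stats, **thresholds):
    """Titles of all articles whose attention exceeds the given thresholds."""
    return [s.title for s in all_stats if needs_semi_protection(s, **thresholds)]
```

Using an "or" of the two signals means a sudden spike in either attention measure is enough to trigger protection; requiring both would be a more conservative design choice.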

Jon Awbrey said...

Ed,

Re: "Why is this level of protection not extended to all biographies of living persons on Wikipedia?"

Gregory Kohs and several other members of The Wikipedia Review carried out the Senator Bio study, so that statement is due to them, but it echoes one of many recommendations often heard at the Review, where the whole issue of "BLPs" is one of the most discussed topics.

Cf. Wikipedia Review : Biographies Of Living Persons Forum

Jon Awbrey

Jon Awbrey said...

Ed,

Re: "So what gives? Is there a right way or wrong way to constructing and compiling knowledge resources? As designers of social systems, what should be the governance model for these systems?"

Questions like these have been driving several fronts of my own studies since the mid-1980s, when I attended my first colloquia on the hot new topic of "groupware".

And I find these questions equally compelling today.

I started a thread in the Meta Discussion Forum at The Wikipedia Review for the sake of discussing these questions in what I hope might be some depth.

I would be extremely gratified if you and any of your team could find time to join the discussion.

Inasmuch as it fairly well samples the same population as Wikipedia itself, the Review can be a chaotic environment at times, but since I'm a moderator on that subforum I think I can keep things slightly less entropic than usual.

Jon Awbrey

Gregory Kohs said...

Ed and Jon,

Just wanted you to know that we will be exploring many issues related to trust and authority on the Internet, at the site:

Akahele.org

We hope that influential thinkers will join the dialog there, either via Comments, or guest articles.

Gregory Kohs
(organizer of the Wikipedia Vandalism Study)

Moulton said...

"As designers of social systems, what should be the governance model for these systems?"

The only model I know of which has a prayer of working is the Community Social Contract Model.

For reasons unbeknownst to me, Wales and his loyal followers have expressly rejected that model in favor of an anachronistic ad hoc tribal ochlocracy.

"For one thing, we still know awfully little about the social dynamics in these large social systems."

The most insightful model I know of for Wikipedian social dynamics is the one crafted by Rene Girard.