Augmented Social Cognition Research Blog from PARC: editors


Friday, August 7, 2009

PART 2: More details of changing editor resistance in Wikipedia

In the last week, we have received interesting press coverage in New Scientist (as well as Fast Company, Business Insider, and syndicated elsewhere) on our team's work on the Wikipedia growth rate and how it has plateaued, changing from an exponential growth model to one that looks more linear. This was not necessarily a new finding; it was really a teaser for some other observations we have made in the Wikipedia data, which are about to be published at the WikiSym 2009 conference in October.

In the figure below, we see how the slowdown in the growth of Wikipedia activity differs across editor classes. For each month, we first partition the editors into classes based on their monthly editing frequency. We then compare the total edit activity of the different editor classes over time.

Monthly edits by user class (in thousands).

[Consistent with the power law, we classified users on an exponential scale: we defined the classes of editors using powers of 10, e.g. 10^0, 10^1, 10^2. This resulted in five classes of users for each month: editors contributing 1 edit (i.e., 10^0), 2 to 9 edits (2-9 class), 10 to 99 edits (10-99 class), 100 to 999 edits (100-999 class), and 1000 or more edits (1000+ class).] Note that the classification of the editors was recalculated for each month.

Since the beginning of 2007, the monthly edits of four of the classes have decreased slightly. In contrast, only the highest-frequency class of editors (1000+ edits, dark blue line) shows an increase in monthly edits.
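
The per-month classification can be sketched in a few lines of Python. This is only an illustration, not the code used for the study: the function names and the input format (one `(month, editor)` pair per edit) are assumptions, as is the choice to place an editor with exactly 1000 edits in the 1000+ class.

```python
from collections import Counter


def editor_class(monthly_edits: int) -> str:
    """Return the power-of-10 class for an editor's edit count in one month."""
    if monthly_edits < 1:
        raise ValueError("an editor needs at least one edit in the month")
    if monthly_edits == 1:
        return "1"
    if monthly_edits < 10:
        return "2-9"
    if monthly_edits < 100:
        return "10-99"
    if monthly_edits < 1000:
        return "100-999"
    return "1000+"  # assumption: exactly 1000 edits counts as 1000+


def edits_by_class(edit_log):
    """Sum edits per (month, class).

    edit_log is a hypothetical input: an iterable of (month, editor) pairs,
    one pair per edit.
    """
    per_editor = Counter(edit_log)  # (month, editor) -> edits in that month
    totals = Counter()
    for (month, _editor), count in per_editor.items():
        # The class is recomputed for every month, as in the post.
        totals[(month, editor_class(count))] += count
    return totals
```

Because the class is derived from each month's count, the same editor can fall into the 10-99 class one month and the 2-9 class the next.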

Another way to look at this data is to analyze the relative amount of activity for each editor class by transforming the data into percentages of the total edits. The figure below complements the figure above by showing the percentage of the volume of edits that each class contributes in relation to the total.

Monthly percentage of edits by each user class.

The two highest-frequency classes of editors account for more than half of the total monthly edits (56% from 01/2005 to 08/2008). Furthermore, since 2005 the proportion of contributions by the highest-frequency editor class has increased slightly. In fact, the editors in the 1000+ class have kept producing at an increasing rate over the past four years (their average monthly edits per editor for the years 2005 to 2008 were 1740, 1859, 1869, and 2095, respectively).

We now focus on specific evidence about what might have contributed to this slowdown. A revert is the action of deleting a prior edit. The following figure shows the percentage of edits that were reverted (reverted edits) each month for each editor class. Note that edits related to vandalism and edits performed by robots are excluded.
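
The transformation from raw monthly edits to percentages of the total is simple arithmetic; a minimal sketch follows. The function name and the example counts are made up for illustration and are not taken from the Wikipedia data.

```python
def class_shares(edits_per_class):
    """Convert one month's edit totals per class into percentages of the month's total."""
    total = sum(edits_per_class.values())
    if total == 0:
        return {label: 0.0 for label in edits_per_class}
    return {label: 100.0 * n / total for label, n in edits_per_class.items()}


# Made-up counts for a single month, for illustration only (not PARC data).
example_month = {"1": 50, "2-9": 150, "10-99": 250, "100-999": 350, "1000+": 200}
shares = class_shares(example_month)

# Combined share of the two highest-frequency classes in the made-up month.
top_two = shares["100-999"] + shares["1000+"]
```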

Monthly ratio of reverted edits by editor class

This illustrates two indicators of a growing resistance from the Wikipedia community to new content.

First, the total percentage of monthly reverted edits across all classes of editors (dashed black line) has increased steadily over the years: 2.9, 4.2, 4.9, and 5.8 percent of all edits for 2005 through 2008, respectively.

Second, and more interestingly, low-frequency or occasional editors experience visibly greater resistance than high-frequency editors [see the top two reddish lines, as compared to the other lines]. The disparity in the treatment of new edits from editors of different classes has been widening steadily over the years, at the expense of low-frequency editors.

We consider this evidence of growing resistance from the Wikipedia community to new content, especially when the edits come from occasional editors.
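
For readers who want to reproduce this kind of measurement, the monthly percentage of reverted edits per editor class described above can be sketched as follows. The input format, a list of `(editor_class, was_reverted)` pairs for one month, is an assumption, and the sketch assumes vandalism-related and robot edits have already been filtered out upstream.

```python
from collections import defaultdict


def revert_percentages(edits):
    """Compute the percentage of reverted edits per class and overall.

    edits is a hypothetical input: an iterable of (editor_class, was_reverted)
    pairs for one month, with vandalism and bot edits already removed.
    Returns ({class: percent reverted}, overall percent reverted).
    """
    made = defaultdict(int)
    reverted = defaultdict(int)
    for cls, was_reverted in edits:
        made[cls] += 1
        if was_reverted:
            reverted[cls] += 1
    per_class = {cls: 100.0 * reverted[cls] / made[cls] for cls in made}
    total = sum(made.values())
    overall = 100.0 * sum(reverted.values()) / total if total else 0.0
    return per_class, overall


# Toy month: 10 edits by one-edit editors (1 reverted) and
# 40 edits by 1000+ editors (1 reverted). Illustrative numbers only.
toy = ([("1", True)] + [("1", False)] * 9
       + [("1000+", True)] + [("1000+", False)] * 39)
per_class, overall = revert_percentages(toy)
```

The overall figure corresponds to the dashed line in the figure, and the per-class figures to the individual colored lines.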

Monday, January 26, 2009

Governing and authorship models at Wikipedia and Britannica

Elsewhere, we have spoken about the complex and interesting governing and authorship model in Wikipedia. How counter-intuitive is it that a model like "anyone can edit anything they want" could produce a useful information resource?!

We have conducted some characterizations of the social dynamics within this community, and tracked its changes over time. Interestingly, in the last few days, both Wikipedia and Britannica have been in the news for debates on their stances on the authorship and editorial model.

First, on Jan 24th, we learned from the BBC that the president of Britannica wrote a blog entry in which he outlined a new plan at Britannica to enable readers as well as more experts and editors to help expand and maintain the articles. While not mentioning Wikipedia by name, it was a clear nod toward the more collaborative relationship Britannica will have with its readers. Specifically, in the blog entry, Jorge Cauz says that "We believe that the creation and documentation of knowledge is a collaborative process but not a democratic one." Most would agree that, in the past, the collaborative process that Britannica had was much more restrictive, and now they seem to have decided to open the door wider to include more people in the editorial process.

Then, today, we learned, also from the BBC, that Jimmy Wales has caused a huge stir at Wikipedia by suggesting a more restrictive approach to the editing process. He now believes that Wikipedia should follow a model in which edits from anonymous users have to be vetted by one of the site's editors before going live.

Apparently, the heated debate is now spreading, and is being mentioned as a big news item on the Yahoo! front page after being written up by AFP. So here we have a system that has been extremely liberal with its editorial policy moving toward a more restrictive authorship model.

So what gives? Is there a right or wrong way to construct and compile knowledge resources? As designers of social systems, what should be the governance model for these systems?

For one thing, we still know awfully little about the social dynamics in these large social systems. We have been quoted in the past as saying that our characterization models of editors show that the top 1% of the editors in Wikipedia generate 50% of the edits. While that is true, the other 50% is being generated by the other 99% of the editors. This other 50% is just as important as the first 50%!

We have recently been conducting some additional research to understand class structures in Wikipedia. We already know that the distribution of editors and their frequency of edits in Wikipedia follows a classic power law curve. In order to understand editors throughout this distribution, we first ranked editors by their edit frequency, and then divided all of the edits into four quarters, according to this sort.

For one month's worth of edit data, there are about 220 editors at the very top of the pyramid. These top (most frequent) editors produce the first quarter (25%) of the edits. The next 25% of the edits come from about 1000 editors. The third quarter of the edits comes from about 4000 editors, and the last quarter comes from about 15000 editors.
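
The rank-then-split procedure above can be sketched as follows. This is a minimal illustration with made-up data, not the analysis code behind the numbers in this post; the function name is hypothetical, and the rule for an editor whose edits straddle a quarter boundary (they stay in the earlier quarter) is an assumption.

```python
def split_into_quarters(edit_counts):
    """Group editors into four classes that each account for about 25% of edits.

    edit_counts: dict mapping editor -> number of edits in the period.
    Returns a list of four lists of editors, most active class first.
    """
    ranked = sorted(edit_counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(edit_counts.values())
    quarters = [[], [], [], []]
    if total == 0:
        return quarters
    cumulative = 0
    for editor, count in ranked:
        # Use the cumulative edit count *before* adding this editor, so an
        # editor who straddles a boundary lands in the earlier quarter
        # (an assumed tie-breaking rule, not taken from the study).
        index = min(3, 4 * cumulative // total)
        quarters[index].append(editor)
        cumulative += count
    return quarters


# Made-up counts: 100 edits in total, heavily skewed toward a few editors.
counts = {"a": 25, "b": 13, "c": 12}
counts.update({f"mid{i}": 5 for i in range(5)})
counts.update({f"low{i}": 1 for i in range(25)})
quarters = split_into_quarters(counts)
sizes = [len(q) for q in quarters]
```

Even in this toy data, each successive quarter of the edits needs many more editors than the one before it, which is the same shape as the 220 / 1000 / 4000 / 15000 breakdown.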

So now the research question is whether you want to design your editing policy to favor the upper class (top editors and administrators), the middle class (the 5000-6000 editors who contribute the middle 50% of all edits), or the lower class (the 15000 editors who contribute the last 25%).

One way to think about this problem is to study the amount of resistance each of these four classes of editors experiences on Wikipedia. A metric that we used is the reverts-to-edits ratio. That is, on average, what percentage of edits were reverted, as experienced by each of these four classes of editors? It turns out that the reverts-to-edits ratios for these four classes of editors were 1.3%, 1.4%, 1.5%, and 4.7%, respectively. This means that the lower class of editors clearly experiences greater resistance: on average, roughly 1 out of every 20 edits they contribute is reverted. Moreover, the resistance they have experienced has generally increased over time (from about 3% in early 2006 to 5-6% in 2007-2008, and back down to around 5% in late 2008).

So, even without a "flagged revision" mechanism such as the one suggested by Jimmy Wales, it has already been getting harder for the lowest class of occasional editors to produce edits that remain as contributions in Wikipedia.

The AFP article points to the fact that the debate over the policy came about because of vandalism on Ted Kennedy's page, which falsely suggested he had died after suffering a collapse at a luncheon during Obama's inauguration. But apparently this was corrected within minutes, suggesting that the current system is still correcting most mistakes quite rapidly. Moreover, after I did some sleuthing in the editing history, it appears that the original vandalism edit was done by a registered user named "Gfdjklsdgiojksdkf", and not an anonymous user.

So, it is unclear to me that the current system is not working. Are we fixing something that isn't broken (at least not yet)?

Tuesday, May 22, 2007

Controversy Visualization

Alas, not our visualization of revert relationships, but someone got slashdotted for doing a visualization of power struggles in Wikipedia:
Slashdot article.

"todd450 pointed us to a nifty visualization of Wikipedia and controversial articles in it. The image started with a network of 650,000 articles color coded to indicate activity. The original image is apparently 5' square, but the sample image they have is still pretty neat."

The original blog post was here.