Friday, August 7, 2009

PART 2: More details of changing editor resistance in Wikipedia

In the last week, we have received interesting press coverage in New Scientist (as well as Fast Company, Business Insider, and syndication elsewhere) on our team's work on Wikipedia's growth rate and how it has plateaued, changing from an exponential growth model to one that looks more linear. Although this was not necessarily a new finding, it was really a teaser for some other observations we have made in the Wikipedia data, which are about to be published at the WikiSym 2009 conference in October.

The figure below shows how the slowdown in the growth of Wikipedia activity differs across editor classes. For each month, we first partition the editors into classes based on their monthly editing frequency. We then compare the total edit activity of the different editor classes over time.

Monthly edits by user class (in thousands).

[Consistent with the power law, we classified users on an exponential scale: we defined the classes of editors using powers of 10, e.g. 10^0, 10^1, 10^2. This resulted in five classes of users for each month: editors contributing 1 edit (i.e., 10^0), 2 to 9 edits (2-9 class), 10 to 99 edits (10-99 class), 100 to 999 edits (100-999 class), and 1000 or more edits (1000+ class).] Note that the classification of the editors was recalculated for each month.
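
The per-month classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the analytics code used for the study; the input format (one `(month, editor)` pair per edit), the function names, and the sample data are all assumptions made for the example.

```python
from collections import Counter

# Class labels follow the post: powers of 10 serve as lower bounds.
# Ordered from the highest class down so the first match wins.
CLASS_BOUNDS = [(1000, "1000+"), (100, "100-999"), (10, "10-99"), (2, "2-9"), (1, "1")]


def edit_class(monthly_edits):
    """Return the class label for an editor who made `monthly_edits` edits in one month."""
    for lower_bound, label in CLASS_BOUNDS:
        if monthly_edits >= lower_bound:
            return label
    raise ValueError("an editor needs at least one edit in the month to be classified")


def classify_by_month(edits):
    """Map (month, editor) -> class label, recomputed independently for each month.

    `edits` is an iterable of (month, editor) pairs, one pair per edit
    (a hypothetical input format).
    """
    counts = Counter(edits)  # number of edits per (month, editor)
    return {key: edit_class(n) for key, n in counts.items()}


# Made-up sample: alice makes 12 edits in January and 3 in February; bob makes 1.
sample_edits = (
    [("2008-01", "alice")] * 12
    + [("2008-01", "bob")]
    + [("2008-02", "alice")] * 3
)
classes = classify_by_month(sample_edits)
```

Because the classes are recomputed each month, the same editor can move between classes: in the made-up sample, alice falls in the 10-99 class in January and in the 2-9 class in February.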

Since the beginning of 2007, monthly edits by four of the classes have decreased slightly. In contrast, only the highest-frequency class of editors (1000+ edits, dark blue line) shows an increase in monthly edits.

Another way to look at this data is to analyze the relative amount of activity for each editor class by transforming the data into percentages of the total edits. The figure below complements the figure above by showing the percentage of the total edit volume that each class contributes.

Monthly percentage of edits by each user class.

The two highest-frequency classes of editors account for more than half of the total monthly edits (56% from 01/2005 to 08/2008). Furthermore, since 2005 the proportion of contributions by the highest-frequency editor class has increased slightly. In fact, editors in the 1000+ class have kept producing at an increasing rate over the past four years (their average monthly edits per editor for the years 2005 to 2008 were 1740, 1859, 1869, and 2095, respectively).
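
The transformation into percentages described above amounts to dividing each class's monthly edit count by that month's total. A minimal sketch follows, assuming the per-class totals are already available as a dictionary; the input format, the function name, and the numbers are made up for illustration and are not taken from the study.

```python
from collections import defaultdict


def class_shares(edits_by_class):
    """Convert per-class edit totals into percentages of each month's total.

    `edits_by_class` maps (month, class_label) -> number of edits
    (a hypothetical input format).
    """
    month_totals = defaultdict(int)
    for (month, _label), n in edits_by_class.items():
        month_totals[month] += n
    # Months with no edits at all are skipped to avoid dividing by zero.
    return {
        (month, label): 100.0 * n / month_totals[month]
        for (month, label), n in edits_by_class.items()
        if month_totals[month] > 0
    }


# Made-up totals for a single month, not real Wikipedia data.
totals = {
    ("2008-01", "1"): 50,
    ("2008-01", "2-9"): 150,
    ("2008-01", "1000+"): 300,
}
shares = class_shares(totals)
```

Within each month the shares sum to 100, so a rising share for one class necessarily means a falling share for the others, even when absolute edit counts move in the same direction.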

We now focus on specific evidence about what might have contributed to this slowdown. A revert is the action of deleting a prior edit. The following figure shows the percentage of edits that were reverted (reverted edits) each month for each editor class. Note that edits related to vandalism and edits performed by robots are excluded.

Monthly ratio of reverted edits by editor class

This illustrates two indicators of a growing resistance from the Wikipedia community to new content.

First, the figure shows that the total percentage of monthly reverted edits (see the dashed black line) has increased steadily over the years across all classes of editors: 2.9, 4.2, 4.9, and 5.8 percent of all edits for 2005 through 2008, respectively.

Second, more interestingly, low-frequency or occasional editors experience a visibly greater resistance compared to high-frequency editors [see the top two reddish lines, as compared to other lines]. The disparity of treatment of new edits from editors of different classes has been widening steadily over the years at the expense of low-frequency editors.

We consider this as evidence of growing resistance from the Wikipedia community to new content, especially when the edits come from occasional editors.
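
The reverted-edit percentages discussed above could be computed along the following lines. This is only a sketch: the post does not say how reverts, vandalism, or bot edits were identified, so the example assumes each edit already carries those flags. The `Edit` record, its field names, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    month: str
    editor_class: str
    reverted: bool      # assumed to be precomputed by some revert detector
    is_bot: bool        # assumed flag: edit performed by a robot
    is_vandalism: bool  # assumed flag: edit related to vandalism


def reverted_percentage(edits):
    """Return {(month, editor_class): percent of counted edits that were reverted}.

    Bot edits and vandalism-related edits are excluded from both the
    numerator and the denominator, mirroring the exclusion stated in the post.
    """
    counted = defaultdict(int)
    reverted = defaultdict(int)
    for e in edits:
        if e.is_bot or e.is_vandalism:
            continue
        key = (e.month, e.editor_class)
        counted[key] += 1
        if e.reverted:
            reverted[key] += 1
    return {key: 100.0 * reverted[key] / total for key, total in counted.items()}


# Made-up sample, not real data.
sample = [
    Edit("2008-01", "1", True, False, False),
    Edit("2008-01", "1", False, False, False),
    Edit("2008-01", "1", True, False, True),       # vandalism-related: excluded
    Edit("2008-01", "1000+", False, False, False),
    Edit("2008-01", "1000+", False, True, False),  # bot edit: excluded
]
ratios = reverted_percentage(sample)
```

How the `reverted` and `is_vandalism` flags are assigned matters a great deal for the result, which is why the sketch treats them as inputs rather than guessing at a detection method.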


Anonymous said...

I'm interested in how much of the user population makes the edits. In other words what proportion of users is 1000+ and what proportion is 100+. In addition what are the absolute numbers of people in the categories.

Anonymous said...

For an example of what I'm thinking about.

I transformed your graphical data using a few guesses and came up with less than 1% of the editing population makes over half of the edits.

George said...

I agree that there is increased resistance, but I'm not sure that this is evidence of it. Editors who make a very small number of edits are in many cases interested in changing articles in order to shape them towards a particular opinion. It's not their newness, but their unfamiliarity with the neutrality that the project aspires to.

Mike Peel said...

Thanks for sharing these statistics; they make fascinating reading. Some questions:

With your first figure, you say that your classification is recalculated each month. Is there a significant number of editors changing between your bands?

"edits related to vandalism" - how did you determine this? Is it plausible that not all edits reverting vandalism were identified (e.g. if an inappropriate edit summary was used)?

The Wikipedia Signpost mentioned on Twitter "Maybe it's time to do a small study by hand to figure out what sorts of edits by occasional editors are being reverted." Have you done such a study at all, or have you done any analysis of the edits that are being reverted to see why they are being reverted?

Are you looking at all namespaces combined, or just articles?

There are a lot of steps in the plot of percentage of edits reverted, which I wouldn't expect to be random fluctuations given the amount of data you should have. Do you have an explanation for these steps?

Anonymous said...

"edits related to vandalism […] are excluded"

I'm curious as to how you are accomplishing this.

I'm also interested in knowing why there is a seasonal periodicity in the proportional in these graphs— as previously reversion level periodicity has been understood to be the result of increased vandalism following school schedules.

Jon Awbrey said...

Call me WikiPollyannish, but I still have hopes that some day in the not too distant future the intrepid ASCologists who spy on Wiki Islanders from the safety of their armchair observatories will develop the capacity to critically reflect on the Just So Stories of the Wikislanders — perhaps enough to quit assuming Wikipediot Mythology as the basis of their own research.

Henk Poley said...

"Note that edits related to vandalism and edits performed by robots are excluded."

How did you decide what was a revert due to vandalism? I know robots are supposed to use a 'bot' flag when posting. Did you look for the 'evil bit' on the vandalism edits? ;-)

Anonymous said...

Thanks for this deep insight into what is going on with Wikipedia!

What you reason to be a growing resistance of the wikipedia community against new content might just be an indication that quality expectations as well as rules and regulations set by the Wikipedia led to this.

Especially the strong editors (your 1000+ group) apparently apply these high expectations and most of the rules to new content in order to keep quality on the rise.

But on the other hand, and from my own experience of testing out the reactions of the community, the strict enforcement of quality rules often led to reverts even though one had added valuable content, just not in the right way.

Nowadays it is more important to the community to strictly apply rules than to fix issues in order to preserve valuable new content.
Provocatively, one could put it this way: the community, and by this I mean your 1000+ editors, just got too lazy.

GerardM said...

While interesting, your research would be particularly welcome on the "other" Wikipedias. Because of the focus on the English language Wikipedia, there is not much known about the other half of the Wikipedia traffic.

The relation between localisation and community/content growth, the relevance of Wikipedias that are the biggest Internet corpus of a language, how Wikipedias mature and implement a BLP, NPOV ...

Jon Awbrey said...

Bobbie Johnson of The Guardian asks, "So what exactly is the scarce resource that's changing the face of Wikipedia?"

It always bugs me when someone picks the words from my own brain and publishes them first, but there you have it.

Since a moment's reflection should tell us that it can't be any of the usual limits to growth, I think this is a very good question.

Let me suggest that you consider immune system models.

In my view, Wikipedia acts very much like a POV Immune System for the particular segment of the population to which it panders, and I think that what we are seeing here is a limit on the elasticity of that immune system.

Pete Forsyth said...

I am not entirely convinced by the conclusion of this post:

"We consider this as evidence of growing resistance from the Wikipedia community to new content, especially when the edits come from occasional editors."

First, there's a technical error: a reversion is not the same thing as removing new content. A number of reversions restore content that was removed, or do a combination of restoring and removing. To make a claim like this, you would need a more nuanced methodology that distinguishes between reversions that restore content, reversions that remove content, and stuff that is unclassifiable on that basis.

Second -- and this really is not a criticism of your report, but of much of the discussion that has flowed from it in the media. It is not necessarily a bad thing if there is growing resistance, as you suggest.

Content added by new contributors is, by and large, less informed by a strong understanding of Wikipedia content and guidelines. So, your conclusion could just as easily be used to support the notion that Wikipedia content is getting better/more complete, or that the community is getting more effective at reverting edits that are inappropriate.

I'm not sure that's the appropriate conclusion, but my point is that while your study forms a great starting point or foundation for this sort of discussion, many observers have leapt to conclusions rather than thinking about good follow-up questions.

Anonymous said...

And, one other observation: your study seems to focus exclusively on the number of edits. This figure is very easy to measure, and there are good tools for analyzing edits on this basis.

But that doesn't mean it's the best measure of what's happening on Wikipedia.

For instance, if for whatever reason, there was a trend where many infrequent contributors were often making more substantial/lengthy contributions per "save" click, and those contributions were tending to stick while the less substantial edits were getting reverted, it doesn't sound like your methodology would account for that.

Again -- not sure what the truth is here, but there is certainly a wide variety of approaches to how much one does in a given edit. Ignoring that factor seems like a significant gap in this kind of research. (Again, I'm not sure how you'd best capture that, but that difficulty does not mean the issue is insignificant.)

nb said...

It is time for Wikipedia to grow to the next level: Get DISTRIBUTED!

Anyone should be able to setup a "Minipedia" with data from Wikipedia, edit and publish their own version. If it is good, Wikipedia can "pull" from those modified versions. This does not need to be a complete version.

A good model is how distributed source management systems work like Git or Mercurial and how they are used, have a look at KDE or Linux kernel or

I think this is at least worth testing.

whitestucco said...

i think that, in the absence of more detailed information, all you can say is that what you are seeing is not a resistance to new content but a resistance to new editors. i think it says less about the content of wikipedia than it does about how the community of editors is becoming more exclusionary. whether this is because the rules (standards, styles, mores, etc.) are solidifying, or because wikipedia editors like editors in the publishing business are justifying their existence by editing, or because of other things or a combination of things, requires a detailed look at what has actually been done. in any case, though wikipedia represents (or represented) a new approach to reference work creation, it would be naive to assume that the human beings who make up the self-described community that created (and revise and extend) wikipedia represent a different order of persons than the humans, sometimes self-described as reference depts. of publishers, who have previously created and revised more apparently traditional reference works.

GZ said...

I just did a few random edits as an anon IP. A day later, about 1/3 of them were reverted, despite being accurate. 2 or 3 years ago this wasn't the case. Hypothesis confirmed.

Siberiano said...

I've been a low-frequency editor on Wikipedia. My wiki enthusiasm ceased in 2007 after seeing a fake language project accepted by their board of editors (iditors?). (I don't call project's name to not give them publicity).

First, the project authors have pressed on editors' soft spots like telling about some marginal, almost extinct culture, yet a lot of votes on wiki, and like they're in opposition to oppressive authorities.

Nobody in Wikipedia's Board of Iditors gave a shite to call any university or linguistic institute in the area of the "language" in question to ask a COMPETENT person's opinion.

VOTING, the standard procedure in wikipedia to decide on anything, is the worst way to fight the stupidness of crowds.

For the same reason, articles on my region's places contained common misconceptions and speculations, while lacked facts.

After that language project I have seen the true picture of how it works.

1) To have authority on wiki you have to get no life, but spend all your time editing articles.

2) As soon as you have some authority, you can post whatever you want and let your incompetent opinion be the rule.

3) Voting is taken at its worst, namely to decide things that a competent person must do.

4) Nobody gives a shite to check things in real world. Google is the ultimate measure. (Guess how easy to manipulate it is. If you create enough seemingly independent sources linking to you, it gives you enough weight in the eyes of nerdy administrators).

Bottomline: wikipedia is a internet-nerds-driven project, hence the procedures and disconnection from the reality.

gritzko said...

A related story on deletionism in practice:

Anonymous said...

There are a lot of interesting things that can be derived from this.

For example: How much time do editors spend on what they do? At a thousand edits a month and 5 minutes per edit I get close to 3 hours a day, every day. If that's right it means that the editor is not doing other work in that time. Alternately I've seen fora where contributors get some sort of reward (kudos/mojo/gold-stars whatever it's called) for making an entry, any entry. The result is vast numbers of meaningless posts from some. If some editors are trying to score points like this then their work can hardly be of any use at all.

Then there is the issue of breadth of topics covered. From an analysis of what the editor does we can establish a lot. (I can't imagine that this can be automated!) Is it grammar and syntax, is it factual content, is it wikilawyering about a neutral viewpoint or references. If it's factual content how good are they in the areas that they choose to contribute?

Then there is the issue of anonymity. I'd love to see some analysis of the differences between those who give their name and those who use an alias.

Anonymous said...

@detail, you are wrong about voting. Consensus, not voting, is the standard decision making process on Wikipedia. I think they are also a lot more prone to disagreement among "the regulars" than you imply. "Opinions" that another editor thinks are POV will not last long without citations to back them up.

Starry*Gordon said...

As Wikipedia becomes more valuable, those who "own" it will be more strongly motivated to defend their "possession", both individually and as a group. I would also expect to see subgroups develop who will compete for power. At a certain point the project will metamorphose into an institution modeled on academic or religious institutions. It will then be largely static, and contributions and modifications by the profane masses neither desired nor easy to accomplish. Sic transit.

logo design - logoinn said...

How did you determine all this? Are you using any tools?

Ed H. Chi said...

We built all of the analytics software from scratch, which ran over the data provided by Wikipedia dumps. If you're interested, the details are all in the WikiSym paper in part 4 of the blog post.

Anonymous said...


Wikipedia is an ant nest, a very loose confederation or even a heap of unconnected authors. Taking aggregate statistics indicates trends in what people are doing, but doesn't negate that.

There is already some persuasive evidence that some Wikipedia entries are curated by paid for authors (they try to hide that of course), outright edit wars are not uncommon and decisions (open to hostile interpretation) to only allow some people to edit certain entries happen.

So it's already biased. As anybody who has made their first wrong decision based on Wikipedia wrongness will be all too aware.

So, I believe, some have been "feeling proprietorial" about their entries for years. Others don't. Will "ownership feelings" increase? I imagine so as the less principled parts of society get more involved.

For an individual starting an entry, I reckon, it's a good idea to shed any vestige of ownership the moment you publish.