Tuesday, May 15, 2007

Long Tail of user participation in Wikipedia

(Ed H. Chi; joint work with Niki Kittur, Bryan Pendleton, and Bongwon Suh)

As we were getting ready for the alt.CHI presentation at the CHI conference last week, I realized that the way we had been looking at the frequency of user edits in Wikipedia was not really getting at the root of the issue. What we really want to find out is: what processes govern users' participation in Wikipedia?



In the alt.CHI paper, we found that around 2003-2004, administrators in Wikipedia were making around 50% of the edits! It definitely seemed like the "power of the few" was at work in Wikipedia. Indeed, admins in Wikipedia have a great deal of power: they set policies, ban destructive users, help resolve disputes, and generally keep order within the system.



Moreover, when we analyzed the data using high-edit users (users with 10,000 edits or more) instead of admins, we got the same result. The algorithm was: (1) across all Wikipedia edits for all time, find the users with 10,000 or more edits; (2) compute the total number of edits in month x; (3) compute the number of edits in month x made by the users identified in step 1; (4) divide the result of step 3 by the number from step 2. Here is the graph:



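The four steps above are simple enough to sketch directly. This is a minimal illustration, not the code used for the study: it assumes a hypothetical input of (user, month) pairs, one per revision, with months as sortable strings such as "2004-01". The function name high_edit_share and its threshold parameter are illustrative only.

```python
from collections import Counter


def high_edit_share(edits, threshold=10_000):
    """Per-month share of edits made by users whose lifetime edit count
    is at least `threshold`.

    `edits` is an iterable of (user, month) pairs, one per revision
    (a hypothetical input format, not the study's actual data layout).
    """
    edits = list(edits)

    # Step 1: lifetime edit counts over all months, then pick high-edit users.
    lifetime = Counter(user for user, _ in edits)
    high_edit = {user for user, n in lifetime.items() if n >= threshold}

    total = Counter()    # Step 2: total edits per month
    by_high = Counter()  # Step 3: edits per month by the high-edit users
    for user, month in edits:
        total[month] += 1
        if user in high_edit:
            by_high[month] += 1

    # Step 4: divide step 3 by step 2 for each month.
    return {month: by_high[month] / total[month] for month in sorted(total)}


# Tiny example with a threshold of 3 so that user "a" counts as high-edit.
sample = [("a", "2004-01"), ("a", "2004-01"), ("b", "2004-01"),
          ("a", "2004-02"), ("c", "2004-02")]
print(high_edit_share(sample, threshold=3))
```

Note that the set of high-edit users is fixed once from the whole history, which is exactly what makes this an "all-time elite" measure rather than a per-month one.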
And when we computed the diffs across all 58.5 million revisions of Wikipedia, we found that the number of words changed by admins (as a proportion of the total words changed by everyone) also waxed and waned, rising from about 10% to about 50% and then falling back to near 10%.



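The post does not say how the word-level diffs were computed, so the following is only an assumed stand-in: a word-level diff using Python's standard difflib.SequenceMatcher, applied to consecutive revisions of one article. The input format (a chronological list of (user, text) pairs) and the helper names words_changed and admin_word_share are hypothetical.

```python
import difflib


def words_changed(old_text, new_text):
    """Count words inserted, deleted, or replaced between two revisions.

    Uses a word-level difflib diff as an assumed stand-in for the
    (unspecified) diff method used in the study.
    """
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count the larger side so a replacement is not double-counted.
            changed += max(i2 - i1, j2 - j1)
    return changed


def admin_word_share(revisions, admins):
    """Fraction of all changed words that were changed by admins.

    `revisions` is a chronological list of (user, text) pairs for one
    article (hypothetical format); `admins` is a set of user names.
    """
    total = by_admin = 0
    previous = ""
    for user, text in revisions:
        n = words_changed(previous, text)
        total += n
        if user in admins:
            by_admin += n
        previous = text
    return by_admin / total if total else 0.0
```

Counting max(deleted, inserted) for a replacement is one design choice among several; counting both sides would weight rewrites more heavily.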
As outlined in the alt.CHI paper, we also discovered that users with a low number of edits are becoming a bigger part of the total population. From the above analysis, it seemed that users with a low number of edits were becoming more powerful in Wikipedia.

When I presented these results to the Computing Science Laboratory here at PARC late last year, David Goldberg suggested, "Why don't you do the other analysis? Compute how much work the top 1% of users (at any given moment in time) are doing." The difference between this analysis and ours is subtle. Our analysis was equivalent to examining the work of the top 1% of users over the entire existence of Wikipedia, rather than the top 1% for each month. The algorithm here is: (1) for a given month, rank all users by the number of edits they made that month; (2) from that ranking, take the top 1% of users; (3) compute the total number of edits made that month; (4) compute the total number of edits made that month by the users found in step 2; (5) divide the result of step 4 by the result of step 3. Here is the result from that analysis:


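For contrast with the all-time version, here is a minimal sketch of the per-month analysis, again assuming hypothetical (user, month) pairs as input. How to round "1% of users" is not specified in the post; this sketch assumes rounding up and always keeping at least one user. The function name top_percent_share is illustrative.

```python
from collections import Counter, defaultdict


def top_percent_share(edits, percent=1):
    """Per-month share of edits made by that month's top `percent`% of editors.

    `edits` is an iterable of (user, month) pairs (hypothetical format).
    Rounding up, with a minimum of one user, is an assumption.
    """
    per_month = defaultdict(Counter)
    for user, month in edits:
        per_month[month][user] += 1

    shares = {}
    for month, counts in sorted(per_month.items()):
        # Step 1: rank this month's users by their edit counts, highest first.
        ranked = sorted(counts.values(), reverse=True)
        # Step 2: size of the top group, using integer ceiling division.
        k = max(1, (len(ranked) * percent + 99) // 100)
        # Steps 3-5: edits by the top group divided by all edits that month.
        shares[month] = sum(ranked[:k]) / sum(ranked)
    return shares
```

The key difference from the previous sketch is that the elite group is recomputed for every month, so it tracks whoever is most active at that moment rather than whoever accumulated the most edits overall.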
This clearly showed a very different picture. So what's really going on? Only this past week did I realize that we could summarize the result differently, by plotting the long tail distribution of user contributions:


In fact, plotted on a log-log plot (where a power law shows up as a straight line), here is what it looks like:


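As a rough illustration of the long tail and log-log views, the sketch below builds a rank-frequency list (rank 1 is the most active user) and estimates the slope of log(edit count) against log(rank) by ordinary least squares. The input format and function names are assumptions, and a least-squares fit on logged values is only a quick diagnostic; it is not a rigorous test for a power law.

```python
import math
from collections import Counter


def rank_frequency(edits):
    """Return (rank, edit_count) pairs, where rank 1 is the most active user.

    `edits` is an iterable of (user, month) pairs, e.g. one month's edits
    (hypothetical format).
    """
    counts = Counter(user for user, _ in edits)
    ranked = sorted(counts.values(), reverse=True)
    return list(enumerate(ranked, start=1))


def loglog_slope(points):
    """Least-squares slope of log(count) versus log(rank).

    On a log-log plot a power law is a straight line, and this slope
    roughly estimates its exponent. Requires at least two distinct ranks.
    """
    xs = [math.log(rank) for rank, _ in points]
    ys = [math.log(count) for _, count in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Plotting the pairs from rank_frequency on linear axes gives the long tail; the same pairs on log-log axes give the roughly straight line.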
This arises partially because of the user turnover rate on Wikipedia:


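The post does not define the turnover rate, so the sketch below assumes one simple definition: the fraction of users active in a month who make no edits in the next month present in the data. The input format and the function name monthly_turnover are hypothetical.

```python
from collections import defaultdict


def monthly_turnover(edits):
    """Fraction of each month's active editors who are absent the next month.

    An assumed definition of turnover. `edits` is an iterable of
    (user, month) pairs; months must sort chronologically (e.g. "2004-01"),
    and consecutive months present in the data are treated as adjacent.
    """
    active = defaultdict(set)
    for user, month in edits:
        active[month].add(user)
    months = sorted(active)
    return {
        this: len(active[this] - active[nxt]) / len(active[this])
        for this, nxt in zip(months, months[1:])
    }
```

High turnover under this definition means the per-month top group keeps changing membership, which is one way the all-time and per-month analyses can diverge.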
So there appears to be a rather simple explanation for what's going on here: Wikipedia has a long tail architecture of participation. At any given moment, a few users are much more active than the rest of the population, but a long tail of other users also contributes to the effort.

2 comments:

Ed H. Chi said...

Here is another way Niki explained it: a small proportion of editors (e.g., the top 5%) appears to be doing most of the work, though remember that the absolute number of editors who make up that top 5% is also growing.

Thus it does appear that the admins, a privileged group that doesn't grow quickly in numbers, are declining in influence in Wikipedia. However, the reason we call it the "rise of the bourgeoisie" instead of democratization is that, as you point out below, much of the work is still being done by a small proportion of elite (though often non-privileged) users.

Anonymous said...

This is a captivating study, but I wonder whether this trend has continued. Has the bourgeoisie, as you call them, continued to "rise," or has the trend leveled off?

Second, is it worth considering that you only modeled those who contributed? That is to say, there is a substantial user base of people who never contribute anything (i.e., the "lurkers") but who should still be considered "part of the crowd," since they are users of the system in question.