Showing posts with label model. Show all posts

Wednesday, July 22, 2009

PART 1: The slowing growth of Wikipedia: some data, models, and explanations

In September of 2008, we blogged about a curious change in Wikipedia that we had known about for a while but didn't know how to explain, and the ASC group has spent the last 6-9 months or so trying to understand it. The change was that the growth rates of Wikipedia have slowed. We were not the only ones wondering about this. The Economist (archived here), for example, wrote about it.

We are about to publish a paper at WikiSym 2009 on this topic, and I thought we should start to blog about what we found.


Monthly edits and identified revert activity

The conventional wisdom about many Web-related growth processes is that they're fundamentally exponential in nature. That is, if you wait some fixed amount of time, the content size and number of participants will double. Indeed, prior research on Wikipedia has characterized the growth in content and editors as being fundamentally exponential in nature. Some have claimed that Wikipedia article growth is exponential because there is an exponential growth in the number of editors contributing to Wikipedia [1]. Current research shows that Wikipedia's growth rate has slowed, and has in fact plateaued (see figure at right). Since about March of 2007, the growth pattern is clearly not exponential.

What has changed, and how should we modify our thinking about how Wikipedia works? Prior research had assumed Wikipedia works on an "edit begets edit" model, that is, a preferential attachment model in which the more edits an article gets, the more likely it is to receive further edits, resulting in exponential growth [2]. Such a model does not preclude some ultimate limitation to growth, although at the time it was presented [2] there was an apparent trend of unconstrained article growth.
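To make the "edit begets edit" idea concrete, here is a minimal sketch of a preferential attachment process. It is not the model from [2]; the function name, the "+1" smoothing, and the article and edit counts are all assumptions made up for illustration. It only shows the rich-get-richer part (where edits land), and says nothing about how fast the total grows over time.

```python
import random

def simulate_edits(num_articles, num_edits, seed=0):
    """Toy 'edit begets edit' process (illustrative assumption, not the model in [2]).

    Each new edit picks an article with probability proportional to
    (edits that article already has + 1), so heavily edited articles
    tend to attract even more edits.
    """
    rng = random.Random(seed)
    counts = [0] * num_articles
    for _ in range(num_edits):
        # The +1 lets articles with zero edits still be chosen.
        weights = [c + 1 for c in counts]
        target = rng.choices(range(num_articles), weights=weights, k=1)[0]
        counts[target] += 1
    return counts

counts = simulate_edits(num_articles=100, num_edits=5000)
print("most edited:", sorted(counts, reverse=True)[:5])
print("least edited:", sorted(counts)[:5])
```

In a run like this, a handful of articles typically end up with far more than the average of 50 edits, while many others receive very few.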


Monthly active editors: number of users who have edited at least once in that month


The number of active editors shows exactly the same pattern. The second figure on the right shows that since its peak in March 2007 (820,532), the number of monthly active editors in Wikipedia has fluctuated between 650,000 and 810,000. This finding suggests that the conclusions in [1][2] may no longer be valid. A different process is going on in Wikipedia now.


Article growth per month in Wikipedia. Smoothed curves are the growth rates predicted by logistic growth bounded at a maximum of 3, 3.5, and 4 million articles.

Some Wikipedians have modeled the recent data, and believe that a logistic model is a much better way to think about content growth. The figure here shows that article growth reached a peak in 2007-2008 and has been on the decline since then. This result is consistent with a growth process that hits a constraint, for instance due to resource limitations in the system. For example, microbes grown in culture will eventually stop duplicating when nutrients run out. Rather than exponential growth, such systems display logistic growth.
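For readers who want to see why a logistic model produces a growth rate that rises, peaks, and then declines, here is a rough sketch. The caps of 3, 3.5, and 4 million articles come from the figure caption; the starting size N0, the rate r, and the function names are assumptions picked purely for illustration and are not fitted to any Wikipedia data.

```python
import math

def logistic(t, K, N0, r):
    """Closed-form logistic curve: size at time t given cap K, start N0, rate r."""
    return K / (1.0 + (K - N0) / N0 * math.exp(-r * t))

def monthly_growth(K, N0, r, months):
    """New articles added in each month (difference of consecutive sizes)."""
    return [logistic(t + 1, K, N0, r) - logistic(t, K, N0, r)
            for t in range(months)]

# N0 and r are made-up illustrative values, not estimates from real data.
N0, r = 1.0e4, 0.08
for K in (3.0e6, 3.5e6, 4.0e6):
    growth = monthly_growth(K, N0, r, months=240)
    peak = max(range(len(growth)), key=growth.__getitem__)
    size_at_peak = logistic(peak, K, N0, r)
    print(f"cap {K:,.0f}: growth peaks in month {peak}, "
          f"when size is about {size_at_peak / K:.0%} of the cap")
```

Whatever the cap, monthly growth in this sketch peaks when the total is about half of the cap and falls off afterwards, which is the qualitative shape of the smoothed curves.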

We will continue to blog about what we believe might be happening in the next few weeks, as we find time to summarize the results.

[1] Almeida, R.B., Mozafari, B., and Cho, J. On the evolution of Wikipedia. ICWSM 2007, Boulder, CO, 2007.
[2] Spinellis, D., and Louridas, P. The collaborative organization of knowledge. Communications of the ACM, 51(8), 68-73, 2008.

Monday, April 20, 2009

Game theory and Cooperation in Social Systems

It's almost 2am, but I have been thinking about a summary of a recent Nature paper I read while I was in Boston visiting MIT. I had picked up the article in MIT Tech Talk on a whim during a visit to the Stata Center where MIT's CSAIL laboratory is located.

This article helped me start thinking about a few conundrums:
- why are so many people willing to spend so much time shuffling and passing links to other people?
- why do people write Wikipedia articles when they could spend time doing other things?
- why do users tag photos and URLs when the majority of the benefit is for others to find these items more easily?
In short, why is it that entities in social systems cooperate, especially when the benefit to oneself is not at all clear?

It turns out that researchers who study microbes have been thinking about some of these cooperation problems as well. "One of the perplexing questions raised by evolutionary theory is how cooperative behavior, which benefits other members of a species at a cost to the individual, came to exist." They have used yeast as a model for understanding what might be happening. Sucrose is not yeast's preferred food source, but some yeast cells will metabolize it when glucose is not available. The sugar diffuses away, however, and other free-rider yeast cells (lazy bums!) then benefit from the sugar for free.

Well, if the sugar diffuses away completely, then there is no reason to be the 'cooperating' cell that spends all that energy to benefit others. It gets really interesting when the cooperating yeast cells have preferential access to, say, 1 percent of the sucrose they metabolize. This slight advantage not only allows the cooperating cells to compete effectively against the cheaters, but also enables the entire yeast community to benefit from having sucrose as an alternative food source. Moreover, no matter what the starting numbers of yeast cells, they end up in an equilibrium state with just the right amount of cooperating cells and cheaters present after some evolutionary period. The MIT team used game theory to model this entire process, and showed why it works the way it does. Darn cool!
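Here is a small sketch of why the mix settles to the same equilibrium regardless of where it starts. It uses the textbook snowdrift game with replicator dynamics, which is my own simplification and not the model the MIT team actually fitted to their yeast data. The benefit b, cost c, step size, and function name are all illustrative assumptions.

```python
def cooperator_fraction(x0, b=1.0, c=0.5, dt=0.01, steps=20000):
    """Evolve the fraction of cooperators under replicator dynamics.

    Textbook snowdrift payoffs (an assumption, not the paper's model):
      cooperator meets cooperator: b - c/2   (they share the cost)
      cooperator meets cheater:    b - c     (cooperator pays it all)
      cheater meets cooperator:    b         (free ride)
      cheater meets cheater:       0
    """
    x = x0
    for _ in range(steps):
        f_coop = x * (b - c / 2) + (1 - x) * (b - c)
        f_cheat = x * b
        # Simple Euler step of dx/dt = x * (1 - x) * (f_coop - f_cheat)
        x += dt * x * (1 - x) * (f_coop - f_cheat)
    return x

for x0 in (0.01, 0.5, 0.99):
    print(f"start with {x0:.0%} cooperators -> end with {cooperator_fraction(x0):.3f}")
```

With these made-up payoffs, all three starting points settle at the same mix, (b - c) / (b - c/2) = 2/3 cooperators, because cooperating pays off when cooperators are rare and cheating pays off when they are common.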

This got me thinking about how agents in a social system sometimes behave in similar ways, and can be modeled using game theory. I'm sure some of this has already been done. This sort of study is common in behavioral economics, for example. But how does it apply directly to social web system modeling? How can it help explain, for example, the tagging behavior of users in flickr? Perhaps the little bit of benefit that a user gains from organizing photos that she owns or has found is enough to turn her into a 'cooperating' agent, from whom other freeriders obtain benefit. Moreover, the idea could be used to model why there is just the right Pareto balance (and power-law distribution) of cooperating agents and freeriders in a social web system.

Reference:

Jeff Gore, Hyun Youk & Alexander van Oudenaarden.
Snowdrift game dynamics and facultative cheating in yeast.
Nature advance online publication 6 April 2009 | doi:10.1038/nature07921