Monday, June 14, 2010

Model-Driven Research in Social Computing

I'm in Toronto attending the Hypertext 2010 conference, where yesterday I gave the keynote talk at the First Workshop on Modeling Social Media. I want to document some of the points I made in the talk here.

The reason we seek to construct and derive models is to predict and explain what might be happening in social computing systems. For social media, we seek to understand how these systems evolve over time. Constructing these models should also enable us to generate new ideas and systems.

As an example, many have proposed a theory of influentials: by identifying a small group of individuals who are connected to the larger social network in just the right way, we can infect or reach the rest of the people in the network. This idea is probably best known in the press through Malcolm Gladwell's popular book The Tipping Point. This model of how information diffuses in social networks is very attractive, not only because of its simplicity but also because of its potential applications in areas such as marketing.

Models such as this are meant to be challenged and debated; they are always strawman proposals. Duncan Watts' simulations on networks have cast doubt on the validity of this theory. Indeed, Eric Sun and Cameron Marlow's recent work, published at ICWSM 2009, showed that the theory of influentials might be wrong. They suggest that "diffusion chains are typically started by a substantial number of users. Large clusters emerge when hundreds or even thousands of short diffusion chains merge together."
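
One way to probe the theory of influentials is by simulation, in the spirit of (but not reproducing) the network simulations mentioned above. The sketch below assumes the standard independent cascade model, in which each newly reached person gets exactly one chance to pass an item to each neighbor with probability `p`, run on a simple G(n, p) random graph. The function names, the graph size, `p = 0.1`, and the choice of 50 seeds are illustrative assumptions, not parameters from any of the cited studies.

```python
import random
from collections import deque


def independent_cascade(graph, seeds, p, rng=None):
    """One run of the independent cascade model.

    graph: dict mapping each node to a list of its neighbors.
    seeds: nodes that start out "infected".
    p: probability that a newly infected node passes the item to each
       neighbor (each edge is tried at most once).
    Returns the set of nodes reached.
    """
    rng = rng or random.Random()
    reached = set(seeds)
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in reached and rng.random() < p:
                reached.add(neighbor)
                frontier.append(neighbor)
    return reached


def random_graph(n, avg_degree, rng):
    """Undirected G(n, p) random graph with the given expected degree."""
    graph = {i: [] for i in range(n)}
    edge_prob = avg_degree / (n - 1)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob:
                graph[i].append(j)
                graph[j].append(i)
    return graph


def mean_reach(graph, seeds, p, runs, rng):
    """Average number of nodes reached over several cascade runs."""
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    rng = random.Random(42)
    g = random_graph(1000, 4, rng)
    hub = max(g, key=lambda v: len(g[v]))  # hypothetical "influential": best-connected node
    many = rng.sample(range(1000), 50)     # many ordinary starting points
    print("one best-connected seed:", mean_reach(g, [hub], 0.1, 200, rng))
    print("50 random seeds:        ", mean_reach(g, many, 0.1, 200, rng))
```

Comparing the mean reach of a single well-connected seed against many ordinary seeds is only a toy version of the question the influentials debate asks; real conclusions depend heavily on the network structure and on `p`.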

Most, if not all, models are wrong. Some models are just more wrong than others. But models still serve important roles. They might be divided into several categories:

  1. Descriptive Models describe what is going on within the data. This might help us spot trends, such as growth in the number of contributors, or trending topics in a community.
  2. Explanatory Models help us explain the mechanisms that might underlie processes in the system. For example, we might be able to explain why certain groups of people contribute more content than other groups.
  3. Predictive Models help us engineer systems by predicting what users and groups might want, or how they might act in systems. For example, we might build a probabilistic model of whether a user will apply a particular tag to a particular item in a social tagging system.
  4. Prescriptive Models are sets of design rules or processes that help practitioners generate useful or practical systems. Yahoo's Social Design Patterns Library on Reputation, for example, is a very good prescriptive model.
  5. "Generative Models" actually have two meanings, depending on whom you're talking to. In statistical circles, "generative models" are models, often probabilistic, that can generate data resembling real user data. Information Theory is, in fact, a good example of this approach. Alternatively, "generative models" can mean models that help us generate ideas, novel techniques, and systems. My work with Brynn Evans on building a social search model is an example of this approach.
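
To make the predictive category concrete, here is a minimal sketch of a probabilistic tag model. It is an assumption for illustration, not the model from our work: it estimates P(tag | user, item) as a weighted mix of the user's past tag frequencies and the item's past tag frequencies, each smoothed with add-alpha (Laplace) smoothing over the observed tag vocabulary. `TagPredictor`, `weight_user`, and `alpha` are hypothetical names.

```python
from collections import Counter, defaultdict


class TagPredictor:
    """Toy predictive model of P(tag | user, item).

    Mixes the user's smoothed tag distribution with the item's smoothed
    tag distribution; both are add-alpha smoothed over the tag vocabulary.
    """

    def __init__(self, weight_user=0.5, alpha=1.0):
        self.weight_user = weight_user  # hypothetical mixing weight
        self.alpha = alpha              # hypothetical smoothing constant
        self.user_tags = defaultdict(Counter)
        self.item_tags = defaultdict(Counter)
        self.vocab = set()

    def fit(self, assignments):
        """assignments: iterable of (user, item, tag) triples."""
        for user, item, tag in assignments:
            self.user_tags[user][tag] += 1
            self.item_tags[item][tag] += 1
            self.vocab.add(tag)
        return self

    def _smoothed(self, counts, tag):
        # Unseen users or items fall back to a uniform distribution.
        total = sum(counts.values())
        return (counts[tag] + self.alpha) / (total + self.alpha * len(self.vocab))

    def prob(self, user, item, tag):
        w = self.weight_user
        return (w * self._smoothed(self.user_tags[user], tag)
                + (1 - w) * self._smoothed(self.item_tags[item], tag))

    def top_tags(self, user, item, k=3):
        """The k most probable known tags for this user and item."""
        return sorted(self.vocab, key=lambda t: self.prob(user, item, t), reverse=True)[:k]


if __name__ == "__main__":
    triples = [
        ("ann", "url1", "python"),
        ("ann", "url2", "python"),
        ("bob", "url1", "programming"),
        ("bob", "url3", "web"),
    ]
    model = TagPredictor().fit(triples)
    print(model.top_tags("ann", "url3", k=2))
```

The mixture weight is the main design knob here: it trades off a user's personal tagging habits against the social consensus that has formed around the item.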

In the talk, I illustrated how we have modeled the dynamics of the popular social bookmarking system Delicious using Information Theory. I also showed how, using equations from Evolutionary Dynamics, we were better able to explain what might be happening to Wikipedia's contribution patterns.

Talk Title: Model-driven Research for Augmenting Social Cognition
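
The information-theoretic view of Delicious can be illustrated with two quantities: the entropy of the tag distribution, H(T), and the conditional entropy H(D|T), which measures how uncertain the bookmarked document remains once you know the tag. The sketch below computes both, cumulatively per time period, from (period, document, tag) records. The record format and function names are hypothetical, and this is only a sketch of the kind of measure involved, not the exact analysis presented in the talk.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a Counter of frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def conditional_entropy(pairs):
    """H(D | T) in bits from a list of (document, tag) pairs."""
    joint = Counter(pairs)
    tag_counts = Counter(tag for _, tag in pairs)
    total = len(pairs)
    h = 0.0
    for (doc, tag), c in joint.items():
        p_joint = c / total               # P(d, t)
        p_doc_given_tag = c / tag_counts[tag]  # P(d | t)
        h -= p_joint * math.log2(p_doc_given_tag)
    return h


def entropy_over_time(events):
    """events: (period, document, tag) tuples, e.g. period = "YYYY-MM".

    Returns {period: (H(T), H(D|T))}, each computed over all events up to
    and including that period.
    """
    results = {}
    seen = []
    for period in sorted({e[0] for e in events}):
        seen.extend((d, t) for p, d, t in events if p == period)
        tags = Counter(t for _, t in seen)
        results[period] = (entropy(tags), conditional_entropy(seen))
    return results


if __name__ == "__main__":
    log = [
        ("2009-01", "url1", "python"),
        ("2009-01", "url2", "python"),
        ("2009-02", "url3", "python"),
        ("2009-02", "url3", "web"),
    ]
    for period, (h_t, h_d_given_t) in entropy_over_time(log).items():
        print(period, f"H(T)={h_t:.3f}", f"H(D|T)={h_d_given_t:.3f}")
```

Tracked over successive snapshots, a rising H(D|T) would suggest that each tag points to a more diverse set of documents, so tags are becoming less specific as navigation aids.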