Monday, June 14, 2010

Model-Driven Research in Social Computing

I'm in Toronto attending the Hypertext 2010 conference, where yesterday I gave the keynote talk at the First Workshop on Modeling Social Media. I want to document some of the points I made in the talk here.

The reason we seek to construct and derive models is to predict and explain what might be happening in social computing systems. For social media, we seek to understand how these systems evolve over time. Constructing these models should also enable us to generate new ideas and systems.

As an example, many have proposed a theory of influentials: by identifying a small group of individuals who are connected to the larger social network in just the right way, we can infect or reach the rest of the people in the network. This idea is probably best known in the press through Malcolm Gladwell's popular book The Tipping Point. This model of how information diffuses in social networks is very attractive, not only because of its simplicity but also because of its potential applications in areas such as marketing.

Models such as this are meant to be challenged and debated; they are always strawman proposals. Duncan Watts' simulations on networks have shown that the validity of this theory is somewhat suspect. Indeed, Eric Sun and Cameron Marlow's recent work, published at ICWSM 2009, suggests that this theory of influentials might be wrong. They suggest that "diffusion chains are typically started by a substantial number of users. Large clusters emerge when hundreds or even thousands of short diffusion chains merge together."
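Because the influentials theory is a claim about diffusion, it can be explored in simulation. The sketch below is a minimal, illustrative example, not Watts' or Sun and Marlow's actual method: it uses the independent cascade model, a standard textbook diffusion model, to compare seeding a few well-connected "influentials" against seeding many ordinary users on a random graph. The graph generator, the transmission probability `p`, the seed counts, and all function names are assumptions made for illustration.

```python
import random
from collections import deque


def independent_cascade(adj, seeds, p, rng):
    """One run of the independent cascade model.

    adj:   dict mapping each node to a list of neighbours (undirected graph).
    seeds: the initially 'infected' users.
    p:     chance that an infected user passes the item to each neighbour.
    Each newly infected user gets exactly one chance per neighbour.
    """
    infected = set(seeds)
    frontier = deque(infected)  # dedupe seeds so no node is processed twice
    while frontier:
        node = frontier.popleft()
        for nbr in adj[node]:
            if nbr not in infected and rng.random() < p:
                infected.add(nbr)
                frontier.append(nbr)
    return infected


def random_graph(n, avg_degree, rng):
    """Erdos-Renyi random graph with about avg_degree neighbours per node."""
    prob = avg_degree / (n - 1)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < prob:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def mean_reach(adj, pick_seeds, p, runs, rng):
    """Average number of users reached over many simulated cascades."""
    return sum(len(independent_cascade(adj, pick_seeds(), p, rng))
               for _ in range(runs)) / runs


if __name__ == "__main__":
    rng = random.Random(42)
    g = random_graph(1000, 4, rng)
    hubs = sorted(g, key=lambda v: len(g[v]), reverse=True)[:5]
    # Strategy A: seed the 5 best-connected users ("influentials").
    print("5 hubs:", mean_reach(g, lambda: hubs, 0.15, 200, rng))
    # Strategy B: seed 50 ordinary users chosen at random.
    print("50 random users:",
          mean_reach(g, lambda: rng.sample(range(1000), 50), 0.15, 200, rng))
```

Which strategy reaches more people depends on `p`, the network structure, and how many seeds are affordable; the point of a sketch like this is only that the influentials hypothesis becomes something one can test rather than assume.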

Most, if not all, models are wrong. Some models are just more wrong than others. But models still serve important roles. They might be divided into several categories:

  1. Descriptive Models describe what is going on within the data. This might help us spot trends, such as the growth in the number of contributors, or trending topics in a community.
  2. Explanatory Models help us explain the mechanisms that might underlie processes in the system. For example, we might be able to explain why certain groups of people contribute more content than others.
  3. Predictive Models help us engineer systems by predicting what users and groups might want, or how they might act in systems. Here we might build probabilistic models of whether a user will use a particular tag on a particular item in a social tagging system.
  4. Prescriptive Models are sets of design rules or processes that help practitioners build useful, practical systems. Yahoo's Social Design Patterns Library on Reputation, for example, is a very good prescriptive model.
  5. "Generative Models" actually have two meanings, depending on whom you're talking to. In statistical circles, "generative models" are models, often probabilistic, that can generate data resembling real user data. Information Theory is, in fact, a good example of this approach. "Generative models" can also mean models that help us generate ideas, novel techniques, and systems. My work with Brynn Evans on building a social search model is an example of this second meaning.
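
To make the predictive category (item 3) and the statistical sense of "generative" (item 5) concrete, here is a toy sketch of a probabilistic tagging model. It is hypothetical and deliberately simple: it estimates P(tag | item) from past bookmarks with add-alpha smoothing and ignores which user is tagging, which a real model of "whether a user will use a particular tag on a particular item" would not. The class name, the `alpha` parameter, and the example bookmarks are all illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


class TagModel:
    """Toy predictive/generative tag model (hypothetical, for illustration).

    Estimates P(tag | item) from observed (user, item, tag) bookmarks with
    add-alpha smoothing over the tag vocabulary. The user is ignored here
    for simplicity.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # item -> Counter of tag frequencies
        self.vocab = set()

    def fit(self, bookmarks):
        for _user, item, tag in bookmarks:
            self.counts[item][tag] += 1
            self.vocab.add(tag)
        return self

    def prob(self, tag, item):
        """Smoothed P(tag | item); unseen items fall back to a uniform distribution."""
        c = self.counts.get(item, Counter())
        total = sum(c.values())
        return (c[tag] + self.alpha) / (total + self.alpha * len(self.vocab))

    def predict(self, item, k=3):
        """Predictive use: the k most probable tags (ties broken alphabetically)."""
        return sorted(self.vocab, key=lambda t: (-self.prob(t, item), t))[:k]

    def sample(self, item, rng):
        """Generative use: draw a synthetic tag that resembles real tagging data."""
        tags = sorted(self.vocab)
        return rng.choices(tags, weights=[self.prob(t, item) for t in tags], k=1)[0]


if __name__ == "__main__":
    bookmarks = [
        ("u1", "nytimes.com", "news"),
        ("u2", "nytimes.com", "news"),
        ("u3", "nytimes.com", "politics"),
        ("u1", "python.org", "python"),
        ("u2", "python.org", "programming"),
    ]
    model = TagModel().fit(bookmarks)
    print(model.predict("nytimes.com", k=2))  # ['news', 'politics']
    print(model.sample("python.org", random.Random(0)))
```

The same fitted distribution serves both purposes: ranking it gives predictions, while sampling from it produces synthetic tagging data, which is the statistical meaning of "generative".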

In the talk, I illustrated how we have modeled the dynamics of the popular social bookmarking system Delicious using Information Theory. I also showed how, using equations from Evolutionary Dynamics, we were better able to explain what might be happening to Wikipedia’s contribution patterns.

Talk Title: Model-driven Research for Augmenting Social Cognition
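
Information-theoretic measures of a tagging system are straightforward to compute. As a hedged sketch, assuming the raw data is a list of (document, tag) bookmark events, the code below computes the entropy of tag usage H(T) and the conditional entropy of documents given tags H(D|T). It illustrates the quantities only; the function names and the example window are assumptions, and this is not the analysis pipeline behind the talk.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of the distribution implied by a Counter."""
    total = sum(counts.values())
    # p * log2(1/p) summed over outcomes; zero counts contribute nothing.
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)


def tag_entropies(bookmarks):
    """bookmarks: iterable of (document, tag) events, e.g. one time window.

    Returns (H(T), H(D, T), H(D | T)) in bits, using the chain rule
    H(D | T) = H(D, T) - H(T). Lower H(D | T) means that knowing a tag
    narrows down the set of documents more sharply.
    """
    bookmarks = list(bookmarks)
    h_t = entropy(Counter(tag for _doc, tag in bookmarks))
    h_dt = entropy(Counter(bookmarks))
    return h_t, h_dt, h_dt - h_t


if __name__ == "__main__":
    window = [("d1", "python"), ("d2", "python"), ("d3", "news"), ("d3", "news")]
    print(tag_entropies(window))  # (1.0, 1.5, 0.5)
```

Computing these values separately for successive time windows (say, one per month) is one way to ask whether tags are becoming better or worse at pinning down documents as a system grows.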


Joe McCarthy said...

Great presentation - thanks for sharing the slides, and some explanatory notes about some of the themes.

Your discussion of "generative models" reminded me of some of the points that Tim O'Reilly has recently been making about "generative platforms" in open government (Gov 2.0) and other areas where participatory platforms can promote real change, a theme that was sparked by Jonathan Zittrain's book, The Future of the Internet ... And How to Stop It.

One of Tim's observations, in particular, harks back to your early point about all models being wrong (or shown to be wrong eventually, as more data and/or new insights become available): "open, generative systems eventually become closed over time, losing their innovative spark in the process".

Given the rapid pace of innovation in social computing, I imagine the most we can hope for are iteratively generative models (and platforms).

Jon Awbrey said...

I've seen so much Wiki-Phrenology over the last 10 years that it keeps flashing me back to the days when that phamous phrenology pic appeared on the cover of just about every other cognitive psych book that came out.

The picture worth a thousand cautionary tales was of course a way of reminding ourselves that pertinent data collection and precise theory both depend on a thorough familiarity with the relevant factors in the domain of interest.

Deja vu all over again …

randall said...

hey ed---interesting stuff. from your slides, i wasn't quite able to suss out how your categorization of models was to be applied, however. are you just trying to clarify in what respects a generative model (viz. information theory) is useful for capturing relevant features of delicious? the categories you describe are, of course, far from mutually exclusive; and indeed, one might think that the work that you want generative models to do is better done by models in other categories. (there has, of course, been a lot of ink spilled in the philosophy of science and the philosophy of social science about the proper connections between these various functions that a model might serve---but, as i said at the top, it's not clear to me whether getting clear on that is important for your talk or not.)

all the best.

Ed H. Chi said...

@Joe: I remember reading about generative platforms in that article. Ideas have indeed become more social. Our group has recently been looking into information diffusion models, which do seem to take on a more generative flavor.

@Jon: Models are a proposal for saying what we might understand about the data underneath. They are most definitely wrong in one way or another. After all, Newton was sort of wrong until Einstein corrected parts of it, and even Einstein's models are not a complete description of the real world. It doesn't make Newton's work or Einstein's work pseudo-science.

@randall: I wasn't so much trying to place everything into neat little piles as offering a way to think about the kinds of models we might want to build. Indeed, in the talk, I mentioned how some models are explanatory, predictive, and generative all at the same time. That does not necessarily mean, however, that a model is more useful the more categories it covers.