Wednesday, November 5, 2008

'Living Laboratories': Rethinking Ecological Designs and Experimentation in Human-Computer Interaction

During the formation of the HCI field, the need to establish HCI as a science pushed us to adopt methods from psychology, both because it was convenient and because those methods fit our needs at the time. Real HCI problems have long since moved beyond the evaluation setting of a single user sitting in front of a single desktop computer, yet many of our fundamentally held viewpoints about evaluation continue to be ruled by outdated biases derived from this legacy.

Trends in social computing as well as ubiquitous computing have pushed us to consider research methodologies that are very different from those of the past. In many cases, we can no longer assume a single display, knowledge work, an isolated worker, a stationary location, or short task durations. HCI researchers have slowly broken out of the mold that constrained us. Increasingly, evaluations are done in situations where there are simply too many uncontrolled conditions and variables. Artificially created environments such as in-lab studies can only tell us about behaviors in constrained situations. In order to understand how users behave across varied times, places, contexts, and other situations, we need to systematically re-evaluate our research methodologies.

The Augmented Social Cognition group has been a proponent of the idea of the 'Living Laboratory' within PARC. The idea (born out of a series of conversations between myself, Peter Pirolli, Stuart Card, and Mark Stefik) is that in order to bridge the gulf between academic models of science and practical research, we need to conduct research within laboratories that are situated in the real world. Many of these living laboratories are real platforms and services that researchers build and maintain; just like Google Labs or beta software, they remain somewhat unreliable and experimental, yet useful and real. The idea is to engage real users in ecologically valid situations, while gathering data and building models of social behavior.

There are two dimensions along which HCI researchers can conduct evaluations. One dimension is whether the system is under the control of the researcher or not: typically, computing scientists build systems and want them evaluated for effectiveness. The other dimension is whether the study is conducted in the laboratory or in the wild. These two dimensions interact to form four different ways of conducting evaluations:

  1. Building a system, and studying it in the laboratory. This is the most traditional approach in HCI research and the one typically favored by CHI conference paper reviewers. The problem with this approach is that it is (1) extremely time-consuming, and (2) not always ecologically valid. As mentioned before, it is extremely difficult, if not impossible, to design laboratory experiments for many social and mobile applications that are ecologically valid.

  2. Not building a system (but adopting one), and studying it in the laboratory. For example, this is possible by taking existing systems, such as Microsoft Word and iWork Pages, and comparing the features of the two systems.

  3. Adopting an existing system, and studying it in the wild. The advantage here is the ability to study real applications being used in ecologically valid situations. The disadvantage is that findings are often not comparable across studies, since factors are harder to isolate. On the other hand, findings can be immediately applied to the live system, and the impact of the research is real, since adoption issues are already removed. We have studied Wikipedia usage in detail using this method by releasing WikiDashboard.

  4. Building a system, releasing it, and studying it in the wild. A well-publicized use of this approach is Google's A/B testing. According to Marissa Mayer at Google, A/B testing allowed them to finely tune the Search Engine Result Pages (SERPs). For example, Google carefully studied how many search results a page should contain by varying the number across a large pool of users. Because the subject pool is so large, Google can say with some certainty which design is better on their running system. A major disadvantage of this approach is the effort and resources it takes to build and study such systems. However, for economically interesting applications such as Web search engines, the tight integration between system and usage actually shortens the time to innovate between product versions.
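To make the A/B reasoning in (4) concrete, here is a minimal sketch of how a large subject pool lets one decide between two live designs. It uses a standard two-proportion z-test on click-through counts; the counts and the "10 vs. 20 results" framing are hypothetical illustrations, not Google's actual data or method.

```python
import math

def ab_test(clicks_a, users_a, clicks_b, users_b):
    """Two-proportion z-test: does variant B's click rate differ from A's?"""
    p_a = clicks_a / users_a
    p_b = clicks_b / users_b
    # Pooled click rate under the null hypothesis of no difference
    p = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(p * (1 - p) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    return p_a, p_b, z

# Hypothetical traffic split: variant A shows 10 results, B shows 20
p_a, p_b, z = ab_test(clicks_a=4400, users_a=50000,
                      clicks_b=4150, users_b=50000)
print(f"A: {p_a:.3f}  B: {p_b:.3f}  z = {z:.2f}")
# |z| > 1.96 would indicate a significant difference at the 5% level
```

With 50,000 users per arm, even a half-percentage-point difference in click rate is statistically detectable; with a few dozen lab participants it would not be, which is the quantitative core of the argument for studying running systems in the wild.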

Of these variations, (3) and (4) are what we consider to be 'Living Laboratory' studies. This is why we released WikiDashboard into the wild, and why we will be releasing a new social search engine called MrTaggy in the near future. The idea is the same: to test social search systems in the wild and see how they perform with real users.


Markus said...

I'm looking forward to the first release. Make sure to announce MrTaggy on this blog when you are ready to open the project up to the public.

Vivienne said...

But as for (4), I'm wondering what you are going to do about user acquisition? I assume you need a user group big enough to observe the "ecological behaviors".

Ed H. Chi said...

Markus, we'll definitely be announcing the first beta release of MrTaggy on the blog, so watch for it here.

Vivienne, user acquisition is definitely hard. Many startups use various forms of marketing to achieve this objective, but we're just a small research group, so we'll have to rely mostly on word of mouth. Hopefully we will build useful and interesting prototypes that users will come back to. In case we can't acquire enough users, we'll have to adopt existing systems and study them in the wild instead (strategy 3).
