During the formation of the HCI field, the need to establish HCI as a science pushed us to adopt methods from psychology, both because it was convenient and because the methods fit the needs. Real HCI problems have long moved beyond the evaluation setting of a single user sitting in front of a single desktop computer, yet many of our fundamentally held viewpoints about evaluation continue to be ruled by outdated biases derived from this legacy.
Trends in social computing as well as ubiquitous computing have pushed us to consider research methodologies that are very different from those of the past. In many cases, we can no longer assume a single display, knowledge work only, an isolated worker, a stationary location, or short task durations. HCI researchers have slowly broken out of the mold in which we were constrained. Increasingly, evaluations are done in situations in which there are just too many uncontrolled conditions and variables. Artificially created environments such as in-lab studies are only capable of telling us about behaviors in constrained situations. In order to understand how users behave across varied times, places, contexts, and other situations, we need to systematically re-evaluate our research methodologies.
The Augmented Social Cognition group has been a proponent of the idea of the 'Living Laboratory' within PARC. The idea (born out of a series of conversations between myself, Peter Pirolli, Stuart Card, and Mark Stefik) is that in order to bridge the gulf between academic models of science and practical research, we need to conduct research within laboratories that are situated in the real world. Many of these living laboratories are real platforms and services that researchers would build and maintain, and, just like Google Labs or beta software, they would remain somewhat unreliable and experimental, yet useful and real. The idea is to engage real users in ecologically valid situations, while gathering data and building models of social behavior.
There are two different dimensions along which HCI researchers can conduct evaluations. One dimension is whether the system is under the control of the researcher or not. Typically, computing scientists build systems and want them evaluated for effectiveness. The other dimension is whether the study is conducted in the laboratory or in the wild. These two dimensions interact to form four different ways of conducting evaluations:
- Building a system, and studying it in the laboratory. This is the most traditional approach in HCI research and the one that is typically favored by CHI conference paper reviewers. The problems with this approach are that (1) it is extremely time-consuming, and (2) the experiments are not always ecologically valid. As mentioned before, it is extremely difficult, if not impossible, to design laboratory experiments for many social and mobile applications that are ecologically valid.
- Not building a system (but adopting one), and still studying it in the laboratory. For example, this is possible by taking existing systems, such as Microsoft Word and iWork Pages, and comparing the features of these two systems.
- Adopting an existing system, and studying it in the wild. The advantage here is that we study real applications that are being used in ecologically valid situations. The disadvantage is that findings are often not comparable, since factors are harder to isolate. On the other hand, findings can be applied immediately to the live system, and the impact of the research is real, since adoption issues are already removed. We have studied Wikipedia usage in detail using this method by releasing WikiDashboard.
- Building a system, releasing it, and studying it in the wild. A well-publicized use of this approach is Google's A/B testing. Apparently, according to Marissa Mayer at Google, A/B testing allowed them to finely tune the Search Engine Result Pages (SERPs). For example, the question of how many search results the page should contain was studied carefully by varying the number across a large number of users. Because the subject pool is large, Google can say with some certainty which design is better on their running system. A major disadvantage of this approach is the effort and resources required to study such systems. However, for economically interesting applications such as Web search engines, the tight integration between system and usage actually shortens the time to innovate between product versions.
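To make the point about subject pool size concrete, here is a minimal sketch of one common way to compare two design variants: a two-proportion z-test on click-through counts. This is only an illustration and not a description of how Google analyzes its experiments; the function name, the variant labels, and all of the counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare the success rates of two variants (hypothetical helper).

    Returns the z statistic and the two-sided p-value under the
    null hypothesis that both variants share the same success rate.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: the same 10% vs. 12% click-through rates,
# observed first with a small pool and then with a large one.
z_small, p_small = two_proportion_z_test(10, 100, 12, 100)
z_large, p_large = two_proportion_z_test(1000, 10000, 1200, 10000)

print(f"small pool: z={z_small:.2f}, p={p_small:.3f}")
print(f"large pool: z={z_large:.2f}, p={p_large:.6f}")
```

With the small pool, the same difference in rates is indistinguishable from noise, while with the large pool it is clearly detectable. This is the sense in which a large user base lets a live system say with some certainty which design is better.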
Of these variations, the third and fourth (studying a system in the wild, whether adopted or purpose-built) are what we consider to be 'Living Laboratory' studies. This is why we released WikiDashboard into the wild. We will be releasing a new social search engine called MrTaggy in the near future. The idea is the same: to test some social search systems in the wild to see how they perform with real users.