
Monday, March 8, 2010

Wikipedia's People-Ware Problem

Last week, we hosted a visit from the Wikimedia Foundation on issues relating to our work on community analytics, and what it tells us about Wikipedia's problems and possible solutions. Naoko Komura (pictured at right) of the Wikimedia Usability Initiative, as well as Erik Zachte, the staff data analyst (also pictured at right), spoke very eloquently about how we can create social tools to direct the best social attention to the parts of Wikipedia that need it.



Fundamentally, Wikipedia has always had a "people-ware" problem: getting the expertise that is freely donated distributed to the places that need it. This has been, and will always remain, its greatest challenge. The amazing thing about Wikipedia is that it managed to do this for so long, well enough that a valuable knowledge repository was built up as a result. At first, people simply came because it was the place to be. Now, we have to work a little harder.

We spent a lot of time talking about the best way to model this people-ware problem, whether using biological metaphors (evolutionary systems with various forces) or economic models (see the last post here). However, one thing to be aware of is the danger of "analysis paralysis": spending so much time analyzing the problem that you forget there are already many ideas for moving the great experiment forward.

For example, there are many places in Wikipedia that are not well populated. It's well known that many scientific and math concept articles, for example, could use an expert eye to catch errors and explain the concepts better. How can we build an expertise finder that would actually invite people to fix problems that we know exist in Wikipedia?
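As a rough illustration of the kind of matching such an expertise finder might do, here is a minimal sketch. The edit histories, category names, and the flagged-article backlog are all hypothetical stand-ins, not Wikipedia's actual data model:

```python
from collections import Counter

# Hypothetical edit histories: editor -> categories of articles they have edited.
edit_history = {
    "alice": ["Mathematics", "Statistics", "Mathematics", "Topology"],
    "bob":   ["Film", "Television", "Film"],
}

# Hypothetical backlog: articles flagged as needing expert attention, with their category.
needs_expert = [
    ("Banach space", "Mathematics"),
    ("Markov chain", "Statistics"),
    ("Film noir", "Film"),
]

def suggest_articles(editor, top_n=3):
    """Rank flagged articles by how often this editor has worked in their category."""
    expertise = Counter(edit_history.get(editor, []))
    ranked = sorted(needs_expert, key=lambda a: expertise[a[1]], reverse=True)
    return [title for title, cat in ranked[:top_n] if expertise[cat] > 0]

print(suggest_articles("alice"))  # ['Banach space', 'Markov chain']
```

The point of the sketch is the invitation step: instead of waiting for experts to stumble onto neglected articles, the system could actively route flagged articles toward editors whose history suggests they could help.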

Another idea might be to make the whole system more social. Chris Grams blogs about a part of this idea here. We suggested some time ago a system like WikiDashboard, which actually shows readers what the social dynamics have been for a particular article.

Wikipedia was created in 2001, when the social web was still in its infancy. In the ensuing 9 years it has changed very little, and I would argue Wikipedia has not kept up with the times. Lots of "Social Web" systems and new cultural norms have been built up already. For example, I suspect that many of us would not mind at all revealing our identities on Wikipedia; we might like to log in with our OpenIDs and even have verified email addresses, so that the system can send us verification, clarification, or notification messages. Perhaps the system should connect with Facebook, so that my activities (say, editing an article on "Windburn") are automatically sent to my stream there. My friends, upon seeing that I have been editing that article, might even join in.
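A minimal sketch of the edit-to-stream fan-out described above; the follower graph, the in-memory streams, and the functions are all hypothetical placeholders, not any real Wikipedia or Facebook API:

```python
# Hypothetical social graph: user -> set of friends following their activity.
followers = {
    "ed": {"chi", "peter"},
}

# In-memory stand-in for each user's activity stream.
streams = {name: [] for name in ("ed", "chi", "peter")}

def post_to_stream(user, message):
    """Append an activity item to a user's stream."""
    streams[user].append(message)

def on_article_edit(editor, article):
    """When an editor saves a change, fan the event out to their followers' streams."""
    event = f"{editor} edited the article '{article}'"
    for friend in followers.get(editor, set()):
        post_to_stream(friend, event)

on_article_edit("ed", "Windburn")
print(streams["chi"])  # ["ed edited the article 'Windburn'"]
```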

I think that Wikipedia is about to change, and it is going to become a much more socially-aware place. I certainly hope they will tackle the People-Ware (rather than just the Tool-Ware) problems, and that we will see it become an exciting place again.

Sunday, January 4, 2009

Cloud Computing, Science2.0, and the Social Web

Start off 2009 with a more philosophical entry...

I was recently in Asia to give the keynote talk at the International Conference on Asia-Pacific Digital Libraries (in Bali, Indonesia!). In my recent travels and talks, I have been asked about the relationship between the latest buzz around "Cloud Computing" and Web2.0 (with its already-evident connections to service-based computing, the social web, and social science).

The cloud computing trend might be best motivated by the understanding that data management and computational processing are moving away from personal computing frameworks and into collaborative workspaces managed in the network. The impact is wide and deep, and it's intertwined with service-based computing, Web2.0, and other trends.

The main value proposition is further "abstraction" that reduces management costs. For example, backup storage is abstracted into the cloud, so you don't have to worry about your hard disk failing. Computation is abstracted into the cloud, so you don't have to worry about not having enough computational nodes for your data analysis job. It is an inevitable trend in computing, driven by the need to reduce complexity and data-management and computation-management costs. It's clear that, in the near future, storage and computation will continue to evolve into collaborative workspaces that you never have to administer and never have to worry about backing up.
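To make the abstraction point concrete, here is a minimal sketch, with purely illustrative classes rather than any particular vendor's API: the application programs against a `save`/`load` interface, so where the bytes live, and who worries about replication and backups, is hidden behind it.

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """The abstraction the application programs against."""
    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def load(self, key: str) -> bytes: ...

class LocalStorage(Storage):
    """You manage the disk (and its failures) yourself."""
    def __init__(self):
        self._disk = {}
    def save(self, key, data):
        self._disk[key] = data
    def load(self, key):
        return self._disk[key]

class CloudStorage(Storage):
    """Stand-in for a managed service: replication happens behind the interface."""
    def __init__(self):
        self._replicas = [{}, {}, {}]  # pretend these live in three data centers
    def save(self, key, data):
        for replica in self._replicas:
            replica[key] = data
    def load(self, key):
        return self._replicas[0][key]

def backup_document(store: Storage, name: str, text: str):
    """Application code is identical whichever backend is plugged in."""
    store.save(name, text.encode("utf-8"))

backup_document(CloudStorage(), "draft.txt", "my paper draft")
```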

Cloud computing has been touted as the second coming of computing science: that all scientific endeavors will now rely on cloud computation capabilities. Jim Gray (the missing sailor, Turing Award winner, and database guru) once said that the fourth paradigm of scientific discovery will involve "data-intensive explorations which unify theory, simulation, and experiment". I was asked what I thought of this new direction. Jim Gray is (was) a big figure in computing, so his opinion is certainly worth its weight in gold. It's certainly one approach that would enable us to tackle bigger and more complex problems.

Jim Gray's fourth paradigm is rooted in his belief that data is at the heart of science -- essentially a kind of fundamental 'empiricism'. This kind of empiricism has certainly been at the heart of social experiments in Web2.0 applications. Shneiderman argued this viewpoint in a recent Science article as a kind of 'Science2.0'. The label '2.0' certainly has some relation to Web2.0 and cloud computing, in that the same computational techniques being invented to handle social analytics and cloud computing are needed to do this new kind of empirical science.

The big bet is that big data sets will enable bigger science to be done (if you believe that all science derives fundamentally from observations). I do worry that this viewpoint places too much faith in blackbox science (i.e., load a large data set into a database, apply MapReduce or other parallelized machine-learning techniques, and then wham! Patterns emerge!). It expects machine learning to do too much of the heavy lifting. True scientific model building isn't just fitting some parameters on some statistical algorithm. Science has more creativity than that.
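To make the "blackbox" recipe concrete, here is a toy, single-machine MapReduce-style sketch; the observation data is made up. It dutifully surfaces frequent co-occurrences, but nothing in the pipeline tells you whether the pattern means anything, which is exactly where the creative model building has to come in.

```python
from collections import Counter
from itertools import combinations

# Toy "large data set": each record is a set of features observed together.
observations = [
    {"smoker", "coffee", "insomnia"},
    {"coffee", "insomnia"},
    {"smoker", "insomnia"},
    {"coffee", "exercise"},
]

def map_phase(record):
    """Emit (feature-pair, 1) for every pair that co-occurs in a record."""
    return [(pair, 1) for pair in combinations(sorted(record), 2)]

def reduce_phase(emitted):
    """Sum the counts for each pair."""
    totals = Counter()
    for pair, count in emitted:
        totals[pair] += count
    return totals

emitted = [kv for record in observations for kv in map_phase(record)]
patterns = reduce_phase(emitted)
print(patterns.most_common(2))  # e.g. [(('coffee', 'insomnia'), 2), (('insomnia', 'smoker'), 2)]
```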

From a practical perspective, the need for models and patterns for design is pressing; we certainly can't just rely on rationalism to generate all of the understanding needed to push forward. So Jim Gray's paradigm and other versions of Science2.0 are certainly part of the answer to really advancing scientific understanding. Big-data science has certainly been a huge propeller of advanced web analytics, enabling Google, Yahoo, and Microsoft to be the big winners in computing. So investing in big-data science is a 'no-brainer' in my book, but one needs to combine it with truly creative scientific work.