Friday, September 3, 2010

Revamping WikiDashboard

I released WikiDashboard almost three years ago. Believe it or not, the server for WikiDashboard has been running under my desk for three full years (the photo shows the actual server). It was launched in a rush to meet a deadline for an academic paper that we published at a conference (ACM SIGCHI 2008), and it has received only limited maintenance since.

The old Power Mac (http://en.wikipedia.org/wiki/Power_Mac_G5) has been pretty reliable, but it has become increasingly untrustworthy lately. Frustrated with frequent crashes, hangs, and sluggishness, I finally decided to do something. While migrating the tool off the old machine, I've added a few new features. I hope you find them useful.


Faster and more scalable infrastructure
The server is now running on Google App Engine. WikiDashboard is hosted as a web app on the same systems that power Google applications. WikiDashboard should now provide a faster, more reliable, and more scalable service to you. I plan to keep the old server running for a bit, but it will eventually forward the traffic to the new server.

Support for ten more languages
Thank you to everyone who showed interest in having WikiDashboard in your own language version!

Bongwon Suh
http://www.parc.com/suh
@billsuh http://twitter.com/billsuh

Thursday, September 2, 2010

Open data manipulation and visualization: Challenges

I typically blog about research results here, but this post is more conversational and discussion-oriented. My good friend m.c. schraefel asked me a question via email: "What are 1 or 2 key priorities you think must be addressed that will aid citizen focused manipulation of open data sources for personal/social knowledge building?"


Here is my answer to her:

The issue you raised was precisely the inspiration for my Ph.D. thesis work on creating a visualization spreadsheet. The idea, from over 10 years ago, was that if people can easily use spreadsheets, then they ought to be able to take that model further and start creating visualizations with them, and the thesis was an exploration of how to design such systems. I think of ManyEyes and Jeff Heer's later work as heading in the same direction.

We have since learned a lot about user-contributed content on systems like Wikipedia, Delicious, and Twitter, and they show a very interesting participation architecture that consists of readers, contributors, and leaders. Not all users want to be leaders, and not all users want to contribute. We have sometimes used the derogatory term "lurkers" to describe "readers", which I think is a bit unfair. Ronald Burt's work has shown that a lot of us would like to be brokers of information among social groups, but there is also a need for an audience, or followers, who might become brokers later, just not everyone all at once.

I believe that manipulation of open data sources will follow the same curve. Yes, some cancer patients will want to read all they can about their condition and do the analytical work, while others (not necessarily because of tool limitations) would prefer to take a backseat and let others curate the information for them. What's interesting is that they might want very simple interactions that enable basic sorting of data, or maybe even services that interpret the data for them (e.g. doctors), but they would prefer that someone else do the bulk of the work (even if it becomes very easy, thanks to tool research and development).

Given that, what can we do?

First, it's quite clear that much of the hard work remains in data import and cleaning. To democratize data analytics and manipulation, the bulk of the difficulty is dealing with data acquisition. Unfortunately, most of this is engineering and not sexy research, so there isn't really much innovative work in this area, but some information extraction techniques (AI-style algorithms and some machine learning techniques) are making inroads. I also believe that mixed-initiative research for data import is sorely needed. We're doing a bit of this work in my lab at the moment.

Second, there is the issue of data literacy. What kind of visualization works with what kind of data? What analytic technique is appropriate? Early work by Jock Mackinlay (from our old UIR research group) pointed to the possibility of automating some of these design choices in his Ph.D. research, but we haven't made a huge amount of progress in this area since then. He is now at Tableau Software trying to solve some of these issues. Wizards and try-visualization-refine loops have all been tried in research. We need to stop inventing new visualizations and instead build actual usable tools for people here. By going to vertical domains, we will learn how to solve this problem.
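To make the idea of automating design choices a bit more concrete, here is a toy sketch in Python. It is not Jock's system or anything Tableau ships; the function names, the two data kinds, and the three rules are purely my own illustrative assumptions about what "pick a chart from the data type" could look like at its simplest.

```python
from typing import Sequence


def infer_kind(values: Sequence[object]) -> str:
    """Guess a column's kind: 'quantitative' if every value is a number,
    otherwise 'categorical'. (Hypothetical two-way split for illustration.)"""
    is_numeric = bool(values) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "categorical"


def suggest_chart(x: Sequence[object], y: Sequence[object]) -> str:
    """Suggest a chart for two columns using three hand-written rules.
    The rules are assumptions, not an established standard."""
    kinds = (infer_kind(x), infer_kind(y))
    if kinds == ("quantitative", "quantitative"):
        return "scatter plot"      # two numeric columns: show their relationship
    if "quantitative" in kinds:
        return "bar chart"         # one category, one number: compare sizes
    return "table of counts"       # two categories: count co-occurrences


if __name__ == "__main__":
    print(suggest_chart(["edits", "reverts"], [120, 15]))
```

A real tool would need many more data kinds (dates, ordered categories, geography) and some way to rank alternatives, which is exactly where the hard design work lies.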


These two are the biggest problems, IMHO. Of course, there are other technical challenges such as data scale and compute power, security, privacy, and social sharing, which are all fascinating, and research such as ManyEyes has taught us a good deal about these issues.