Monday, October 29, 2007

Differences between Social Tagging and Collaborative Tagging

I'm here at the InfoVis conference in Sacramento, and a conversation with Marti Hearst of UC Berkeley just reminded me why the 'confusion' between the phrases "social tagging" and "collaborative tagging" has bothered me for quite some time. In fact, Wikipedia redirects "Social Tagging" to "Collaborative Tagging" (see http://en.wikipedia.org/w/index.php?title=Social_tagging&redirect=no). This, I would argue, is wrong. Why?

'Collaborate', according to the American Heritage Dictionary, means "to work together, especially in a joint intellectual effort." The problem is that the tagging features in many popular Web2.0 tools such as Flickr and YouTube are not really 'collaborative', since users aren't actually working together per se. In YouTube, for example, only the uploader of the original video clip can specify and edit the tags for a video. In Flickr, most of the time users only tag their own photos. However, Flickr is somewhat more collaborative than YouTube, because the default setting for any account allows contacts such as friends and family to also tag the photos.

Neither of these systems seems that 'collaborative', because, to me, collaboration implies a shared artifact, a shared workspace, and shared work. 'Social', on the other hand, means "living or disposed to live in companionship with others or in a community, rather than in isolation"; in other words, simply existing and having some relation to others in a community. So, for example, I would argue that YouTube has social tagging but not collaborative tagging: users tag their uploaded videos in the context of an online social community, but they do not collaborate to converge on a set of tags appropriate for each video.

The use of the term 'collaborative' in the Computer-Supported Cooperative Work (CSCW) field has especially come to imply a shared workspace. With shared workspaces, there are often elements of coordination and conflict involved as well (and hopefully conflict resolution). So, in contrast to YouTube, the most 'collaborative' tagging system I know is the category tagging system in Wikipedia. Anyone can edit the category tags for an article: they can remove, add, discuss, and revert the use of any tag. In this case, the category tags are shared artifacts that anyone can edit inside a shared workspace, and the work of tagging all 2 million+ articles in Wikipedia is shared among the community.

It's interesting to note that the bookmarking system del.icio.us sits somewhere in between YouTube and Wikipedia tagging. In del.icio.us, there is a shared artifact (the tagged sites or URLs), and there is shared work in tagging all of the websites and pages out there on the Web. However, there is less of a notion of a shared workspace: my tags for a URL could be, and probably are, different from someone else's tags for the same URL, and I can search within just my own del.icio.us space. So, from least to most collaborative, we have YouTube, then del.icio.us, and finally the category tagging system in Wikipedia.

A simple way to put this is that one must be social in order to collaborate, but one need not collaborate to be social. In summary, I would argue that social tagging is a superset of collaborative tagging: every collaborative tagging system is social, but a social tagging system is not necessarily a collaborative one. We should change the definitions in Wikipedia to distinguish between these two types of systems.
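For the programmers in the audience, here is one way to make the distinction concrete. This is just a minimal sketch in Python; the permission categories and function names are my own invention for illustration, not any system's actual API:

```python
from enum import Enum

class TagPermission(Enum):
    """Who may edit the tags on an item (a hypothetical taxonomy)."""
    OWNER_ONLY = 1          # e.g., YouTube: only the uploader tags a video
    OWNER_AND_CONTACTS = 2  # e.g., Flickr's default: owner plus friends/family
    PER_USER = 3            # e.g., del.icio.us: anyone tags, but each user keeps their own tag set
    SHARED_EDIT = 4         # e.g., Wikipedia categories: one shared tag set anyone can edit

def is_social(permission: TagPermission) -> bool:
    # All four models are 'social': tagging happens in a community context.
    return True

def is_collaborative(permission: TagPermission) -> bool:
    # Only a shared, mutually editable tag set counts as 'collaborative'.
    return permission is TagPermission.SHARED_EDIT
```

Every system that passes is_collaborative also passes is_social, but not vice versa--which is exactly the superset relation I'm arguing for.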

Sunday, October 28, 2007

Social Networks, Brokers, and Social Information Foraging

How do social groups collect and make sense of information? How can we scientifically understand such processes and turn that understanding into knowledge that guides the art and engineering of new designs for the social Web? In this post I summarize some findings about the operation of social networks in science and business.

Why Social Information Foraging?

For more than a decade, researchers at PARC have studied information foraging and sensemaking at the level of the individual user, work that has had some degree of influence on practice. Now, the Augmented Social Cognition (ASC) area at PARC is pushing that research toward social information foraging and sensemaking, with a special focus on the Web. There are many reasons for this, including:
  • Recent catastrophic failures in decision-making have been attributed to a lack of cooperation in collecting and making sense of information. For instance, the Senate 9/11 report and the NASA Columbia report both focus on poor cooperation in finding, sharing, and making sense of information.
  • Virtually all significant discoveries, inventions, and innovations are the result of collective activity that depends on standing on the shoulders of others.
  • Recent gushing about the "wisdom of crowds" and similar phenomena points to the power of cooperative processes, but things can go wrong too.
  • The Web is emerging (for better or worse) as the primary source of scientific and technical information in both professional and everyday life. The Pew Internet & American Life Project reports that the majority of online users turn first to the Internet for information about specific scientific topics, 87% of online users use the Internet as a research tool, and 62% use the Internet to check the reliability of facts.
Social Networks in Science

There is a whole scientific literature about scientific literatures. Scientific literatures are interesting because they are examples of community networks of peers producing content, dating back to at least the 18th century. Pamela Sandstrom, an anthropologist who also works in the library sciences, did a study of the information foraging behavior of scholars in behavioral ecology (a subfield of biology). Dr. Sandstrom used a variety of ethnographic and bibliometric techniques to try to get at scholarly information seeking.

One emergent pattern was the way that individual scholars arranged themselves to be information brokers. That is, they all contributed to their "core" field (behavioral ecology) but also maintained connections to peripheral fields (e.g., mathematics, population theory, psychology, etc.). Individuals could be viewed as "brokers" of information between peripheral fields (e.g., a new mathematical technique) and the core field (e.g., application of that new mathematical technique to modeling behavior).

Another emergent pattern was that individual scholars had different foraging strategies:
  • Peripheral fields involve solitary foraging: 48% of the information resources used to write papers came from solitary deliberate search, information monitoring, browsing, or reading, and 61% of those resources were relevant to the periphery.
  • The core field involves social foraging: 30% of resources came from colleagues distributing or communicating information through informal channels (e-mail, pre-publications, face-to-face recommendations, etc.), and 69% of those resources were relevant to the core.
Social Networks in Business: An Example of Structural Holes and the Social Capital of Brokerage

One of the big influences on our thinking in ASC is the work on Structural Hole Theory by Ronald S. Burt. The theory offers some insight into why the scholars discussed above might be motivated to arrange themselves as brokers across different areas.

Burt's work is built around the analysis of social networks--network representations in which nodes represent people and links among nodes represent social relations, especially ones in which information might be communicated. Such networks tend to have a clumpy arrangement: clusters of people tend to interact with one another and less so with other clusters. The gaps between such clusters are what Burt calls structural holes. Certain individuals can be identified as brokers or bridges across structural holes because they tend to have links that go from one tight cluster of people to another (there is a specific network-based measurement called "network constraint" that captures this mathematically).
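As a concrete (if toy) illustration, the networkx library implements Burt's constraint measure directly. This is a minimal sketch; the six-node graph below is made up purely for the example:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c")])  # tight cluster 1
G.add_edges_from([("x", "y"), ("y", "z"), ("x", "z")])  # tight cluster 2
G.add_edges_from([("c", "broker"), ("broker", "x")])    # one node bridging the hole

# Burt's network constraint: lower scores mean a node's contacts are less
# redundant, i.e., the node spans more structural holes.
for node, score in sorted(nx.constraint(G).items(), key=lambda kv: kv[1]):
    print(f"{node}: {score:.3f}")
```

Running this, the bridging node gets the lowest constraint score, matching the intuition that it is the broker between the two clusters.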

Here's a summary of Burt's hypotheses about brokers and structural holes:
  • There is greater homogeneity within than between social groups
  • People whose social networks bridge the structural holes between groups have earlier access to a broader diversity of information
  • People whose networks bridge the structural holes between groups have an advantage in detecting and developing rewarding opportunities
  • Like an over-the-horizon radar in an airplane, brokerage across the structural holes between groups provides a vision of options otherwise unseen
Corroboration for this theory can be found in one of Burt's studies of a large firm. A total of 673 managers in the firm's supply chain were studied to produce a social network analysis, and Burt used a network constraint analysis to measure each manager's degree of social brokerage. The managers were then asked to submit ideas to improve supply chain management, and these were evaluated by a panel of judges. The results (see the sketch after this list) showed:
  • Idea value increased to the degree that individuals were measured as social brokers
  • The salaries of individuals increased to the degree that they were measured as social brokers (factoring out effects such as job rank, role, age, education, location, and business unit).
  • Managers who discussed issues with other managers were better paid, more likely to be evaluated positively, and more likely to be promoted.
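To make the style of analysis concrete: at its core, the study correlates each manager's constraint score with outcome measures like judged idea value. Here is a hedged sketch of just that correlation step, run on synthetic stand-in numbers (only the sample size matches the study; the data are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_managers = 673  # same sample size as the study; everything else is synthetic

# Hypothetical constraint scores (low = more of a broker) and idea ratings,
# generated so that brokers tend to receive better ratings.
constraint = rng.uniform(0.1, 1.0, n_managers)
idea_value = 5.0 - 3.0 * constraint + rng.normal(0.0, 0.5, n_managers)

r, p = stats.pearsonr(constraint, idea_value)
print(f"r = {r:.2f}, p = {p:.3g}")  # negative r: low constraint goes with higher idea value
```

The real analysis, of course, also factored out covariates such as rank and education, which a simple correlation does not.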
Summary

Our relations and communications with others can be represented as a social network. Specific content flows and dissipates through these networks. In both science and business, it looks like certain "brokerage" positions are sources of discovery and innovation--places where specific individuals get exposed to a greater diversity of ideas, including ideas that may as yet be unseen by others in a core group.

More generally, this research shows that it is possible to identify properties of social information flows that can be specifically related to better information foraging and sensemaking.

Thursday, October 11, 2007

Stanford Talk on Augmented Social Cognition

I'm giving a talk on the work we have done on Augmented Social Cognition and Wikipedia at Stanford next week on Friday Oct 19th, 2007:

http://hci.stanford.edu/cs547/abstracts/07-08/071019-chi.html

Title:
Research meets Web2.0: Augmented Social Cognition sheds light on Coordination, Trust, Wikipedia, and Social Tagging

Abstract:
Over the last few years, we've realized that many information environments are gradually turning people into social foragers and sharers. People spend much of their time in communities, and they are using these communities to share information with others, to communicate, to commiserate, and to establish bonds. This is the "Social Web". While not all of this is new, this style of enhanced collaboration is having an impact on people’s online lives, so we've formed a new research area here at PARC to go after these ideas in depth.

The “Augmented Social Cognition” area is trying to understand how a group of people’s ability to remember, think, and reason can be enhanced. This enhancement is taking form in many Web2.0 systems like social networking sites, social tagging systems, blogs, and Wikis. In this talk, I will summarize examples of recent research on:
- how decreasing interaction costs might change the number of people who participate in social tagging systems
- how conflict and coordination have played out in Wikipedia
- how social transparency might affect reader trust in Wikipedia

Wednesday, October 10, 2007

On the Road: Web 2.0 in the Enterprise

While more enterprises contemplate the benefits of Web 2.0 social software (enhanced collaboration, innovation, knowledge sharing), the coordination and interaction costs that occur in social systems are often overlooked. Based on extensive studies of social systems such as del.icio.us and Wikipedia, PARC has identified multiple factors that must be managed to realize the full benefits of these systems within the enterprise.

These and other insights will be presented at the conference KM World & Intranets 2007 (November 6-8 in San Jose), by Ed Chi, PhD (manager of PARC's augmented social cognition research area) and Lawrence Lee, director of business development. If you're attending the conference, please visit PARC booth #313 — or if you're interested in attending, e-mail pr@parc.com for a free expo pass and conference discount code.

Here is the conference website: KM World & Intranets 2007

Friday, October 5, 2007

Social transparency and the quality of co-created content

How do you measure the accuracy and quality of what people are collectively creating? For example, on Yahoo! Answers, people post questions and tons of people respond. How would you measure the quality of the content?

What’s amazing about this as a research area is that it starts to touch on deep classic philosophic questions like: What do we know about authority? What does it mean? Where does authority come from? What makes someone trust you? When you ask a question about the quality of any information, you have to answer these questions. Who is the person who wrote it? Why should I trust that person? Just because Encyclopedia Britannica hires a bunch of experts to write for them, why should I believe them? What makes them an authoritative figure on how bees build their beehives? What is it about their authority, just because they’re attached to some higher education institution, that makes you want to believe them more than someone else?

When the Augmented Social Cognition research group tried to answer these questions, we ended up in an internal debate about what we mean by “quality,” and I think we came up with a model for understanding it. We realized that, in academia, much of authority and the assignment of trust actually comes from transparency. Why should I believe in calculus? Because the mathematics is built on a foundation of axioms and rule sets that you can follow, look up, and examine. You trust calculus because there is transparency built into the system. You can come to your own conclusion about the quality of the information based upon an examination of the facts. This is the scientific method!

What’s interesting is that exactly the same argument is being applied to Wikipedia. It says to you: you should believe in the quality of the information in Wikipedia because it’s transparent. Anyone can look at the editing history and see who has edited an entry, whether they chose to sign their name to it, and what kinds of edits they made in other parts of Wikipedia. Everything is transparent and completely traceable; you can examine Wikipedia back to the first word that was written. And Wikipedia is relying on the fact that it’s completely transparent to gain authority. There is nothing opaque about it. I think that’s why Wikipedia has become so successful: they stumbled upon some of the fundamental design principles and paradigms that make this work. They could have made the design decision that one can only examine the last 50 edits, or any of many other design choices that would not make the system completely transparent. Is it an accident that they ended up with a system that can be traced back to the first edits? I think not.

However (and that's a big however!), some people still have trouble with the quality of information on Wikipedia even though it’s transparent. Why? One possibility is that they have an all-or-nothing attitude: if one article could be way off, why should I trust another article? They don't, and probably don't want to, examine the history of individual articles before deciding on their individual trustworthiness, perhaps because it's too hard and too time-consuming.

So one hypothesis is that readers don't have the right tools to easily examine and trace the editing history. That's why the idea of WikiDashboard might be a really powerful way to fix these problems. Social dashboards of this kind are visualizations or graphical depictions of editing histories that make it much easier for people to look at the history of an article and make up their own minds about its trustworthiness. The tool will also enable us to do fundamental research testing the hypothesis that transparency is what enables trust.
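To give a sense of what such a tool has to do under the hood, here is a minimal sketch that pulls an article's revision history from the public MediaWiki API and counts edits per editor--roughly the aggregation a WikiDashboard-style view would visualize. The endpoint and parameters are the standard MediaWiki API; the article title is just an example:

```python
from collections import Counter
import requests

def edits_per_user(title, limit=500):
    """Count revisions per editor for one Wikipedia article (most recent `limit` edits)."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "user|timestamp",
            "rvlimit": limit,
            "format": "json",
        },
        headers={"User-Agent": "wikidashboard-sketch/0.1 (research demo)"},
    )
    pages = resp.json()["query"]["pages"]
    revisions = next(iter(pages.values())).get("revisions", [])
    return Counter(rev.get("user", "(hidden)") for rev in revisions)

for user, count in edits_per_user("Information foraging").most_common(10):
    print(f"{count:5d}  {user}")
```

Even this crude tally starts to answer the reader's question: is this article the work of one devoted editor, or of many hands?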

One thing we have done is to actually run some experiments to understand whether people are more willing to believe information if you make the editing histories and activities more transparent. More on that in the next post.