The overall system uses a sophisticated MapReduce computation in the backend, and the implementation is non-trivial. Here is how it works. The diagram below was recently published in an IEEE Computer Magazine article, and it roughly describes how the data flows thru the whole system. (Click on it to enlarge it.)
First, a crawling module goes out to the web and crawls social tagging sites, looking for tuples of the form
A MapReduce system based on Bayesian inference and spreading activation then computes the probability of each URL or tag being relevant given a particular combination of other tags and URLs. Here we first construct a bigraph between URLs and tags based on the tuples and then precompute spreading activation patterns across the graph.
To do this backend computation in massively parallel way, we used the MapReduce framework provided by Hadoop (hadoop.apache org). The results of this computation are stored in a Lucene index so that we can make the retrieval of spreading activation patterns as fast as possible.
Finally, a web server serves up the search results through an interactive frontend. The frontend responds to user interaction with relevance feedback arrows by communicating with the web server using AJAX techniques and animating the interface to an updated state.
Reference:
Ed H. Chi, "Information Seeking Can Be Social," IEEE Computer, vol. 42, no. 3, pp. 42-46, March, 2009.
No comments:
Post a Comment