Both Amazon's and Netflix's business models rely on effective recommendation systems. The recommendations provided by such systems are based on the purchasing habits of millions of customers. As such, these systems are non-trivial and have evolved out of years of research in both academia and industry.
In addition to the millions of customer transaction records that can be mined, there is a vast amount of information available online for many products. While I do not have a lot of familiarity with the recommendation systems literature, it seems obvious that the Web and social media are great sources of information that could be useful when building such systems. Bloggers' profile pages, wishlists, Netflix queues, book lists and the blog posts themselves are potential clues for learning which two items may be related to each other.
As a simple example, consider the movie "Pulp Fiction". By querying Google for all the inlinks to the IMDB page for Pulp Fiction and counting which other movies are "co-cited" on the linking pages, I obtained a list of the five movies that are most likely to be related to "Pulp Fiction":
Most of these look quite relevant. Some critics have claimed similarities between Pulp Fiction and Snatch. One surprise, though, was LOTR. I wouldn't have expected it to be grouped with Pulp Fiction, but I guess I like them both very much, so it seems reasonable in my case at least.
Just for fun, here is another example with "Sin City", another one of my favorite movies.
Unless you have a large index of the Blogosphere or the Web, it would be quite inefficient to mine for such correlations (by passing queries to search engines) on a large scale. I do not know how much of the search engine information is leveraged in recommendation systems built by Amazon or Netflix. It might also be worth looking into differences in the recommendations produced on the basis of "how people co-cite two products" vs. "how people purchase two products".
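The co-citation counting described above can be sketched in a few lines of Python. This is only a minimal illustration, not what the queries above actually ran: it assumes you already have, for each citing page, the list of movies it links to. The function name `co_cited`, the `pages` dictionary, and the page names in it are all hypothetical, and the toy data is made up purely to show the mechanics.

```python
from collections import Counter


def co_cited(pages, target, top_n=5):
    """Rank items by the number of pages that link to both them and `target`.

    `pages` maps a citing page (e.g. a blog URL) to the items it links to.
    """
    counts = Counter()
    for links in pages.values():
        items = set(links)  # count each item at most once per page
        if target in items:
            counts.update(items - {target})
    return counts.most_common(top_n)


# Made-up toy data: which movies each (hypothetical) blog links to.
pages = {
    "blog-a": ["Pulp Fiction", "Snatch", "Sin City"],
    "blog-b": ["Pulp Fiction", "Snatch"],
    "blog-c": ["Sin City", "Snatch"],
    "blog-d": ["Pulp Fiction", "Sin City", "Snatch"],
}

related = co_cited(pages, "Pulp Fiction")
```

Swapping the link lists for purchase baskets would give the "how people purchase two products" counts with the same function, which makes the two signals easy to compare side by side.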
OK, it turns out that TypePad is not rendering the images correctly. I tried to edit the HTML, but that did not help either. So I am working on fixing the post. Sorry about the trouble.
Posted by: Akshay Java | May 02, 2008 at 08:28 PM
Co-citation makes for an interesting mix of collaborative filtering (to the extent that citations are social signals) and content-based similarity (to the extent that the links carry semantic content). While Amazon and Netflix maintain secret, proprietary recommendation algorithms, it is safe to assume that they combine both social and semantic information.
If you're interested in learning more, check out my recent post on social navigation at The Noisy Channel: http://thenoisychannel.blogspot.com/2008/04/social-navigation.html
Posted by: Daniel Tunkelang | May 02, 2008 at 11:05 PM