I am at SXSW this weekend and wow, what a conference this has been! I met a lot of cool people and went to a bunch of happening parties! I plan to post a few updates about the conference. In the meantime, here is a quick post from notes I took during the panel discussion on collaborative filters.
Anton Kast from Digg.com gave a brief history of collaborative filtering from the research literature, covering Tapestry (from PARC), GroupLens (UMN), and a few other projects, mostly from the 1990s, and drew comparisons to current systems used in spam filters, PageRank, tagging systems, Facebook ads, etc. The talk wasn't very technical and was meant as a quick overview for an audience that might not be familiar with the research background. Kast described a recommendation as the output of collaborative filtering that is personalized for a particular user in the system -- for example Amazon ("people who bought this also bought..."), behavioral ad targeting, Google News, etc. I wish Kast had spent more time on the last slide, which covered practical problems: the sparsity problem in recommendation systems, the early-rater/cold-start problem, and gray sheep (a small group of users whose tastes don't consistently match any larger group).
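The "people who bought this also bought..." idea Kast mentioned boils down to counting item co-occurrences across users. Here is a minimal sketch with made-up purchase data (the `purchases` dict and `also_bought` helper are my own illustrative names, not anything from the panel):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase history: user -> set of items bought
purchases = {
    "alice": {"book_a", "book_b", "book_c"},
    "bob":   {"book_a", "book_b"},
    "carol": {"book_b", "book_c"},
}

# Count how often each pair of items is bought by the same user
co_counts = defaultdict(int)
for items in purchases.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def also_bought(item, k=2):
    """Items most often co-purchased with `item`, highest count first."""
    scores = {b: n for (a, b), n in co_counts.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(also_bought("book_a"))  # ['book_b', 'book_c']
```

Real systems normalize these counts (otherwise globally popular items dominate every list), which is one place the gray-sheep and sparsity problems show up.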
Erik Frey from Last.fm shared some interesting stats -- 25M unique users/month. The types of relations that Last.fm derives: songs-users and users-users. He also shared some insights on what he described as the lean-forward and lean-backward modes in music recommendation systems. The lean-forward mode is when a user is feeling a little experimental and would like to discover new stuff. The lean-backward mode is when the emphasis is on continuity and the desire to listen to something familiar (i.e., the radio experience). The two main data sources for Last.fm are scrobbles and social info like tags. Frey suggested that tags are quite useful for explaining recos.
The next speaker was the CTO of Baynote -- a recommendation system provided as a service, via JavaScript that enables clients to customize their websites in real time based on user behavior. They also apply it to reorder search results.
Finally, Jon Sanders from Netflix gave a very good talk covering the history of Netflix and its evolution. Today Netflix has about 2 billion ratings and gets about 2M new ratings per day. 60% of movies watched come via personalized recos -- amazing!
Here were the steps in the evolution of Netflix:
step 1 At first Netflix was an editorially managed site, and the first evolution was adding a rating widget.
step 2 score and sort all movies using rating alone (top k list)
step 3 movie similarity by k-nearest
step 4 interest based recos (same actor director etc)
step 5 ask (get users to rate genres)
step 6 ask others (top 10 lists from users)
step 7 explain why (because you enjoyed)
step 8 $1M prize
step 9 develop a unique experience and customized website for each customer
future: streaming-video-specific recommendations, incorporating implicit and explicit metadata, a focus on discovery, letting people drive rather than be led.
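Step 2 above is just "rank everything by average rating." A minimal sketch, assuming a toy log of (movie, stars) ratings (the names here are mine, not Netflix's):

```python
from collections import defaultdict

# Hypothetical ratings log: list of (movie, stars) pairs
raw = [("movie_x", 5), ("movie_x", 3), ("movie_y", 4),
       ("movie_z", 2), ("movie_y", 5)]

totals, counts = defaultdict(int), defaultdict(int)
for movie, stars in raw:
    totals[movie] += stars
    counts[movie] += 1

averages = {m: totals[m] / counts[m] for m in totals}

# Top-k list, best average first
top_k = sorted(averages, key=averages.get, reverse=True)[:2]
print(top_k)  # ['movie_y', 'movie_x']
```

Note this isn't personalized at all -- everyone sees the same list -- which is exactly why Netflix moved on to the later steps.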
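Step 3's k-nearest-neighbor movie similarity can be sketched as cosine similarity between movies' rating vectors, computed over the users who rated both. The toy `ratings` data and function names below are illustrative assumptions, not Netflix's actual method:

```python
import math

# Hypothetical ratings: movie -> {user: stars}
ratings = {
    "movie_x": {"u1": 5, "u2": 4, "u3": 1},
    "movie_y": {"u1": 4, "u2": 5, "u3": 2},
    "movie_z": {"u1": 1, "u2": 2, "u3": 5},
}

def cosine(r1, r2):
    """Cosine similarity over users who rated both movies."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[u] * r2[u] for u in common)
    n1 = math.sqrt(sum(r1[u] ** 2 for u in common))
    n2 = math.sqrt(sum(r2[u] ** 2 for u in common))
    return dot / (n1 * n2)

def k_nearest(movie, k=1):
    """The k movies most similar to `movie`."""
    sims = {m: cosine(ratings[movie], r)
            for m, r in ratings.items() if m != movie}
    return sorted(sims, key=sims.get, reverse=True)[:k]

print(k_nearest("movie_x"))  # ['movie_y']
```

In production the hard part is exactly the sparsity problem from Kast's slide: most user-movie pairs are unrated, so the "users who rated both" overlap is often tiny.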