
recommendation systems

July 17, 2008

Clustering Triples from Social Data

By far, the most prevalent data available in social media is tagging information. For example, in del.icio.us a user may tag a URL, or in Flickr she may tag an image. One of the questions that comes up is how to cluster social data that is rich in tags. Some available techniques ignore the user information and use only a bipartite graph consisting of tags and URLs. Another method is to represent two pieces of evidence (user-tag and tag-URL) in a tripartite graph (where nodes are of three different types: users, tags and URLs). However, even this type of structure misses the higher-order relation between the three nodes. Note that the information available is really in triples of the type <user, tag, url>. This information is not captured by the tripartite graph model. In particular, two users may be connected via a common tag even if the actual URLs they bookmarked are vastly different.
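To make the loss of information concrete, here is a minimal sketch in Python. The user names, the tag and the URLs are invented for illustration, and the function name tripartite_edges is my own. It builds two different sets of <user, tag, url> triples and shows that both collapse to exactly the same user-tag and tag-URL edges.

```python
def tripartite_edges(triples):
    """Project <user, tag, url> triples onto user-tag and tag-url edge sets."""
    user_tag = {(user, tag) for user, tag, _ in triples}
    tag_url = {(tag, url) for _, tag, url in triples}
    return user_tag, tag_url

# Invented toy data: the same two users and tag, but who bookmarked what is swapped.
observed = {
    ("alice", "python", "site-a.example"),
    ("bob", "python", "site-b.example"),
}
swapped = {
    ("alice", "python", "site-b.example"),
    ("bob", "python", "site-a.example"),
}

# The triple sets differ, yet their tripartite projections are identical.
triples_differ = observed != swapped
same_graph = tripartite_edges(observed) == tripartite_edges(swapped)
print(triples_differ, same_graph)
```

Once the triples are flattened into edges, nothing in the graph says which user bookmarked which URL, so any clustering built on the edges alone cannot tell these two situations apart.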

There are some techniques based on tensor factorization that can handle such data. However, the question of how to deal with triple (or higher) information from social data is quite interesting. Moreover, being able to do so efficiently and in an online fashion would also be important. I believe that this topic may be of significant interest in the upcoming social media and data mining conferences. The implications of these techniques would be in building better recommendation systems and personalization algorithms.
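As a rough illustration of what factorizing triples can look like, the sketch below fits a single rank-1 component to a <user, tag, url> tensor using higher-order power iteration. This is only one simple member of the tensor factorization family, chosen here for brevity. It is not a method taken from any particular paper, and the data and the name rank1_factors are invented for this example.

```python
import math

def normalize(vec):
    """Scale a dict of weights to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {key: value / norm for key, value in vec.items()}

def rank1_factors(triples, iterations=30):
    """Rank-1 fit of a <user, tag, url> count tensor by power iteration."""
    a = normalize({u: 1.0 for u, _, _ in triples})  # user weights
    b = normalize({t: 1.0 for _, t, _ in triples})  # tag weights
    c = normalize({r: 1.0 for _, _, r in triples})  # url weights
    for _ in range(iterations):
        # Update each mode in turn, holding the other two fixed.
        new_a = dict.fromkeys(a, 0.0)
        for u, t, r in triples:
            new_a[u] += b[t] * c[r]
        a = normalize(new_a)
        new_b = dict.fromkeys(b, 0.0)
        for u, t, r in triples:
            new_b[t] += a[u] * c[r]
        b = normalize(new_b)
        new_c = dict.fromkeys(c, 0.0)
        for u, t, r in triples:
            new_c[r] += a[u] * b[t]
        c = normalize(new_c)
    return a, b, c

# Invented toy data: a dense group (alice, bob) plus one unrelated triple.
triples = [(u, t, r)
           for u in ("alice", "bob")
           for t in ("python", "code")
           for r in ("url_a", "url_b")]
triples.append(("carol", "recipes", "url_c"))

user_w, tag_w, url_w = rank1_factors(triples)
print(sorted(user_w, key=user_w.get, reverse=True))
```

The component concentrates its weight on the users, tags and URLs that co-occur as triples, which is the kind of grouping the pairwise graphs cannot express. A real system would need several components, and the online updating mentioned above is not addressed by this sketch.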

[Thanks Vlad Korolev for some of the discussions related to this post]

July 06, 2008

The Cold Start Problem in Social Media


Here is a classic cold start problem:

  • "Social Tools" can only be social if there are enough people on it.
  • And any social site is only as attractive as the number of friends you have on it.
  • The social site is only useful if there are enough people contributing to it (be it annotating images, links or adding reviews).

The question is: how do you get enough users to adopt a tool and build sufficient traction around it that it attracts more users? I don't have all the answers, and perhaps entrepreneurs and folks in startups are more knowledgeable about this than I am. But this is a question I have been pondering for some time. Most of these points might be fairly obvious, but here are a few thoughts I'd like to share:

  • Above all, build something cool!
  • Realize that in many systems, 1% of users acting as contributors is all it takes! (See Clay Shirky's book "Here Comes Everybody" for more on this.) Ensure that the reward mechanism is automatically built into the site. This 1% of users are not the ones motivated by money. They use your tools because they enjoy them or because the tools solve some real problem they have been facing. I have seen some sites trying to "pay" users to add data to "seed" their site. For example, check out some of the high-paying HITs on Mechanical Turk. In my opinion, this is like throwing money out of the window. A completely bogus way to jump-start your site!
  • Provide APIs: One of the key factors that contributed to the success of Twitter was that they had a neat API that developers immediately adopted and had fun building cool toys with. These third-party tools in turn make it easy for users to contribute and engage with your site, thus breaking the cold start problem. For example, even though they were not developed by Twitter, the plethora of third-party Twitter clients makes it easy to post tweets.
  • Try to "seed" your site with datasets curated by web crawls, APIs, external databases or using the tools yourself. For example, if you are building a site that uses geotagging, you might consider using sites like geonames.org or if your site is around movies -- use IMDB or Amazon data to seed it.
  • Make sure stuff is findable and socially visible. Make it easy for your users to find the data they really care about and will be willing to annotate. Moreover, make sure that it is easy for users to share what they annotate with their friends. The beauty of Facebook is the news feed. People like to know what their friends are up to. On Twitter, I want to know what my friends are saying and be able to have conversations with them - without that, Twitter is just a chat room.
  • Don't ever SPAM! Every week I get a bunch of emails from some random sites that a friend of mine once joined. Make sure that your invitation email includes an option to not receive any requests from your site in the future. I am amazed when I don't see this option at all. If a user does not wish to join a network, please don't keep sending them emails requesting them to join every time one of their friends signs up! Also, I am a sucker for alpha/beta testing any new social media/social network site. Sometimes I even try the ones that eventually end up spamming everyone in my email/IM contacts. As a RULE -- never spam your potential users! That is the best way to piss them off even before they join.
  • Listen and iterate rapidly. Your alpha/beta users are the most important. Listen to what they have to say. Also, if you cannot convince your friends and family to use the tool -- why would anyone else bother?

The cold start problem has been studied in computer science, particularly for recommendation systems**. A good place to start is the paper:

Methods and metrics for cold-start recommendations. Schein, A.I.; Popescul, A.; Ungar, L.H.; Pennock, D.M.

I am quite interested in knowing how startups have approached this problem in real situations and particularly, if there is any analytical data available to show what worked and what did not? I guess this might be information that few would be willing to share so openly.

** On a related note: the blog "Duke Listens" is an excellent source for more on recommendation systems. Also check out its recent post on the cold start problem.

June 27, 2008

It's Getting Crowded Out Here!

Yep! That's what my Firefox toolbar looks like these days. Bookmarklets are a convenient way to post stuff to various sites. I use them for sharing links with the ebiquity blog via del.icio.us, posting to friendfeed and checking news every now and then, and right now I am testing a bookmarking site called Gyzork. A few other plugins I have installed are del.icio.us, socialbrowse and stumbleupon. I used to have a lot more but couldn't keep up with all the stuff going on in my browser on a single monitor! :-(

Perhaps it's time to rethink how we use bookmarklets. Some thoughts on this:

  • It would be great to have icons for bookmarklets
  • I need a bookmarklet 'stack feature' -- a place where I can organize bookmarklets.
  • A listing of all the bookmarklets for different categories (GTD, News, Links, Blogs, etc...) would be handy for power users.
  • Maybe we even need a simple way to post a link in one place and have it pushed to other sites.

It's getting crowded out here! I wonder if there is a better way to manage all this.

[Thanks Anthony Vito, for some of the discussions that led to this post]

June 19, 2008

Email Interview: Nihaar Gupta, Youlicit

Nihaar Gupta, VP of Product Development at Youlicit, has kindly agreed to an email interview with the SocialMedia Research Blog. Following are his responses to some of the questions I had for him:

1) Please describe Youlicit to us.

Youlicit at its core is a discovery engine (http://blog.youlicit.com/?p=23). We want to connect you to the most relevant and recommended information as effortlessly as possible. As of now, we are building a technology that allows a user to find the most recommended sites (recommended by people around the web) related to a given URL. We believe that people are the best judges of content and more often than not, the information you are looking for has been found by someone before. Our goal is to aggregate that information and allow the user to access it with the click of a button.


2) Tell us about your background, the team behind Youlicit and how you started it.

Youlicit came about as a result of trying to solve our own frustrations with trying to find information on the web. With the enormous amount of user-generated content and annotations on the web, we saw a huge amount of valuable data that was inaccessible and fragmented. For the sake of brevity, the team bios & background are here http://blog.youlicit.com/?page_id=6


3) Please give us a brief overview of the technology behind Youlicit?

Youlicit aggregates user annotations of websites and other user generated content and analyzes it to create a URL-URL mapping of websites based on relevance and quality. Using this mapping we are able to deliver related and recommended sites to a user with a click of a button.


4) While using Youlicit plugin, I felt that one of the challenges is the coverage -- how do you plan to address this and build your current index?

We are constantly working on improving our coverage. There are two metrics we strive to maximize for our results, quality and relevance. In regards to quality, we’re always looking to increase our database of “quality sites” by tapping into the various kinds of user annotations (denoting quality content) that exist on the web (bookmarks, tags, votes, comments). In regards to relevance, we’re always researching novel ways to extrapolate connections between websites and map URL’s back to our database of “quality sites”.


5) How do you rank the 'Enhanced Links' in the plugin? Do you also take into account how many users actually click through the suggested links?

Each result in the Youlicit More widget (and on Youlicit’s site) has a score based on the metrics above, quality of the site and relevance to the item being queried. We are looking into ways of scoring the results from implicit/explicit feedback that we get from users (clicks, recommends).


6) How do you ensure that the Enhanced links feature is non-intrusive?

The current version has manifested itself after a few weeks of alpha testing with a handful of bloggers. That said, we are still looking for feedback on the user interface and would love to hear opinions on how to make it more useful and less intrusive for bloggers/blog readers.


7) How would you compare the plugin to Sphere's related blog posts?

While Sphere focuses on related and recent blogosphere content, we, at Youlicit, are trying to provide the blog reader with more seminal information related to the blogger’s topic of conversation. For instance, if you are reading a blog entry on global warming, you are more likely to receive the most recommended articles (blogs, sites, essays) on global warming from around the web rather than recent blog entries on that topic.


8) What are the other features on Youlicit?

Youlicit’s primary product is a Firefox extension that allows a user to access our results during his/her browsing experience. We are in the process of redesigning our website and streamlining the current offering to focus on this button. Down the road we would like to be able to deliver personalized recommendations for users as well as connect users to people based on transient and long-term interests (ideally using a person’s interests to enhance his/her social graph).


9) Would the plugin be advertising supported?

We do see advertising as a very possible source of monetization. Given the fact that we are providing contextually relevant information, the search model of advertising applies nicely. We are also exploring other possible means of monetization but as of right now the priority is to build something that people find useful.


10) What are the next things to look out for at Youlicit?

As I mentioned above, we are stripping down Youlicit to bring the focus back to its core: the Youlicit More functionality via the Firefox extension and blog widget. We expect to release a newly designed website very soon. And as always, we love to hear feedback on what you think so far and how we can improve.


Youlicit: Search Less and Find More!

Youlicit is a new tool that helps you "Search less and find more". Often we forget that search is only one means of finding what we are looking for. Even search by itself is not the endpoint of an information need or a query. This tool reminds me of the "berrypicking model" of Information Retrieval that I first read about in my IR class. The model basically says that:

Information need is not satisfied by a single set of documents but by bits and pieces found along the way.

The paper titled "The Design of Browsing and Berrypicking Techniques for the Online Search Interface" describes a searcher as

Moving through many actions towards a general goal of satisfactory completion of research related to an information need.

What Youlicit does is provide this ability implicitly, without the reader (or more generally a searcher) having to go through the trouble of navigating and mentally processing hyperlinks or firing search queries to find related content. Youlicit takes care of all that on your behalf. Installed as a simple plugin, the Youlicit widget automatically highlights some of the related, relevant links and provides useful suggestions -- all without your audience ever leaving your blog. I love the idea and the neat implementation that these guys have built. (The very same need was what led me to hack this Wikipedia related widget a few weeks earlier.)

On the Youlicit site, you have lots more interesting tools. You can discover new content that is relevant to your interests, find related users and share links with them or follow their interests. Youlicit is paving the way for social browsing tools and is a neat concept that is well implemented. Their index does not seem to be very large at the moment, and I feel that it will get better as they start to seriously scale up. In the interim, I feel that there might be stopgap solutions that they could employ -- for example, the Alexa related URLs for the links that are not currently in Youlicit's index.

A similar tool is the Sphere plugin, which shows related blog posts. I feel that Sphere serves a complementary need. From what I understand, Youlicit aims to find the interesting blogs and Web URLs one might want to look into in relation to a given hyperlink.

Another plugin is the Snap plugin -- which shows a screenshot of the outlink. However, in my opinion Snap does not really serve much purpose and is a bad tool from a usability perspective.

Youlicit is non-intrusive and you are gonna enjoy the serendipity of finding interesting new links! Give it a spin!

May 02, 2008

Leveraging Web and Social Media for Recommendations

Both Amazon and Netflix's business models rely on effective recommendation systems. The recommendations provided by such systems are based on the purchasing habits of millions of customers. As such, these systems are non-trivial and have evolved out of years of research in both academia and industry.

In addition to the millions of customer transaction records that can be mined, for many products there is a vast amount of information available online. While I do not have a lot of familiarity with the recommendation systems literature, it seems obvious that the Web and Social Media are a great source of information that could be useful when building such systems. Bloggers' profile pages, wishlists, Netflix queues, book lists and the blog posts themselves are potential clues for learning which two items may be related to each other.

As a simple example, consider the movie "Pulp Fiction". By querying Google for all the inlinks to the IMDB page of Pulp Fiction and counting which other movies are "co-cited", I obtained a list of five movies that are most likely to be related to "Pulp Fiction":

Most of these look quite relevant. Some critics have claimed similarities between Pulp Fiction and Snatch. One surprise though was LOTR. I wouldn't have expected it to be grouped with Pulp Fiction, but I guess I like them both very much -- so it seems reasonable in my case at least.

Just for fun, here is another example with "Sin City", another one of my favorite movies.

Unless you have a large index of the Blogosphere or the Web, it would be quite inefficient to mine for such correlations (by passing queries to search engines) on a large scale. I do not know how much of the search engine information is leveraged in recommendation systems built by Amazon or Netflix.  It might also be worth looking into differences in the recommendations produced on the basis of "how people co-cite two products" vs. "how people purchase two products".
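The co-citation counting described in this post can be sketched in a few lines of Python. This is a minimal illustration that assumes the inlinking pages have already been fetched and reduced to the set of movie pages each one links to. The page data, the placeholder item names and the function name co_cited are all invented here and are not the results shown above.

```python
from collections import Counter

def co_cited(pages, target, top_n=5):
    """Rank items by how many pages link to them together with `target`."""
    counts = Counter()
    for links in pages:
        links = set(links)  # count each page at most once per item
        if target in links:
            counts.update(links - {target})
    return counts.most_common(top_n)

# Invented toy data: each set holds the movie pages one blog post links to.
pages = [
    {"target_movie", "movie_x", "movie_y"},
    {"target_movie", "movie_x"},
    {"target_movie", "movie_x", "movie_z"},
    {"movie_y", "movie_z"},
    {"target_movie", "movie_y"},
]

ranking = co_cited(pages, "target_movie", top_n=2)
print(ranking)
```

The same counter could just as well be fed purchase baskets instead of link sets, which would be one way to compare "co-cited" against "co-purchased" recommendations.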
