I read a superb blog post "Who is your Information Filter?". I think Hutch makes an important point here (text in bold is highlighted by me):

In an equal world, information shared by any of your friends will merit click-throughs and discussion. But the practical reality is that some people will be more “equal” than others in terms of driving the discussion agenda. There are two highly correlated components to that:

- Number of subscribers
- Reputation for identifying what is interesting

**If you subscribe, you can’t help but be overwhelmed by the discussions they can kick off.**

Subscribing to somebody's feed is like a commitment. On social network sites, your actions often speak for themselves. The posts a user comments on or likes are quite predictive of whom that user subscribes to. This is what I found from an analysis of the "likes" in FriendFeed. The following graph shows the probability that user A subscribes to user B, given that user A "likes" k or more posts by user B. Typically, if a user has "liked" more than 3 posts of another user, there is over an 80% probability of a subscription.

On one hand, this is because users are more exposed to posts from those whom they subscribe to. On the other hand, as the number of such actions between two users increases, so does the probability that there will be a continued commitment in terms of a subscription. Similar trends are observable from the "comments" graph.
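To make the quantity concrete, here is a minimal sketch of how the probability of a subscription given k or more "likes" could be estimated from a static snapshot. The input shapes (a dictionary of like counts per ordered user pair and a set of subscription pairs) and the function name are assumptions made for illustration, not the format of the FriendFeed data used for the graph.

```python
def subscription_probability(likes, subscriptions, max_k=5):
    """Estimate P(A subscribes to B | A liked >= k posts by B) for k = 1..max_k.

    likes: dict mapping (a, b) -> number of b's posts that a has liked
           (hypothetical input format).
    subscriptions: set of (a, b) pairs meaning a subscribes to b.
    Returns a dict mapping k -> (estimate, number_of_pairs); the estimate
    is None when no pair has k or more likes.
    """
    result = {}
    for k in range(1, max_k + 1):
        # All ordered pairs where A liked at least k posts by B.
        pairs = [pair for pair, count in likes.items() if count >= k]
        if not pairs:
            result[k] = (None, 0)
            continue
        subscribed = sum(1 for pair in pairs if pair in subscriptions)
        result[k] = (subscribed / len(pairs), len(pairs))
    return result


if __name__ == "__main__":
    # Toy data, invented purely to show the call.
    toy_likes = {("a", "b"): 4, ("a", "c"): 1, ("d", "b"): 3, ("e", "f"): 2}
    toy_subs = {("a", "b"), ("d", "b"), ("e", "f")}
    for k, (p, n) in subscription_probability(toy_likes, toy_subs).items():
        print(k, p, n)
```

Returning the number of pairs alongside each estimate matters because the sample shrinks as k grows, so estimates at large k rest on fewer examples.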

The way I see it, an important implication of this is that a social network should make it possible for users to find and "try out" feeds they might "like". Findability and explorability allow users to first explore posts from others and then decide which of these users they would like to pick as their "Information Filters".

Hi Akshay,

I have a question about your graph. Is the probability you are plotting actually the probability of subscription given k articles flagged as "liked" prior to subscription? Just curious.

It would be interesting to see the confidence bounds on the probability for each k. I would expect those to grow significantly as k grows, assuming you have fewer and fewer examples where subscription follows only after >=k posts. Although maybe the number of examples you have even when k is large is also large and therefore the confidence bounds are tight across the domain of the plot?

Chris

Posted by: Chris Diehl | August 06, 2008 at 11:15 PM

Hi Chris

Unfortunately, I do not have access to the date when user A subscribed to user B's feed, although if we collect the data over a sufficient period of time we can approximate it and adjust the probabilities accordingly. Right now the subscription graph does not contain this information and is only a static snapshot. From reading the paper "Group Formation in Large Social Networks: Membership, Growth, and Evolution" by Backstrom et al., I realized that one way to recompute the graph would be to split the likes and subscription information over, say, two months, which might help compute the confidence bounds (by considering new subscriptions found in month two, say). I shall try this method and see if the data I have is sufficient to update the graphs. Thanks for the note.
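A rough sketch of the two-snapshot idea, assuming likes are counted during month one and subscription snapshots are taken at the end of each month. The function name and input formats are hypothetical, and this is only one possible reading of the split described above.

```python
def new_subscription_probability(likes_month1, subs_month1, subs_month2, k):
    """Estimate P(A newly subscribes to B in month two | A liked >= k of B's
    posts in month one and was not yet subscribed at the end of month one).

    likes_month1: dict mapping (a, b) -> likes by a on b's posts in month one.
    subs_month1, subs_month2: sets of (a, b) subscription pairs from the
    snapshots at the end of month one and month two (assumed inputs).
    Returns (estimate, number_of_candidate_pairs); estimate is None when
    there are no candidate pairs.
    """
    # Only pairs that were NOT subscribed after month one can produce a
    # "new" subscription, so the likes are known to come first.
    candidates = [
        pair
        for pair, count in likes_month1.items()
        if count >= k and pair not in subs_month1
    ]
    if not candidates:
        return None, 0
    new_subs = sum(1 for pair in candidates if pair in subs_month2)
    return new_subs / len(candidates), len(candidates)
```

Restricting to pairs without a subscription at the end of month one guarantees that the counted likes preceded any subscription that shows up in month two.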

Thanks

Akshay

Posted by: Akshay Java | August 07, 2008 at 02:51 PM

Hi Akshay,

Actually, collecting additional data after the subscription would not help in estimating the probability, given the nature of the definition I proposed. Time is key there: you need to know which events occurred when, relative to the subscription event. But maybe that is not what you are interested in?

If you assume the users are IID, which you inherently assume in the estimation of the probabilities, you can compute distribution-independent confidence bounds using Hoeffding's Inequality for each probability estimate.
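As a sketch of what that could look like: for n IID observations with values in [0, 1], Hoeffding's inequality gives P(|p_hat - p| >= eps) <= 2 exp(-2 n eps^2), so with confidence 1 - delta the true probability lies within eps = sqrt(ln(2/delta) / (2n)) of the estimate. The function name and the default delta below are illustrative choices.

```python
import math


def hoeffding_interval(successes, n, delta=0.05):
    """Two-sided Hoeffding confidence interval for a proportion.

    successes: number of pairs with a subscription among the n pairs
               that have k or more likes.
    n: number of pairs, treated as IID Bernoulli observations.
    delta: allowed failure probability (0.05 gives a 95% bound).
    Returns (lower, upper), clipped to [0, 1].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    p_hat = successes / n
    # Solve 2 * exp(-2 * n * eps**2) = delta for eps.
    eps = math.sqrt(math.log(2 / delta) / (2 * n))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


if __name__ == "__main__":
    # Invented counts, only to show how the width shrinks with n.
    print(hoeffding_interval(80, 100))
    print(hoeffding_interval(8000, 10000))
```

The half-width depends only on n and delta, not on the estimate itself, so the bounds widen as k grows and the number of qualifying pairs falls.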

Chris

Posted by: Chris Diehl | August 07, 2008 at 07:43 PM