While I was having a discussion about my previous FriendFeed posts, I had mentioned that it would be interesting to see the split up of likes w.r.t the directed, shortest path distance. Following graph shows the number of likes that are received from people you directly subscribe to, or subscribe to 2 hops away, 3 hops away and so on...
Both from this graph and the earlier post, we can see that almost 45% of the likes are from people you directly subscribe to. But what is really interesting is that almost 90% of the remaining likes are accounted by friends who are just two or three steps away. FriendFeeders sure are a close bunch!
This type of locality of reference is also related to what sociologists have called the triangle closing. Partly, this is also due to the fact that FriendFeed tries to make it easier to attain "closing of triangles" by displaying the updates from friends who are just one step away. It would be worth looking into how this network's evolution takes place by recrawling FriendFeed subscription graph and observing how it changes. I suspect that through the likes and comments users find new friends and subscribe to their feeds: Your Friend's Friends who like you may be quite likable themselves!
I believe that models of Web graphs (small world, preferential attachment) are getting even more sophisticated in such social settings. The act of linking is not just creating a new hyperlink anymore -- but it is fragmented into actions such as "I subscribe to someone", "I comment on someones posts", "I like someones post". Such actions are not limited to FriendFeed alone. You can draw similar analogies in Youtube (watching, rating, commenting...), Flickr etc. As we explore these datasets through our empirical analysis, it might bring in a better understanding and intuition to build newer models for Social Graphs.
This data was generated using the 200,000 likes and the graph consisting of subscription information of over 75,000 users. The actual shortest path calculation was done using the Boost Graph Libraries (BGL). This is an amazing and extensive tool for graph analysis and quite frankly, I am a bit surprised that I hadn't used it before! I had used the MATLAB bridge but for very large graphs it is best to run the algorithms using BGL. But, more on that later!
Akshay, interesting analysis. In general: How large is the share of people on Friendfeed that can only be reached by more than 4 hops and less than infinite hops (i.e. connected)?
Posted by: Benedikt | August 05, 2008 at 07:47 AM
Although it can be done, computing all pairs shortest path might be quite expensive for this network. I was only able to get away with the above graph since I needed to compute the shortest path between a subset of nodes (that have the likes relationship). I am aware of some tools that let you plot the hop count (which is what you might be interested in). I shall look into it and see if I can generate the answer for you easily. BTW I saw your posts on FF as well. Very cool visualizations! Thanks.
Posted by: Akshay Java | August 05, 2008 at 12:04 PM
Akshay,
(You might already be aware of what follows) I suspect your analysis to be related to Granovetter's "Forbidden Triad", which states the following:
1) Imagine a strong tie (a strong social link) between A and B
2) Imagine a strong tie between B and C
3) Then, the forbidden triad implies that a tie exists between C and B (it forbids that a tie between C and B does not exist)
Assuming forbidden triads and strong ties, one would EXPECT that a strong tie between the author A and the friend-of-a-friend commenter C MUST exist (the comment can be seen as surrogate evidence for that). From that follows, that the link between A-B is not a bridge (in a network sense, because there is another path A-B that goes through C), which creates an interesting avenue for further analysis: how can this observation be used to identify bridges in the friendfeed network? Or in other words, what users play the role of brokers or "hubs"? Removing those would decrease network connectivity.
Your data seems to represent an interesting web2.0 approximation of Granovetter's ideas. More information can be found in Granovetters Paper "The Strengths of Weak Ties" (1973)
Posted by: Markus | August 05, 2008 at 04:40 PM
Indeed, this is related to Granovetter's ideas. BTW you mean A and C in bullet 2 right? It would certainly be worth looking at ways in which we can classify users into different categories: Information brokers, hubs etc. I think that the easiest way to identify brokers in a network would be to perform Normalized Cuts on the graph. This would trivially find the minimum edges that need to be removed to disconnect the network into smaller components. I believe that in social network research we might now be moving towards looking at both the strong ties and the brokers. In a recent paper by Dr. Jon Kleinberg (The Structure of Information Pathways in a Social Communication Network) I read an interesting definition of a network backbone as "the set of edges that balance qualitatively different kinds of information flows; flows that arrive at long range over weaker ties and flows that travel quickly through densely clustered regions of the network".
I think that the availability of such interesting datasets provides a great opportunity to explore these ideas and verify some of the theories presented in "The Strengths of Weak Ties" and other classic works. Thanks for the comment and I would be happy to talk to you further about this.
Posted by: Akshay Java | August 05, 2008 at 05:09 PM
Of course you are right - I meant A and C in bullet 2. I think your preliminary results look already quite promising, and I am curious about any publication coming out of this.
Posted by: Markus | August 06, 2008 at 09:44 AM
Markus, Thanks! I do hope that I can get something interesting going and eventually publish something -- right now it is still preliminary. This is also my attempt to try out an "Open Research" approach. While research is fundamentally collaborative, often we wait till we have a polished version before we share it with others (at least publicly). I wondered if blogging can be used as a means to share intermediate results and get instant feedback from other experts like yourself - thus learning from their insights and suggestions as you work on refining your techniques. Already, I feel I am learning a lot more by sharing than if I had waited till I can produce a TR/paper. I will keep you posted on how things progress. Thanks again for the suggestions!
Posted by: Akshay Java | August 06, 2008 at 04:17 PM