What is similarity? Does the richness of meta-data and the range of content make similarity harder to define in the context of Social Media? Jonathan's comments reminded me of a few thoughts I had recently on similarity in Social Media. And I think Jon raises a critical point which applies to this post as well. Like relevance, how we define similarity really depends on the task.
Nevertheless, the traditional view of similarity is document similarity, typically measured as the cosine similarity between term vectors. As an extension, rather than comparing content directly, some researchers model topics using techniques like LDA, which yields topic similarity. While content/topic similarity is surely still used in blog analysis, there is a wider set of interesting features to build on.
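To make the content-based baseline concrete, here is a minimal sketch of cosine similarity over raw term counts. The whitespace tokenizer and the function name `cosine_similarity` are my own simplifying assumptions; a real system would more likely use TF-IDF weights, stemming and stop-word removal.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw term counts.

    Assumption: naive lowercase/whitespace tokenization, no TF-IDF weighting.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product over the terms the two texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction, so treat its similarity as zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two posts with identical word counts score 1 (up to floating-point rounding), and posts with no words in common score 0. Topic similarity would follow the same pattern with per-document topic distributions in place of term counts.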
Tag Similarity: Folksonomies, for instance, offer user-generated tags. These are a helpful feature for finding out whether two blogs are somehow related. The tags used by the author of a post are another feature. These may or may not correspond to the tags used by readers (for example on bookmarks in del.icio.us). I find it interesting that the two may often complement each other. Authors might tag their posts with specifics (like "Hillary Clinton", "immigration" etc.), while a reader might simply file the blog under a general tag such as "politics".
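One simple way to turn tags into a score is Jaccard overlap, computed separately for author tags and reader tags and then blended. This is only a sketch: the dictionary keys `author_tags` and `reader_tags` and the `author_weight` parameter are hypothetical names chosen for illustration, not part of any real API.

```python
def jaccard(tags_a, tags_b):
    """Jaccard overlap between two tag collections, ignoring case."""
    a = {t.strip().lower() for t in tags_a}
    b = {t.strip().lower() for t in tags_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tag_similarity(blog_a, blog_b, author_weight=0.5):
    """Blend author-tag and reader-tag overlap into one score.

    Assumption: each blog is a dict with hypothetical keys
    'author_tags' and 'reader_tags' holding lists of strings.
    """
    author_score = jaccard(blog_a["author_tags"], blog_b["author_tags"])
    reader_score = jaccard(blog_a["reader_tags"], blog_b["reader_tags"])
    return author_weight * author_score + (1 - author_weight) * reader_score
```

Keeping the two tag sources separate reflects the observation above: two blogs may share no specific author tags and still land under the same general reader tag.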
Community Structure: The graph itself has cues that offer hints about the similarity between two blogs. Community structure is definitely of importance, and blogs that structurally belong to the same cluster are often related. The slight issue is that link semantics in the Blogosphere can differ from those of the Web.
Link Semantics: The traditional view of a link is that of an endorsement, but in the Blogosphere sentiment may also play a big role. Related to link semantics is the question of relation type. Blogrolls and linkrolls are yet another way in which two blogs can be related. Adding a blog to my blogroll does not necessarily mean that it's topically related -- it could just mean a sort of vague friendship / foaf:friend type of relation.
Additionally, a significant percentage of the links created on blogs point to mainstream media (MSM). One can consider this a bipartite graph with blogs and MSM sites as the two node types. Clustering blogs that point to similar sets of MSM sources can reveal additional similarity cues. I would expect this to be particularly pronounced in the political domain.
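As a rough sketch of the bipartite idea, each blog can be represented by the set of MSM sites it links to, and pairs of blogs can be scored by how much those sets overlap. The function name `msm_overlap` and the input shape (a dict from blog to linked MSM sites) are assumptions made for illustration; the resulting pairwise scores could then feed whatever clustering method one prefers.

```python
from itertools import combinations


def msm_overlap(links):
    """Pairwise Jaccard overlap of the MSM sources that blogs link to.

    Assumption: `links` maps a blog identifier to an iterable of the
    MSM sites it links to (one side of the bipartite graph).
    Returns a dict keyed by (blog_a, blog_b) pairs in sorted order.
    """
    sources = {blog: set(sites) for blog, sites in links.items()}
    scores = {}
    for a, b in combinations(sorted(sources), 2):
        union = sources[a] | sources[b]
        shared = sources[a] & sources[b]
        scores[(a, b)] = len(shared) / len(union) if union else 0.0
    return scores


# Made-up placeholder data, purely to show the input shape.
example_links = {
    "blog_a": ["msm1.example", "msm2.example"],
    "blog_b": ["msm1.example", "msm2.example"],
    "blog_c": ["msm3.example"],
}
example_scores = msm_overlap(example_links)
```

Here `blog_a` and `blog_b` cite exactly the same sources and so score highest, while `blog_c` shares none with either. This ignores link counts and link sentiment, both of which matter given the point about link semantics above.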
While I approach this question from the viewpoint of blogs, social networks and Flickr communities have still further relation types and features. In such systems, comments, group memberships, affiliations etc. can be additional indicators for similarity measures.
I believe that it is still an open question how such varied features can be utilized to measure similarity in Social Media. I am sure that there are going to be a TON of interesting discussions on some of these issues at ICWSM, which I am eagerly looking forward to.