Spectral Clustering is a really simple and effective technique for graph partitioning and it can be easily used for community detection tasks. Although running spectral clustering tools or writing a basic algorithm is only a few lines of code, sometimes the intuition of how it works is not quite obvious. I don't claim to be no expert either, but I have been exploring how it can be applied to problems in Social Media graph analysis and am quite excited about this technique.
A graph can be expressed in terms of a similarity matrix. Spectral analysis of graphs deals with the study of properties of the eigenvalues and eigenvectors of the adjacency/similarity matrix of a graph (often expressed in terms of a Laplacian matrix). A simple version of the graph Laplacian is L = D -A, where D is a diagonal matrix containing the degrees of each node and A is the adjacency matrix. One way to think about this is in terms of Fourier Analysis. A periodic function can be represented in terms of sine and cosines such that the sum of cosine and sines can smoothly explain the function. Similarly using the graph Laplacian, what we are trying to achieve is a smooth approximation of the function f(x) over the entire graph. Such an approximation can be achieved by minimizing an objective f'Lf.
Another interesting explanation is the one offered by matrix perturbation theory. Imagine that the nodes of the graph were golf balls tied together by strings (which stand for edges). Now if the entire graph was laid out on a in space and a small perturbation was introduced by hitting a single ball, how would the rest of the balls in this network move? Balls that are connected via multiple paths and those that are closer to the original ball that was perturbed would fluctuate more than the rest. Similarly, what f'Lf does is find a smooth function f, over the entire graph such that it would approximate these perturbations over the graph.
Here is a nice Matlab based tutorial and for computing Normalized Cut, Shi and Malik's implementation is very handy. For example, shown below is a sparsity plot of a graph of 6000 blogs. By running the Ncut algorithm, with about 100 partitions the community structure in it become clearer.
While spectral methods for graph analysis have their origins in image processing, they are finding a number of significant applications in Social Media analysis. They seem particularly attractive because of their relative ease of implementation and excellent results. For a perspective on this topic specifically from a community detection point of view, a good paper to read is by Dr. Mark Newman. Since blog social media data is so different from images, several new extensions have been suggested to incorporate different characteristics of such networks. Some recent trends are: evolutionary spectral clustering (temporal behavior), graph regularizations (semi-supervised learning/classification on graph data) among many others.
Hi Akshay,
Are you aware of any work done on spectral clustering on directed graphs?
Please send some pointers across, if you know.
Rajiv
Posted by: Rajiv Das | July 22, 2008 at 10:28 PM
Hi Rajiv, One paper on Spectral clustering for directed, weighted graphs is by Marina Meila et al. http://www.stat.washington.edu/mmp/Papers/sdm-wcuts.pdf and the other reference that I am aware of is by Dr. Mark Newman http://arxiv.org/abs/0709.4500 If you find other interesting papers please do share it with me. Thanks for stopping by.
Akshay
Posted by: Akshay Java | July 22, 2008 at 10:41 PM