In the previous posts I talked about some of the papers I have read recently on spectral clustering and graph-based methods. I thought I would provide one more example to help explain the use of the graph Laplacian. The basic idea is that the data lies on a 'manifold', which, to simplify it greatly, means that it lies on some structure embedded in a multidimensional space.
For example, here is a random dataset generated from three Gaussian distributions. To obtain a graph from these points, an RBF kernel was used, and the graph Laplacian was then computed from this kernel. When we project the data onto the eigenvectors corresponding to the two smallest non-zero eigenvalues, the three Gaussian clusters become clearly distinguishable. This type of projection is known as a Laplacian eigenmap. For more details, please refer to the paper by Belkin and Niyogi.
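The steps above can be sketched in a few lines of numpy. This is a minimal illustration, not the exact code behind the figure: the cluster centers, sample counts, and kernel width `gamma` are all assumed values chosen for the example.

```python
import numpy as np

# Assumed setup: three 2-D Gaussian clusters of 50 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [2.5, 4.0]])
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])

# RBF (Gaussian) kernel used as the weighted adjacency matrix
gamma = 1.0  # assumed kernel width, not from the post
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-gamma * sq_dists)
np.fill_diagonal(W, 0.0)

# Unnormalised graph Laplacian: L = D - W
D = np.diag(W.sum(axis=1))
L = D - W

# eigh returns eigenvalues in ascending order; index 0 is the
# trivial (near-zero) eigenvalue of a connected graph
vals, vecs = np.linalg.eigh(L)

# Laplacian eigenmap: embed the points using the eigenvectors of
# the two smallest non-zero eigenvalues
embedding = vecs[:, 1:3]
print(embedding.shape)  # (150, 2)
```

Scattering the two columns of `embedding` against each other reproduces the kind of projection shown in the figure, with the three clusters pulled apart.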
As an example of how this applies to the community detection / semi-supervised graph learning problem, here is a graph of the political books dataset, with the corresponding projection shown below. While most of the conservative and liberal books can be identified, some neutral books are a bit ambiguous, making them harder to classify correctly. Typically, for spectral clustering we would only need to consider the eigenvector of the second smallest eigenvalue of the graph Laplacian. It is also known as the Fiedler vector and is used in graph partitioning.
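To make the Fiedler vector idea concrete, here is a toy sketch on an assumed graph (not the political books data): two small cliques joined by a single bridge edge, so the sign pattern of the Fiedler vector should recover the two communities.

```python
import numpy as np

# Assumed toy graph: two 4-node cliques joined by one bridge edge
n = 8
W = np.zeros((n, n))
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[3, 4] = W[4, 3] = 1.0  # the bridge between the cliques

# Unnormalised graph Laplacian: L = D - W
D = np.diag(W.sum(axis=1))
L = D - W

vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]  # eigenvector of the second smallest eigenvalue

# Partition the nodes by the sign of their Fiedler-vector entry
labels = (fiedler > 0).astype(int)
print(labels)
```

The sign of an eigenvector is arbitrary, so the two groups may come out as 0/1 or 1/0, but nodes 0-3 always land on one side of the cut and nodes 4-7 on the other.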
Please note that this series of blog posts on spectral methods is my attempt to share my understanding of the papers I have read recently. I am happy to receive further references and clarifications on the points discussed here.