A few days ago, we were discussing a research problem in the lab. It reminded me of two papers I had recently read. The problem that we were talking about involved a graph where we knew certain nodes of interest and wanted to identify and rank other nodes that might be relevant. This is a classic example of a situation where random walks can be quite effective. Following are brief summaries of two papers on this topic:
The first paper, titled "Ranking on Data Manifolds," is by Dengyong Zhou et al. It discusses a neat interpretation of a random walk. The following excerpt from the paper describes the basic intuition:
"We firstly define a weighted network on the data and assign an authoritative score to every query. The query points act as source nodes that continually pump their authoritative scores to the remaining points via the weighted network, and the remaining points further spread the authoritative score they received to their neighbors. This spreading process is repeated until convergence, and the points are ranked according to the amounts of authoritative score they received."
Wikipedia defines a manifold as: "an abstract mathematical space in which every point has a neighborhood which resembles Euclidean space, but in which the global structure may be more complicated."
(rdfs:seeAlso the Wolfram MathWorld page on manifolds.) So basically, one can think of a graph or a social network as data that lies on a manifold. To learn and capture the structure of the manifold, the graph Laplacian matrix is used to represent the original graph. The advantage here is that the Laplacian captures long-range relations between nodes and ensures overall smoothness when propagating scores on the manifold.
As the scores are iteratively spread from the nodes of interest to their neighbors (via the graph Laplacian), the algorithm converges, and the process corresponds to a random walk on the graph. The final scores provide an overall ranking of the nodes with respect to the initial nodes of interest. Applications of this method range from image processing to web search and recommendation.
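This spreading process can be sketched in a few lines of NumPy. The iteration below follows the manifold-ranking update described in Zhou et al., f ← αSf + (1−α)y, where S is the symmetrically normalized affinity matrix; the toy graph, the choice of α, and the variable names are mine, for illustration only:

```python
import numpy as np

# Toy undirected graph (path-like: 0-1, 0-2, 1-2, 2-3, 3-4) as an affinity matrix.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Symmetrically normalized affinity matrix S = D^{-1/2} W D^{-1/2}.
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
S = D_inv_sqrt @ W @ D_inv_sqrt

# Query vector: node 0 is the node of interest ("source" of authority).
y = np.zeros(5)
y[0] = 1.0

# Iteratively pump scores to neighbors: f <- alpha * S f + (1 - alpha) * y.
# Converges because the spectral radius of alpha * S is below 1.
alpha = 0.9
f = y.copy()
for _ in range(200):
    f = alpha * S @ f + (1 - alpha) * y

# Nodes sorted by final score: the ranking w.r.t. the query node.
ranking = np.argsort(-f)
```

As expected, nodes close to the query node on the manifold (here, nodes 1 and 2) end up ranked above distant ones (node 4).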
I really like this paper for its simplicity and clarity in explaining the connection between random walks and node ranking.
The second paper I would like to mention here is "Fast Random Walk with Restart and Its Applications" by Hanghang Tong et al. It is a really nice paper that discusses how random walks can be used to measure how close two nodes are in a graph. Moreover, the paper describes some really neat mathematical tricks for efficient, "on the fly" computation of random walks in large graphs. The basic idea is to use the Sherman-Morrison lemma to avoid directly computing the inverse of a large matrix (which is prohibitively expensive). This is a superb trick, especially when performing rank-one updates. A number of applications are discussed in the paper, including image retrieval and neighborhood formation.
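The two ingredients can be sketched as follows: the random-walk-with-restart relevance vector in its closed form r = (1−c)(I−cW)⁻¹e, and a Sherman-Morrison rank-one update of that inverse. This is a rough sketch, not the paper's actual algorithm (which works with partitioned, low-rank approximations); the toy graph, restart probability c, and the particular rank-one perturbation are illustrative assumptions:

```python
import numpy as np

# Toy graph (edges 0-1, 0-2, 1-2, 2-3) as a column-stochastic transition matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=0)

c = 0.85                        # probability of continuing the walk
n = W.shape[0]
e = np.zeros(n); e[0] = 1.0     # restart vector: the walker restarts at node 0

# Closed-form relevance of every node w.r.t. node 0:
# r = (1 - c) * (I - c W)^{-1} e
Q_inv = np.linalg.inv(np.eye(n) - c * W)
r = (1 - c) * Q_inv @ e

# Sherman-Morrison: if M = I - c W changes by a rank-one term u v^T,
# the new inverse follows from the old one without re-inverting:
# (M + u v^T)^{-1} = M^{-1} - (M^{-1} u v^T M^{-1}) / (1 + v^T M^{-1} u)
u = np.zeros(n); u[1] = 1.0     # an illustrative rank-one perturbation
v = np.zeros(n); v[3] = 0.1
M_new_inv = Q_inv - (Q_inv @ np.outer(u, v) @ Q_inv) / (1 + v @ Q_inv @ u)

# Sanity check against a direct inversion of the perturbed matrix.
direct = np.linalg.inv(np.eye(n) - c * W + np.outer(u, v))
```

The point of the trick is that the update costs a few matrix-vector products instead of a fresh O(n³) inversion, which is what makes "on the fly" updates feasible on large graphs.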
There is also more recent work on this topic by Alekh Agarwal and Soumen Chakrabarti, "Learning Random Walks to Rank Nodes in Graphs". There are many more papers and I have probably not read them all -- these are just a few of my favorites on this subject.