One form of matrix factorization is Singular Valued Decomposition (SVD). This is a powerful technique and it has several applications in information retrieval and graph analysis.
Another matrix factorization technique I had mentioned recently was Non Negative Matrix Factorization (NNMF). The advantage of NNMF over SVD is that it is easier to compute and is generally much easier to interpret due to the strict positivity constraint.
Matrix factorization can be achieved via optimization methods. Suppose a matrix A (shown in the figure on the left) of size 20*20 was to be factorized into two matrices X of size 20*4 and W' of size 4*20, the following objective function can be minimized:
J = || A - XW'||_f
The cost function minimizes the Frobenius norm between the original matrix A an XW', i.e. the error in approximating A as a product of two matrices. This can be solved using conjugate gradient methods and MATLAB's optimization toolbox (fminunc; tutorial) is one way to implement this. Following is the MATLAB code as an example:
test = ones(5,5);
B = blkdiag(test,test,test,test);
M = rand(40,4);
[xnew,fval] = fminunc(@obj_fun1,M,options,B,20);
function [fun,Grad] = obj_fun1(Z,A,nodes)
[m,n] = size(Z);
X = Z(1:nodes,:);
W = Z(nodes+1:end,:);
% Objective Function
fun = norm(A-X*W','fro')^2+norm(W,'fro')^2;
if nargout > 1
Grad1 = 2*(X*(W'*W)-A*W);
Grad2 = 2*(W*(X'*X)-A'*X)+W;
Grad = [Grad1; Grad2];
end
Once we minimize the objective function, we can obtain the solution for X as
82.5664 -1.1484 79.4176 -39.0137
82.5664 -1.1482 79.4176 -39.0137
82.5666 -1.1485 79.4176 -39.0139
82.5666 -1.1485 79.4176 -39.0139
82.5667 -1.1472 79.4173 -39.0141
-6.3391 -18.4040 68.2625 88.6399
-6.3389 -18.4039 68.2623 88.6397
-6.3389 -18.4039 68.2624 88.6398
-6.3388 -18.4037 68.2622 88.6396
-6.3386 -18.4036 68.2621 88.6395
75.9984 13.2685 -57.4890 70.8761
75.9989 13.2680 -57.4891 70.8759
75.9985 13.2681 -57.4889 70.8761
75.9985 13.2687 -57.4891 70.8760
75.9989 13.2681 -57.4890 70.8758
-17.6262 112.9716 27.4847 14.5483
-17.6257 112.9713 27.4844 14.5482
-17.6263 112.9716 27.4847 14.5483
-17.6259 112.9715 27.4844 14.5484
-17.6255 112.9708 27.4844 14.5482
From this essentially the community structure can be easily determined (observe the rows can be grouped to reflect the original communities). However, a much faster and efficient (in terms of implementation) way to accomplish this goal is using something like Singular Valued Decomposition (SVD).
The above code is just a simple illustrative example. However, for me it was a worthwhile experiment to try out and to understand how matrix factorization via optimization can be useful in community detection.
Some recent, interesting papers that use different Matrix Factorizations:
- Algorithms for Non Negative Matrix Factorization Daniel D. Lee, Sebastian Seung [Link]
- Structural and temporal analysis of the blogosphere through community factorization Yun Chi, Shenghuo Zhu, Xiaodan Song, Jun'ichi Tatemura, Belle L. Tseng [Link]
- Combining content and link for classification using matrix factorization Shenghuo Zhu, Kai Yu, Yun Chi, Yihong Gong [Link]
- Learning from Incomplete Ratings Using Non-negative Matrix Sheng Zhang, Weihong Wang, James Ford, Fillia Makedon [Link]
- Solving Consensus and Semi-supervised Clustering Problems Using Nonnegative Matrix Factorization Tao Li, Ding, C., Jordan, M.I. [link]
- NMF Software by Chih-Jen Lin
- I believe the statistics toolbox now supports NMF too.
- NMFPack from HIIT
I would appreciate if anyone has pointers to other interesting references/tutorials/software and could please leave me a comment.
NMF has been used in community detection in this paper: Zhang, Shihua; Wang, Rui-Sheng; Zhang, Xiang-Sun. Uncovering fuzzy community structure in complex networks. Physical Review E, vol. 76, Issue 4, id. 046103
Posted by: Rashel | January 21, 2010 at 12:10 PM
NMF for community detection can be found in Physical Review E, 76, 046103, 2007
Posted by: Rashel | January 21, 2010 at 12:36 PM