Inferring Privacy Information via Social Relations
Wanhong Xu, Xi Zhou, Lei Li (presenter)
The presentation started with a neat quote: "Your social activities tell who you are". Social networks are part of everyday life and, for many of us, a primary way of keeping in touch with friends and family. Privacy is a huge problem in such networks. Li provided some statistics from a recent survey of British social network users: about 62% are concerned about the security of their personal data, and 31% falsify information to protect their identity. These are large figures and clearly show the level of concern.
This paper was motivated by applications in social advertising (which is a 2.2B US market). Advertisers can target users based on location, age, gender, etc., and social advertising lets them choose their audience. The problem arises when users do not wish to disclose too much of their information online. The proposed solution is to infer such missing information automatically.
The authors use an undisclosed dataset for this study. The assumption is that users fake personal information but not their activities. It is well known that in offline social networks a gender preference exists in friendship; however, in online social networks this is not true (Jure Leskovec, WWW, 2008). The key question that the authors of this paper ask is: "Users may fake their personal information... But what about social activities?"
The insight that this paper offers is that membership in certain groups gives hints about a user's gender: joining groups has a gender preference. Gender can therefore be inferred from a bipartite graph of users->groups in which some gender labels are missing, using the relation between users and groups. One approach is to take the User*Group matrix and build a classifier on it. However, they found that Naive Bayes does not work very well on the full matrix. Many social groups in fact don't have any gender preference, and this can really hurt the classifier's accuracy. So the approach they propose is to choose only the discriminating social groups, i.e. groups that are dominated by males or by females. Once you restrict to the set of discriminating groups, Naive Bayes performs well. One major disadvantage of this technique is that the gender of users who don't join any of these groups cannot be predicted.
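To make the group-selection idea concrete, here is a rough sketch of my own (not the authors' code). It picks groups whose labeled members are mostly one gender and then trains a small Bernoulli Naive Bayes on just those groups. The toy data, the function names, the 0.8 dominance threshold, the minimum group size and the Laplace smoothing are all assumptions made for illustration; the talk did not give these details.

```python
import math

def discriminating_groups(memberships, gender, threshold=0.8, min_members=2):
    """Return {group: dominant gender} for groups dominated by one gender.

    threshold and min_members are illustrative values, not from the paper.
    """
    counts = {}
    for user, label in gender.items():
        for grp in memberships.get(user, ()):
            counts.setdefault(grp, {"M": 0, "F": 0})[label] += 1
    selected = {}
    for grp, c in counts.items():
        total = c["M"] + c["F"]
        top = "M" if c["M"] >= c["F"] else "F"
        if total >= min_members and c[top] / total >= threshold:
            selected[grp] = top
    return selected

def train_nb(memberships, gender, groups):
    """Bernoulli Naive Bayes over `groups`, with Laplace smoothing."""
    n = {"M": 0, "F": 0}
    joined = {"M": dict.fromkeys(groups, 0), "F": dict.fromkeys(groups, 0)}
    for user, label in gender.items():
        n[label] += 1
        for grp in memberships.get(user, ()):
            if grp in joined[label]:
                joined[label][grp] += 1
    total = n["M"] + n["F"]
    model = {}
    for label in ("M", "F"):
        log_prior = math.log((n[label] + 1) / (total + 2))
        probs = {g: (joined[label][g] + 1) / (n[label] + 2) for g in groups}
        model[label] = (log_prior, probs)
    return model

def predict_nb(model, user_groups):
    """Return "M" or "F", or None if the user joined none of the model's groups."""
    if not any(g in user_groups for g in model["M"][1]):
        return None  # the disadvantage noted above: no evidence, no prediction
    scores = {}
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for g, p in probs.items():
            score += math.log(p if g in user_groups else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: u5 and u6 have no disclosed gender.
memberships = {
    "u1": {"football", "news"}, "u2": {"football", "news"},
    "u3": {"knitting", "news"}, "u4": {"knitting"},
    "u5": {"football"}, "u6": {"news"},
}
gender = {"u1": "M", "u2": "M", "u3": "F", "u4": "F"}

selected = discriminating_groups(memberships, gender)
model = train_nb(memberships, gender, selected)
pred_u5 = predict_nb(model, memberships["u5"])
pred_u6 = predict_nb(model, memberships["u6"])
```

In this toy data "news" is joined by both genders and is dropped, so u6, who joined only "news", gets no prediction at all.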
For users whose gender cannot be predicted this way, they propose an iterative algorithm that combines the discriminative social groups with the results from the Naive Bayes classifier. Testing is performed by hiding some of the gender labels and then predicting the hidden values.
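The talk did not spell out the iterative algorithm, so the following is only a guess at its general shape, written as a simple self-training loop. A plain majority vote over gender-dominated groups stands in for the Naive Bayes step, and newly predicted labels are fed back in so that more groups become usable in the next round. The hold-out helper mirrors the evaluation described above: hide some known genders, predict them, and compare. All names, the threshold and the toy data are hypothetical.

```python
def dominant_groups(memberships, labels, threshold=0.8):
    """Map each group to its dominant gender if it passes the threshold."""
    counts = {}
    for user, label in labels.items():
        for grp in memberships.get(user, ()):
            counts.setdefault(grp, {"M": 0, "F": 0})[label] += 1
    out = {}
    for grp, c in counts.items():
        top = max(c, key=c.get)
        if c[top] / (c["M"] + c["F"]) >= threshold:
            out[grp] = top
    return out

def iterative_infer(memberships, known, threshold=0.8, max_rounds=10):
    """Self-training sketch: predict, add predictions as labels, repeat."""
    labels = dict(known)
    for _ in range(max_rounds):
        groups = dominant_groups(memberships, labels, threshold)
        new = {}
        for user, joined in memberships.items():
            if user in labels:
                continue
            votes = [groups[g] for g in joined if g in groups]
            m, f = votes.count("M"), votes.count("F")
            if m != f:  # skip users with no votes or a tie
                new[user] = "M" if m > f else "F"
        if not new:
            break
        labels.update(new)
    return {u: l for u, l in labels.items() if u not in known}

def holdout_accuracy(memberships, truth, hidden):
    """Hide the genders of `hidden` users, predict them, and score the result."""
    known = {u: l for u, l in truth.items() if u not in hidden}
    predicted = iterative_infer(memberships, known)
    correct = sum(1 for u in hidden if predicted.get(u) == truth[u])
    return correct / len(hidden), predicted

# Hypothetical toy data. User "g" is only reachable in the second round,
# after "c" has been labeled and the group "go" gains a labeled member.
memberships = {
    "a": {"cars"}, "b": {"cars", "chess"}, "c": {"chess", "go"}, "g": {"go"},
    "d": {"yoga"}, "e": {"yoga", "baking"}, "f": {"baking"},
}
truth = {"a": "M", "b": "M", "c": "M", "g": "M", "d": "F", "e": "F", "f": "F"}

accuracy, predicted = holdout_accuracy(memberships, truth, hidden={"c", "f", "g"})
```

A loop like this can also propagate its own mistakes, since an early wrong prediction changes which groups look gender-dominated in later rounds.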
My concern here is that the authors might have been able to avoid this disadvantage if they had used SVD to map the groups to a lower-dimensional space; that way it would automatically "cluster" the groups based on whatever the discriminating factor is (in their case, the gender information). Secondly, while the authors had access to "verifiable" ground truth data, in the real world how do we know the influence of fake profiles on these discriminative groups?
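To illustrate what I mean by the SVD suggestion, here is a small sketch of my own. It is not something the authors did, and I have not tried it on real data. To keep it dependency-free, power iteration on A^T A stands in for a full SVD and recovers only the leading right singular vector of the column-centred user-by-group matrix; in practice one would call a linear algebra library instead. The matrix, group names and iteration count are made up for illustration.

```python
import math

def top_right_singular_vector(matrix, iterations=200):
    """Approximate the leading right singular vector of the column-centred
    matrix by power iteration on A^T A (a stand-in for a full SVD)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centred = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]
    # Fixed, non-uniform start so the run is deterministic and not
    # orthogonal to the direction we are looking for.
    v = [1.0 / (j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        av = [sum(r[j] * v[j] for j in range(n_cols)) for r in centred]  # A v
        w = [sum(centred[i][j] * av[i] for i in range(n_rows))
             for j in range(n_cols)]                                     # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical user-by-group membership matrix (rows are users).
groups = ["football", "cars", "knitting", "yoga", "news"]
user_group = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]

vector = top_right_singular_vector(user_group)
loading = dict(zip(groups, vector))
```

In this toy matrix "football" and "cars" load with one sign, "knitting" and "yoga" with the other, and "news" loads near zero, so the groups separate without any gender labels. The overall sign of a singular vector is arbitrary, and on real data there is no guarantee that the leading component corresponds to gender rather than, say, age or location.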