social networks

July 24, 2008

M3SN Workshop on Social Media

Just wanted to share a quick pointer to the First International Workshop on Modeling, Mining and Managing Evolving Social Networks (M3SN), to be co-located with IEEE ICDE 2009 in Shanghai. Following is the CFP, via Christian König.

I have also added the submission deadlines in the Social Media Calendar (add it to your Google calendar!).

Read this document on Scribd: M3SN Workshop on Social Media

July 20, 2008

Advertising Models: From Contextual to Conceptual/Semantic to Social

Contextual advertising relies on matching an advertisement with a page based on its content. Most often, advertisers bid on keywords, and the ad platform finds appropriate pages on which to display the ads by matching those keywords and phrases against the page content. There are a number of situations where such an approach fails. A few real-world examples are discussed in the paper by Broder et al. [2]:

a page about a famous golfer named “John Maytag” might trigger an ad for “Maytag dishwashers” since Maytag is a popular brand. Another example could be a page describing the Chevy Tahoe truck (a popular vehicle in US) triggering an ad about “Lake Tahoe vacations”. Polysemy is not the only culprit: there is a (maybe apocryphal) story about a lurid news item about a headless body found in a suitcase triggering an ad for Samsonite luggage! In all these examples the mismatch arises from the fact that the ads are not appropriate for the context.

These examples highlight the need to move from contextual to conceptual/semantic advertising models. Broder et al. suggest mapping both pages and ads into a common ontology/taxonomy and thereby finding the appropriate higher-level concept (politics, sports, etc.) that relates the two. There are very few papers on advertising models, since they are closely guarded secrets: a substantial share of a search company's revenue is tied to its ad platform's performance.
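To make the contrast concrete, here is a minimal sketch of keyword matching versus concept matching. It is not the method of Broder et al.; the tiny taxonomy, the term lists and the function names are all made up for illustration, and a real system would use a far larger taxonomy and a trained classifier.

```python
# Toy taxonomy (hypothetical, hand-made): concept -> terms that signal it.
TAXONOMY = {
    "sports/golf": {"golf", "golfer", "tournament", "putt", "fairway"},
    "home/appliances": {"dishwasher", "washer", "dryer", "appliance", "kitchen"},
    "autos/trucks": {"truck", "chevy", "engine", "towing", "horsepower"},
    "travel/vacations": {"vacation", "lake", "resort", "ski", "cabin"},
}


def tokenize(text):
    """Lowercase a text and split it into a set of alphabetic tokens."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return set(cleaned.split())


def classify(text):
    """Return the concept whose term set overlaps the text most, or None."""
    tokens = tokenize(text)
    best, best_overlap = None, 0
    for concept, terms in TAXONOMY.items():
        overlap = len(tokens & terms)
        if overlap > best_overlap:
            best, best_overlap = concept, overlap
    return best


def keyword_match(page, ad_keyword):
    """Contextual matching: does the bid keyword appear on the page?"""
    return ad_keyword.lower() in tokenize(page)


def concept_match(page, ad):
    """Conceptual matching: do the page and the ad map to the same concept?"""
    page_concept = classify(page)
    return page_concept is not None and page_concept == classify(ad)


page = "John Maytag, the famous golfer, sank a long putt to win the tournament."
ad = "Maytag dishwasher sale: quiet, efficient kitchen appliance."

print(keyword_match(page, "Maytag"))  # True: the keyword fires on the golfer page
print(concept_match(page, ad))        # False: sports/golf vs. home/appliances
```

The keyword matcher places the dishwasher ad on the golfer page, while the concept matcher rejects it because the two texts land in different parts of the taxonomy.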

I believe that advertising is in its infancy and that more interesting approaches will soon replace the current state of the art. One problem is that the lack of public datasets makes it quite difficult to make a significant contribution to this area from academia.

The next avenue for advertising seems to be social advertising. While Facebook has its own approach to social advertising, I think the general idea is to use not just the context of the page but also social information to better place the advertisement. One question here is whether the ad placement targets the user or his/her audience. These might require slightly different models. For example, if my friends are all clicking on the iPhone ad on a social network, the platform might decide to also target ME personally with iPhone marketing. On the other hand, if it can identify that a lot of users visit my profile because of the social media posts I write -- then perhaps the advertising could target them instead.

One potential market for advertising that I think is completely untapped is the referral. Companies are ready to pay huge sums of money to get new clients. Cell phone companies, stock trading sites and banks often launch promotions where they pay upwards of $50 for referring a friend. But when was the last time you actually did that? I think the best way to alienate your friends is by hawking corporate America's products and services or spamming their inboxes with unwanted referrals. Still, this is a huge market, worth billions -- if we can crack it! One approach I am thinking of is to build a referral platform (perhaps there are some out there -- I just don't know) that would benefit publishers and advertisers alike. As a publisher, I have a (sort of) general sense of what my audience would like. I might even decide to share the $50 I receive from the advertiser and pass the benefit on to my readers (since my payoff is in having readers come to my blog!) -- thus subsidizing that iPhone you wanted to buy. In the current model, there isn't much incentive for me to share the benefit (a few cents?) with the final consumer. But for higher-valued products, my guess is that it might just work. Moreover, the referral platform would manage the entire process, making it easier for advertisers to launch new schemes and manage their inventory of referral programs.

I had intended to write a brief note on some of the recent papers [1-4] on this topic but ended up sharing my thoughts on advertising instead -- which is perhaps more fun anyway :-).


[1] http://www.cs.cmu.edu/~deepay/mywww/papers/www08-interaction.pdf
[2] http://portal.acm.org/citation.cfm?id=1277837
[3] www.csulb.edu/web/journals/jecr/issues/20081/Paper1.pdf
[4] http://www2008.org/papers/pp231.html

July 19, 2008

What is the Dunbar's Number for Social Networks?

Many folks are really excited about FriendFeed. Personally, I have found that there are a lot more comments when something gets posted on FriendFeed. Recently, Yuval Atzmon's User21 blog released a list of the most followed users on FriendFeed. Since I too had a crawl of FriendFeed running in much the same way as Yuval's, I decided to look at the complementary question: "How many users do people follow on FriendFeed?" While the crawl is not yet complete (and complete statistics will have to wait), the numbers are really striking! Some users follow more than 1,000 "friends":

sthayden 3190
scobleizer 3087
juliomedina 2760
thomashawk 2557
jasoncalacanis 2447
theillife 2045
mrsth 1961
pookakoo 1814
czarphanguye 1736
brynyoungblut 1716
eposter 1562
susangrisantiguitarist 1550


I find this really amazing. Unlike Twitter, FriendFeed posts are accompanied by longer conversations, so following someone can be more involved. I can barely keep up with all the information flying past me everywhere right now! I guess 1,500+ "friends" would be way too much for me!
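For anyone who wants to run a similar tally on their own crawl, here is a minimal sketch. It assumes the crawl is stored as (follower, followee) pairs; that format, the user names and the function names are made up for illustration and are not taken from the actual FriendFeed crawl.

```python
from collections import Counter

# Hypothetical crawl output: (follower, followee) pairs.
edges = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "alice"), ("carol", "alice"), ("carol", "bob"),
    ("alice", "bob"),  # duplicate, e.g. from re-crawling the same page
]


def following_counts(edges):
    """Count how many distinct users each user follows."""
    unique_edges = set(edges)  # drop duplicate pairs before counting
    return Counter(follower for follower, _ in unique_edges)


def above_limit(counts, limit=150):
    """Return, in sorted order, the users who follow more than `limit` others."""
    return sorted(user for user, n in counts.items() if n > limit)


counts = following_counts(edges)
print(counts.most_common())          # [('alice', 3), ('carol', 2), ('bob', 1)]
print(above_limit(counts, limit=2))  # ['alice']
```

The default limit of 150 is only a nod to the figure usually quoted for Dunbar's number; any threshold can be passed in.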

Sociologists often talk about Dunbar's number, which

is the supposed cognitive limit to the number of individuals with whom any one person can maintain stable social relationships.

In human contact networks, Dunbar's number is said to be around 150. It may well be that social tools, and microblogging especially, are pushing this limit further. Studies on Twitter, LiveJournal and other social networking sites seem to support this observation. I wonder then: what would Dunbar's number be on social networks? 300? 500? Any guesses? Perhaps a comparison across the published papers that have studied different social networks would offer some clues.

[BTW, I am akshayjava on FriendFeed]

July 17, 2008

Get a Room People!


I've been tinkering with Lively, Google's new virtual world. It seems a bit flaky at the moment and works only on Windows right now... but I was able to set it up. The neat thing about it is that rooms can be embedded anywhere; in this blog, for example. Like MyBlogLog, Lively can also serve as a visitor log, and unlike Second Life, it isn't a walled garden. It is a fun thing to try out but, IMO, its utility seems low at the moment. What it lacks right now is the stickiness factor... a few of the 2D Facebook app games are more addictive than this. Usually in such situations, an occasional flash mob like the one on Huddle is a lot more fun!

Clustering Triples from Social Data

By far, the most prevalent data available in social media is tagging information. For example, in del.icio.us a user may tag a URL, or in Flickr she may tag an image. One question that comes up is how to cluster social data that is rich in tags. Some available techniques ignore the user information and use only a bipartite graph consisting of tags and URLs. Another method is to represent two pieces of evidence (user-tag; tag-URL) in a tripartite graph (where nodes are of three different types: users, tags and URLs). However, even this type of structure misses the higher-order relation between the three nodes. Note that the information available is really in triples of the type <user, tag, url>. This information is not captured by the tripartite graph model. In particular, two users may be connected via a common tag even if the actual URLs they bookmarked are vastly different.
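The information loss is easy to see in a small example. The sketch below uses made-up users, tags and URLs; it projects triples onto the two edge types described above (user-tag and tag-URL) and shows that two different tagging histories produce exactly the same tripartite graph.

```python
def pairwise_edges(triples):
    """Project <user, tag, url> triples onto user-tag and tag-url edge sets."""
    user_tag = {(user, tag) for user, tag, _ in triples}
    tag_url = {(tag, url) for _, tag, url in triples}
    return user_tag, tag_url


def urls_for(triples, user, tag):
    """Answer a question that needs the full triple: what did `user` tag with `tag`?"""
    return {url for u, t, url in triples if u == user and t == tag}


# Two different (hypothetical) tagging histories.
history_a = {("ann", "python", "url1"), ("bob", "python", "url2")}
history_b = {("ann", "python", "url2"), ("bob", "python", "url1")}

# The pairwise projections are identical, so the graph cannot tell them apart.
print(pairwise_edges(history_a) == pairwise_edges(history_b))  # True

# The triples still can.
print(urls_for(history_a, "ann", "python"))  # {'url1'}
print(urls_for(history_b, "ann", "python"))  # {'url2'}
```

In both histories ann and bob are linked through the tag "python", yet who bookmarked which URL can only be recovered from the triples themselves.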

There are some techniques using tensor factorization that can handle such data. However, the question of how to deal with triple (or higher-order) information from social data remains quite interesting. Moreover, being able to do so efficiently and in an online fashion would also be important. I believe that this topic may be of significant interest at upcoming social media and data mining conferences. These techniques could lead to better recommendation systems and personalization algorithms.

[Thanks Vlad Korolev for some of the discussions related to this post]

July 15, 2008

Google Calendar Feature Requests

I have started a Google Calendar to keep track of events and conferences in Social Media. Some of you may have already subscribed to it. However, there does not seem to be any way for me to tell exactly how many people are using it!

Feature Request #1: Show Number of Subscribers for a Calendar. This would certainly be quite a useful feature to have and, as it turns out, I am not the first person to request it.

Feature Request #2: Allow Tagging; Sharing Calendars a User Has Added. The calendar, as a shared resource for planning and organizing events in a community, is an important tool. However, it is still not as social as it could be! You can easily find new calendars to add. But what about tags? How about sharing? I would love to be able to create a tag cloud of events and a public list of calendars I have added.

Feature Request #3: Social Event Notification. If a user has made her events public, then why not show that user's friends an update on the events she plans to attend? I think Dopplr does this at some level, but given that Google Calendar is a good place to consolidate all events and schedules and Gmail is our universal contact list -- why not combine them to make it more social?

Anyway, these are just a few quick thoughts I had about Google Calendar. It is a great tool and has made my life much easier. To be fair, I have never used Outlook, so my view may be a bit skewed.

July 06, 2008

The Cold Start Problem in Social Media


Here is a classic cold start problem:

  • "Social tools" can only be social if there are enough people on them.
  • Any social site is only as attractive as the number of friends you have on it.
  • A social site is only useful if there are enough people contributing to it (be it annotating images, adding links or writing reviews).

The question is: how do you get enough users to adopt a tool and build sufficient traction around it that it attracts more users? I don't have all the answers, and perhaps entrepreneurs and folks in startups are more knowledgeable about this than I am. But this is a question I have been pondering for some time. Most of these points might be fairly obvious, but here are a few thoughts I'd like to share:

  • Above all, build something cool!
  • Realize that in many systems, 1% of users acting as contributors is all it takes! (See Clay Shirky's book "Here Comes Everybody" for more on this.) Ensure that the reward mechanism is built into the site. This 1% of users is not motivated by money. They use your tools because they enjoy them or because the tools solve some real problem they have been facing. I have seen some sites trying to "pay" users to add data to "seed" their site. For example, check out some of the high-paying HITs on Mechanical Turk. In my opinion, this is like throwing money out of the window. A completely bogus way to jump-start your site!
  • Provide APIs: One of the key factors that contributed to the success of Twitter was its neat API, which developers immediately adopted and had fun building cool toys with. These third-party tools in turn make it easy for users to contribute to and engage with your site, thus breaking the cold start problem. For example, the plethora of third-party Twitter clients, though not developed by Twitter, make it easy to post updates.
  • Try to "seed" your site with datasets curated by web crawls, APIs, external databases or using the tools yourself. For example, if you are building a site that uses geotagging, you might consider using sites like geonames.org or if your site is around movies -- use IMDB or Amazon data to seed it.
  • Make sure stuff is findable and socially visible. Make it easy for your users to find the data they really care about and they will be willing to annotate it. Moreover, make sure that it is easy for users to share what they annotate with their friends. The beauty of Facebook is the news feed. People like to know what their friends are up to. On Twitter, I want to know what my friends are saying and be able to have conversations with them -- without that, Twitter is just a chat room.
  • Don't ever SPAM! Every week I get a bunch of emails from random sites that a friend of mine once joined. Make sure that your invitation email includes an option to not receive any future requests from your site. I am amazed when I don't see this option at all. If a user does not wish to join a network, please don't keep sending them emails requesting them to join every time one of their friends signs up! Also, I am a sucker for alpha/beta testing any new social media/social network site. Sometimes I even try the ones that eventually end up spamming everyone on my email/IM. As a RULE -- never spam your potential users! That is the best way to piss them off even before they join.
  • Listen and iterate rapidly. Your alpha/beta users are the most important. Listen to what they have to say. Also, if you cannot convince your friends and family to use the tool -- why would anyone else bother?

The cold start problem has been studied in computer science, particularly for recommendation systems**. A good place to start is the paper:

Methods and metrics for cold-start recommendations. Schein, A.I.; Popescul, A.; Ungar, L.H.; Pennock, D.M.

I am quite interested in knowing how startups have approached this problem in real situations and, particularly, whether there is any analytical data available to show what worked and what did not. I guess this might be information that few would be willing to share so openly.

** On a related note: the blog "Duke Listens" is an excellent source for more on recommendation systems. Also check out the recent post on the cold start problem.

July 02, 2008

Community Detection via Matrix Factorization

One form of matrix factorization is Singular Value Decomposition (SVD). This is a powerful technique with several applications in information retrieval and graph analysis.

Another matrix factorization technique I mentioned recently is Non-Negative Matrix Factorization (NNMF). The advantage of NNMF over SVD is that it is easier to compute and is generally much easier to interpret due to the non-negativity constraint.

Matrix factorization can be achieved via optimization methods. Suppose a matrix A (shown in the figure on the left) of size 20 x 20 is to be factorized into two matrices, X of size 20 x 4 and W' of size 4 x 20. The following objective function can then be minimized:

J = || A - XW' ||_F

The cost function is the Frobenius norm of the difference between the original matrix A and XW', i.e. the error in approximating A as a product of two matrices. (The code below minimizes the squared norm plus a regularization term on W.) This can be solved using conjugate gradient methods, and MATLAB's Optimization Toolbox (fminunc; tutorial) is one way to implement this. Following is the MATLAB code as an example:

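% Build a 20x20 block-diagonal matrix: four communities of five nodes each.
test = ones(5,5);
B = blkdiag(test,test,test,test);

% Random start: rows 1-20 hold X (20x4), rows 21-40 hold W (20x4).
M = rand(40,4);

% Assumed setting (the original post did not show how options was defined):
% tell fminunc that obj_fun1 also returns the gradient.
options = optimset('GradObj','on');
[xnew,fval] = fminunc(@(Z) obj_fun1(Z,B,20),M,options);

% Save as obj_fun1.m, or keep at the end of the script file.
function [fun,Grad] = obj_fun1(Z,A,nodes)
    X = Z(1:nodes,:);
    W = Z(nodes+1:end,:);

    % Objective: squared reconstruction error plus a penalty on W.
    fun = norm(A-X*W','fro')^2 + norm(W,'fro')^2;

    if nargout > 1
        Grad1 = 2*(X*(W'*W)-A*W);          % gradient with respect to X
        Grad2 = 2*(W*(X'*X)-A'*X) + 2*W;   % gradient with respect to W (penalty adds 2*W)
        Grad = [Grad1; Grad2];
    end
end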
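For reference, here is a short derivation of the gradients for the objective that the MATLAB function evaluates (squared Frobenius error plus a squared-norm penalty on W). The notation follows the code; the derivation is standard matrix calculus and is included only as a check.

```latex
\begin{aligned}
J(X, W) &= \lVert A - XW^{\top} \rVert_F^{2} + \lVert W \rVert_F^{2} \\
\frac{\partial J}{\partial X} &= -2\,(A - XW^{\top})\,W
  = 2\,\bigl(X W^{\top} W - A W\bigr) \\
\frac{\partial J}{\partial W} &= -2\,(A - XW^{\top})^{\top} X + 2W
  = 2\,\bigl(W X^{\top} X - A^{\top} X\bigr) + 2W
\end{aligned}
```

Note that the penalty term contributes 2W, not W, to the gradient with respect to W.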


Once we minimize the objective function, we can obtain the solution for X as
  82.5664   -1.1484   79.4176  -39.0137
   82.5664   -1.1482   79.4176  -39.0137
   82.5666   -1.1485   79.4176  -39.0139
   82.5666   -1.1485   79.4176  -39.0139
   82.5667   -1.1472   79.4173  -39.0141
   -6.3391  -18.4040   68.2625   88.6399
   -6.3389  -18.4039   68.2623   88.6397
   -6.3389  -18.4039   68.2624   88.6398
   -6.3388  -18.4037   68.2622   88.6396
   -6.3386  -18.4036   68.2621   88.6395
   75.9984   13.2685  -57.4890   70.8761
   75.9989   13.2680  -57.4891   70.8759
   75.9985   13.2681  -57.4889   70.8761
   75.9985   13.2687  -57.4891   70.8760
   75.9989   13.2681  -57.4890   70.8758
  -17.6262  112.9716   27.4847   14.5483
  -17.6257  112.9713   27.4844   14.5482
  -17.6263  112.9716   27.4847   14.5483
  -17.6259  112.9715   27.4844   14.5484
  -17.6255  112.9708   27.4844   14.5482

From this, the community structure can be easily determined (observe that the rows can be grouped to reflect the original communities). However, a much faster and more efficient (in terms of implementation) way to accomplish this goal is to use something like Singular Value Decomposition (SVD).

The MATLAB code above is just a simple illustrative example. However, for me it was a worthwhile experiment to try out in order to understand how matrix factorization via optimization can be useful in community detection.
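The row grouping mentioned above can also be done mechanically. Here is a minimal sketch that takes eight of the twenty rows of X printed earlier (the first and last row of each block of five) and groups rows that lie close together. The greedy grouping rule and the tolerance of 1.0 are assumptions chosen for this small example, not part of the original experiment.

```python
import math

# First and last row of each five-row block of the X matrix printed above.
rows = [
    (82.5664, -1.1484, 79.4176, -39.0137),
    (82.5667, -1.1472, 79.4173, -39.0141),
    (-6.3391, -18.4040, 68.2625, 88.6399),
    (-6.3386, -18.4036, 68.2621, 88.6395),
    (75.9984, 13.2685, -57.4890, 70.8761),
    (75.9989, 13.2681, -57.4890, 70.8758),
    (-17.6262, 112.9716, 27.4847, 14.5483),
    (-17.6255, 112.9708, 27.4844, 14.5482),
]


def group_rows(rows, tol=1.0):
    """Greedily label each row with the first group whose representative is within tol."""
    representatives, labels = [], []
    for row in rows:
        for label, rep in enumerate(representatives):
            if math.dist(row, rep) <= tol:  # Euclidean distance (Python 3.8+)
                labels.append(label)
                break
        else:
            # No existing group is close enough: start a new one.
            representatives.append(row)
            labels.append(len(representatives) - 1)
    return labels


labels = group_rows(rows)
print(labels)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Rows from the same block differ only in the third or fourth decimal place, while rows from different blocks are tens of units apart, so any tolerance in between recovers the four communities.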

Some recent, interesting papers that use different Matrix Factorizations:

I would appreciate it if anyone with pointers to other interesting references/tutorials/software could leave me a comment.

July 01, 2008

Gmail as a Universal Contact List

UPDATE2: Great news! Right while I was thinking about this issue yesterday, it looks like Google released its official AJAX client library for its Contact API. So is Google becoming a universal contact list after all? Now, with these tools and APIs available, third-party sites have no excuse whatsoever for continuing to insist on a username and password to import contacts!

UPDATE1: As it turns out, I totally forgot about the recent announcement of Google Friend Connect and the controversy that soon followed. This is the kind of approach I was thinking of; it just slipped my mind while writing this post late into the night. I think I signed up for the private beta as well and am awaiting an invitation. Here is the video that explains Google Friend Connect.

I hope that with Google, Facebook and MySpace all trying to solve this problem, third-party apps that import contact lists using passwords/credentials directly will soon be a thing of the past.

-----------------------------------------------------------------------------

Gmail has almost become a universal contact list. At least all social network sites think so...

I just don't understand why, every time I am asked for my Gmail user ID and password (to find friends on a network), I cringe but then finally give in -- only to get burnt, burnt and burnt (ouch!). What drives me nuts is when some of these sites get away with sending your password in plain text! Why do we put up with this nonsense in this day and age?

One suggestion I have for this problem is to build a Gmail Friend Finder API that would allow Yet Another Social Network (YASN) to access our universal contact list. What I mean by this is: Gmail knows everyone I know and interact with. I trust Gmail and am generally more willing to let Gmail be the arbiter of my social information. Why is this a good idea? For starters, third-party apps needn't ask users for their passwords. I just ask them to go and talk to Gmail to see if there are others on their site whom I might know and who might be interested in connecting with me.

Yesss! I am aware of OpenID and the Social Graph API. Here is a small glitch, though. The Social Graph API relies on FOAF/XFN, and not everyone has that information published online. OpenID is more for authentication and, IMHO, it's kinda unintuitive and difficult to explain even to tech-savvy folks -- let alone my grandmother! Gmail, on the other hand... everyone has an account there and we all 'get it'! To be fair, Microsoft's Passport account was in some sense a precursor to all this, perhaps even a little too early for its time!

Following is an illustrative example of how I see this working:

The approach that I think might work better would involve developing an API for Gmail. When I first join YASN, instead of sending me an email directly, YASN outsources the verification process to the Gmail API. Gmail sends me an email to verify that it was actually me who signed up on YASN. Once I confirm, Gmail sends YASN a confirmation that it has verified it is me. In addition, it sends a secret identifier that it requires YASN to send over SSL when asking for any of my data. Note that at this point Gmail already knows for certain that I am a member of YASN. Now, I want to check if any of my friends are on YASN. So YASN connects once again to the Gmail Friend Finder, passing along the token/secret code that was sent to it when I completed the email verification. The only friends that the Gmail API then sends to YASN are the ones who are connected to me on Gmail AND are also members of YASN.

Since YASN can only access limited information via the Friend Finder API, it cannot spam everyone on my email account. Additionally, since it does not have my password, the risk of my account being hacked or of YASN doing something malicious is minimized. Of course, all this is just conceptual -- unless the Google/Gmail team actually implements some such API.
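The flow described above can be sketched in a few lines. Everything here is hypothetical: the class, its method names and the in-memory storage are illustrative stand-ins for the imagined Friend Finder service, not any real Gmail or Google API, and the email round trip and SSL transport are left out.

```python
import secrets


class FriendFinder:
    """Hypothetical contact provider playing the role of the imagined Gmail API."""

    def __init__(self, contacts):
        self._contacts = contacts   # user -> set of that user's contacts
        self._memberships = {}      # site -> set of verified members
        self._tokens = {}           # token -> (site, user)

    def confirm_signup(self, site, user):
        """Called once the user confirms the verification email.

        Records the membership and returns the secret token handed to the site.
        """
        self._memberships.setdefault(site, set()).add(user)
        token = secrets.token_hex(16)
        self._tokens[token] = (site, user)
        return token

    def find_friends(self, site, token):
        """Return only the user's contacts who are also verified members of `site`."""
        issued_site, user = self._tokens.get(token, (None, None))
        if issued_site != site:
            raise PermissionError("unknown token for this site")
        return self._contacts.get(user, set()) & self._memberships.get(site, set())


gmail = FriendFinder({
    "me@example.com": {"ann@example.com", "bob@example.com", "cy@example.com"},
    "ann@example.com": {"me@example.com"},
})
gmail.confirm_signup("yasn", "ann@example.com")
token = gmail.confirm_signup("yasn", "me@example.com")

friends = gmail.find_friends("yasn", token)
print(friends)  # {'ann@example.com'}
```

The site never sees the password or the full contact list: bob and cy are not members, so they are never revealed, and a token issued for one site is rejected when presented by another.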

[Thanks Audumbar Chormale, for the discussions and the question that led to this post]

June 20, 2008

Some things are just Semi-Social

Social Media is a lot about sharing. Prior to the growth of social software, it wasn't that people did not share stuff -- they just did it offline or via email. Now we share at a massive scale and a lot more easily. 

Some things we are willing to share "openly"

  • Music playlists (Last.fm)
  • Books we read (iRead, Shelfari)
  • Calendars and travel plans (Google Calendar)
  • Status updates (via Twitter and microblogging)
  • Restaurant recommendations (Yelp)
  • Knowledge and expertise (via Wikipedia)

As we start to experiment with social software, we realize that sharing is good and soon become open to sharing a lot more. There are some things, though, that just seem semi-social. What I mean by Semi-Social is roughly "Things I would not mind sharing with a small group of trusted friends and family members".

Until just a few years ago, a lot more people would have squirmed if asked to share such 'sensitive data' with others. I see this perception slowly eroding. There is a small, albeit enthusiastic, bunch experimenting with new tools that fall into the Semi-Social category.

Some cases that I can think of are as follows:

  • Investment portfolio: One example is Covestor. I have an account there, but it is under a pseudonym. I would not be that enthusiastic about revealing my pathetic attempts to bet on the stock market by watching (mostly tech) blogs. Sigh!
  • TV watching habits: I think Television as we know it today is completely broken. There is no social aspect to it whatsoever. At ICWSM, Noor Ali-Hassan presented a paper on "Social Media Scenarios for Television". What struck me about this talk was her statement that "Despite its social nature, there is a private aspect of TV that people want to preserve".
  • Income and financial information: This is something we had least anticipated. How did we get to a point where I am actually not that scared while putting all my bank details and credit card information into a site like Mint? Mint is not a social site as such. But it reflects how we are now willing to part with some really sensitive data. In contrast, there are other examples of recruitment sites like SimplyHired where people reveal their salary information and can search for companies by salary. A more recent startup that is quite similar is Glassdoor.
  • Location: Location can be an extremely sensitive piece of information. Fortunately, Yahoo's Fire Eagle provides access control for various applications, and one can set the privilege that each app has to access location information (lat/long, zip, state, country, etc.).
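The per-application location privileges mentioned in the last item can be pictured with a small sketch. This is not Fire Eagle's actual API; the granularity levels, app names and location values are made up purely to illustrate the idea of showing each app only as much detail as the user has granted.

```python
# Granularity levels from coarsest to finest (hypothetical naming).
LEVELS = ["country", "state", "zip", "latlong"]

# Illustrative location record for one user.
location = {
    "country": "US",
    "state": "CA",
    "zip": "94103",
    "latlong": (37.77, -122.41),
}

# What the user has granted each (made-up) app; None means no access.
permissions = {"weather_app": "zip", "friend_map": "state", "ad_network": None}


def visible_location(location, granted_level):
    """Return only the fields at or coarser than the granted level."""
    if granted_level is None:
        return {}
    cutoff = LEVELS.index(granted_level)
    return {level: location[level] for level in LEVELS[: cutoff + 1]}


def location_for(app):
    """Look up an app's privilege; apps without an entry get nothing."""
    return visible_location(location, permissions.get(app))


print(location_for("friend_map"))  # {'country': 'US', 'state': 'CA'}
print(location_for("ad_network"))  # {}
```

Defaulting unknown apps to no access keeps the semi-social spirit: sharing is something the user opts into, one app at a time.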

There will always be some who are at the extreme end of the spectrum and are quite comfortable with being completely (publicly) transparent about "sensitive data". However, most would still only dare to share some of this data with close friends and select people, and only if there is enough value in it for them. Some would be comfortable with aggregate analysis over the data as long as they are not personally identified or targeted in some way (advertising or otherwise).

Although it requires a great deal of courage (to work with privacy-sensitive data), the opportunity to invent in the semi-social space may be quite large.
