Contact Me


  • Akshay Java's Facebook profile

Disclaimer

  • Thoughts and comments expressed here are those of the author. Creative Commons License

software

July 15, 2008

Google Calendar Feature Requests

I have started a Google Calendar to keep track of events and conferences in Social Media. Some of you may have already subscribed to it. However, there does not seem to be any way for me to tell exactly how many people are using it!

Feature Request #1: Show Number of Subscribers for a Calendar. This would certainly be quite a useful feature to have and, as it turns out, I am not the first person to request it.

Feature Request #2: Allow Tagging; Sharing Calendars a User has Added. A calendar, as a shared resource for planning and organizing events in a community, is an important tool. However, the calendar is still not as social as it can be! You can easily find new calendars to add. But what about tags? How about sharing? I would love to be able to create a tag cloud of events and a public list of calendars I have added.

Feature Request #3: Social Event Notification. If a user has made his or her events public, then why not show that user's friends an update on the event he or she plans to attend? I think that Dopplr does this at some level, but given that Google Calendar is a good place to consolidate all the events and schedules and Gmail is our universal contact list -- why not combine them to make it more social?

Anyway, these are just a few quick thoughts I had about Google Calendar. It is a great tool and has made my life much easier. To be fair, I have never used Outlook, so my view may be a bit skewed.

June 27, 2008

Wordle: Create Beautiful Tag Clouds

Briefly; via Harry Chen:

Wordle is a neat tool for creating pretty-looking tag clouds. Enter text or your del.icio.us username.

Here is my tag cloud from del.icio.us:
[Image: del.icio.us tag cloud]
Looks awesome, no?
Thanks Harry!

It's Getting Crowded Out Here!

Yep! That's what my Firefox toolbar looks like these days. Bookmarklets are a convenient way to post stuff to various sites. I use them for sharing links with the ebiquity blog via del.icio.us, posting to friendfeed, and checking news every now and then, and right now I am testing a bookmarking site called Gyzork. A few other plugins I have installed are del.icio.us, socialbrowse and stumbleupon. I used to have a lot more but couldn't keep up with all the stuff going on in my browser on a single monitor! :-(

Perhaps it's time to rethink how we use bookmarklets. Some thoughts on this:

  • It would be great to have icons for bookmarklets
  • I need a bookmarklet 'stack feature' -- a place where I can organize bookmarklets.
  • A listing of all the bookmarklets for different categories (GTD, News, Links, Blogs, etc...) would be handy for power users.
  • Maybe we even need a simple way to post a link on one place and have it pushed to other sites

It's getting crowded out here! I wonder if there is a better way to manage all this.

[Thanks Anthony Vito, for some of the discussions that led to this post]

June 20, 2008

Some things are just Semi-Social

Social Media is a lot about sharing. Prior to the growth of social software, it wasn't that people did not share stuff -- they just did it offline or via email. Now we share at a massive scale and a lot more easily. 

Some things we are willing to share "openly"

  • Music playlists (Last.fm)
  • Books we read (iread, shelfari)
  • Calendars and Travel plans (google calendar)
  • Status updates (via Twitter and Microblogging)
  • Restaurant recommendations (yelp)
  • Knowledge and expertise (via Wikipedia)

As we start to experiment with social software we realize that sharing is good and soon become open to sharing a lot more. There are some things, though, that just seem semi-social. What I mean by Semi-Social is roughly "things I would not mind sharing with a small group of trusted friends and family members".

Until just a few years back, a lot more people would have squirmed if they were asked to share such 'sensitive data' with others. I see this perception slowly eroding. There is a small, albeit enthusiastic, bunch experimenting with new tools that fall into the category of Semi-Social.

Some cases that I can think of are as follows:

  • Investment portfolio: One example is Covestor. I have an account there but it is under a pseudonym. I would not be that enthusiastic about revealing my pathetic attempt to bet on the stock market by watching (mostly tech) blogs. Sigh!
  • TV watching habits: I think Television as we know it today is completely broken. There is no social aspect to it whatsoever. At ICWSM, Noor Ali-Hassan presented a paper on "Social Media Scenarios for Television". What struck me about this talk was her statement that "Despite its social nature, there is a private aspect of TV that people want to preserve".
  • Income and financial information: This is something we had least anticipated. How did we get to a point where I am actually not that scared while putting all my bank details and credit card information into a site like Mint? Mint is not a social site as such. But it reflects how we are now willing to part with some really sensitive data. In contrast, there are other examples of recruitment sites like SimplyHired where people reveal their salary information and can search for companies by salary. A more recent startup that is quite similar is Glassdoor.
  • Location: Location can be an extremely sensitive piece of information. Fortunately, Yahoo's FireEagle provides access control for various applications, and one can set the privilege that each app has to access location information (latlong, zip, state, country etc).

There will always be some who are at the extreme end of the spectrum and are quite comfortable with being completely (publicly) transparent about "sensitive data". However, most would still only dare to share some of this data with close friends and select people -- i.e. if there is enough value proposition in it for them. Some would be comfortable with aggregate analysis over the data as long as they are not personally identified or targeted in some way (advertising or otherwise).

Although it requires a great deal of courage to work with privacy-sensitive data, the opportunity to invent in the semi-social space may be quite large.

May 30, 2008

My First FireEagle App: Pizza Coupons Search

Yesterday, I discussed an idea around the FireEagle geolocation API. I was envisioning an app on your mobile phone that, as you walk down the Mall or any other location, would pre-fetch relevant coupons and offers from the local restaurants. As grad students, we always learn to find good pizza deals online, so I decided to use the FireEagle API to develop a pizza coupon finder. The way it works is that it authenticates with FireEagle to access your current location, fetches the coupons from Google Maps, and then parses the output for display on your mobile phone or in a browser. You can try it at http://wikimatix.com/coupon/pizza.php if you already have a FireEagle account. First the application authenticates with FireEagle and requests the appropriate permission to access the exact or approximate location information; it then passes this to the Google Coupon Finder.
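The flow described above (authenticate, read the location, search for coupons, format the results) can be sketched roughly as follows. This is only an illustration in Python, not the code behind the demo: `get_location` and `search_coupons` are hypothetical stand-ins passed in by the caller, and the field names (`exact`, `zip`, `merchant`, `offer`) are assumptions rather than anything FireEagle or Google actually returns.

```python
from typing import Callable, Dict, List

# Hypothetical shapes for this sketch only; neither mirrors a real API.
Location = Dict[str, str]
Coupon = Dict[str, str]


def find_pizza_coupons(
    get_location: Callable[[str], Location],
    search_coupons: Callable[[str, str], List[Coupon]],
    user_token: str,
) -> List[str]:
    """Locate the user, search for nearby pizza offers, format them as lines."""
    # Step 1: read the location with the token the user has already authorized.
    location = get_location(user_token)
    # Prefer the exact position, falling back to the coarser zip code.
    place = location.get("exact") or location.get("zip") or ""
    if not place:
        return []  # the user shared nothing usable
    # Step 2: ask the coupon source for pizza offers near that place.
    coupons = search_coupons("pizza", place)
    # Step 3: reduce each result to one short line that fits a phone screen.
    return [f"{c['merchant']}: {c['offer']}" for c in coupons]
```

Passing the two services in as functions keeps the sketch testable with fakes, since the real calls need network access and user credentials.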

Finally, you have all the coupons you need to order your fresh pizza. The documentation and example walkthrough code in FireEagle's developer area are excellent. It took hardly any time to put together this demo!

I think that the possibilities that this opens up for mobile advertising are exciting. We should also keep an eye on Android -- this space is gonna be fun to watch. [Update: Fixed the broken link. Sorry]

May 29, 2008

Yahoo FireEagle: Geolocation made simple

This service is currently in alpha, but thanks to Pranam Kolari I was able to get an invitation to Yahoo!'s FireEagle platform. FireEagle is an easy way to manage and share location information across many applications. Currently, I publish my location information across many different sites and applications, and it is rare that I put in the actual effort to update it everywhere. For example, I use Dopplr to publish my travel plans, Twitter and Brightkite to update my current location, and Facebook to indicate my home address and other details. I was impressed with how easy it was (using OAuth) to allow Dopplr and others to share and access information with FireEagle. If you have a GPS-enabled phone you can even update the geolocation on the go! Damn! That is neat!

One really compelling application is Wikinear.com -- it shows you the nearest places of interest by matching the location information obtained from FireEagle with Wikipedia entries. This is great especially if you are traveling to a new location or a tourist spot and would like to know the places of interest nearby.

Another very cool application is Metosphere. (PS: I wish I had an iPhone!) With this app, you can leave a digital message for a given location, see places and events of interest and even report Graffiti and City Repair! This gives me a reason to believe that the next big thing is going to be mobile advertising. The advantage of easy availability of geolocation information specific to a user is immense. This reminds me of a project at the eBiquity research group a few years back, called Agents2go, that explored a very similar concept. Imagine that you were walking down the street during lunch and the agent on your iPhone automatically collected coupons or found deals at the nearest restaurants as you walked by. The idea that we can have a query-free, geographically relevant search is really exciting. Yahoo! is innovating and pushing hard on the open initiative. With the availability of an API it would be fun to integrate Google Coupons! (OK, here is one more fascinating idea and little time at hand!)

Location is a very sensitive piece of information, and the best part of FireEagle is that you can manage permissions and privacy settings or even temporarily stop sharing your location. You can allow a specific application to access location information only at a certain granularity: exact, zip, neighborhood, state or even country. More at TechCrunch.
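To make the granularity idea concrete, here is a minimal sketch of how a per-application permission could filter what an app sees. The level names follow the list above; everything else (the `visible_location` function, the dictionary layout, the `sharing_enabled` switch) is an assumption made for illustration and not FireEagle's actual data model or API.

```python
from typing import Dict

# Finest to coarsest, in the order the levels are listed above.
# The dictionary-based layout is an assumption for this sketch.
LEVELS = ["exact", "zip", "neighborhood", "state", "country"]


def visible_location(
    location: Dict[str, str], granted: str, sharing_enabled: bool = True
) -> Dict[str, str]:
    """Return only the fields at, or coarser than, the granted level."""
    if not sharing_enabled:
        return {}  # the user has temporarily stopped sharing
    if granted not in LEVELS:
        raise ValueError(f"unknown granularity: {granted}")
    # Everything from the granted level onwards is coarse enough to reveal.
    allowed = set(LEVELS[LEVELS.index(granted):])
    return {level: value for level, value in location.items() if level in allowed}
```

An app granted `state` would therefore see the state and country but never the zip code or exact coordinates.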

May 16, 2008

Nonnegative Matrix Factorization

Sometimes you learn about a new mathematical technique that is so intriguing that it can only be described as "beautiful". Nonnegative matrix factorization is one such method that I did not know of until quite recently. The details of the method are available in the paper "Document Clustering Based On Non-negative Matrix Factorization" by Wei Xu, Xin Liu, Yihong Gong.

The basic idea behind this method is that you want to factorize a matrix X into two smaller matrices U and V such that both U and V are nonnegative. This is achieved by minimizing the following objective function:

\[ J = \frac{1}{2} \lVert X - U V^{T} \rVert^{2}, \qquad U \ge 0,\ V \ge 0 \]

So if we have a matrix X that represents a Term*Document matrix, it can be factorized into the two matrices U and V such that U signifies the Term*ClusterAssociation matrix and V transpose signifies the ClusterAssociation*Document matrix. Now since the two matrices U and V are nonnegative, meaning all the elements in them are >= 0, we can identify the cluster to which a document belongs by taking that document's row of V and picking the dimension with the highest value.

Singular Value Decomposition (SVD) decomposes X into dense matrices that can contain negative elements, and it is not always intuitive what the basis vectors really signify. With NMF, however, the clusters are readily and directly available from the factorization. In addition, the sparsity makes this technique quite appealing.
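As a rough illustration of the factorization and the "pick the largest dimension" step, here is a small pure-Python sketch. It uses the standard multiplicative update rules for the squared-error objective, U ← U·(XV)/(UVᵀV) and V ← V·(XᵀU)/(VUᵀU); the function names, the random initialization and the iteration count are my own assumptions, and this is not the code used for the experiment in this post. For real corpora you would want a numerical library instead of nested lists.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(x: Matrix, k: int, iters: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Approximate x (terms x docs) by U V^T with U (terms x k), V (docs x k) >= 0."""
    eps = 1e-9  # keeps denominators away from zero
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    u = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    v = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    xt = transpose(x)
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V), element-wise
        xv = matmul(x, v)
        uvtv = matmul(u, matmul(transpose(v), v))
        u = [[u[i][j] * xv[i][j] / (uvtv[i][j] + eps) for j in range(k)] for i in range(m)]
        # V <- V * (X^T U) / (V U^T U), element-wise
        xtu = matmul(xt, u)
        vutu = matmul(v, matmul(transpose(u), u))
        v = [[v[i][j] * xtu[i][j] / (vutu[i][j] + eps) for j in range(k)] for i in range(n)]
    return u, v


def cluster_of(v_row: List[float]) -> int:
    """Assign a document to the cluster with the largest weight in its row of V."""
    return max(range(len(v_row)), key=lambda j: v_row[j])
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, U and V stay nonnegative throughout, which is what makes the cluster read-off possible.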

In the following example, I have clustered the CLASSIC3 dataset, which is a standard corpus frequently used for evaluating different clustering methods. Notice how the three collections CISI, MEDLINE and CRANFIELD line up nicely along the three different axes.

I like this method for its simplicity and intuition and have been exploring its use in clustering blog/social data.

May 09, 2008

News feed vs. blog posts vs. email

What is the difference in size distribution of a news wire vs. a blog post vs. email message?

The three images below compare the size distribution of news wires (Reuters collection), blog posts (from the ICWSM dataset) and email messages (Enron Corpus). The charts show the histograms of the size of the documents in these collections:

[Figures: document size histograms for Reuters, blog posts, and Enron email]

The three distributions above (ignoring documents smaller than 2000 bytes) were fitted using the MATLAB scripts for power-law fits (thanks to Aaron Clauset).

[Figures: power-law fits for Reuters, blog posts, and email]

The linguistic properties of blogs, email and news stories are quite different, and this has already been highlighted in several research papers. While the three data sets differ in many ways, here I am analyzing just the size distributions. The important points to note are:

  • News wire stories are quite short
  • Blogs and emails are much longer and have a heavy-tailed distribution
  • Power law exponents for blog size distribution and email size distribution are quite similar (around 2.7)
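For readers who want to see what such an exponent estimate involves, here is a minimal sketch of the continuous maximum-likelihood estimator for a power law with a fixed lower cutoff, alpha = 1 + n / Σ ln(x_i / x_min). It is not a reproduction of the MATLAB scripts mentioned above: the cutoff is simply supplied by the caller (2000 here only mirrors the byte threshold used in this post), and the function names and the synthetic sampler are assumptions added for illustration.

```python
import math
import random
from typing import Iterable, List


def powerlaw_alpha(sizes: Iterable[float], xmin: float) -> float:
    """Maximum-likelihood exponent for p(x) ~ x^(-alpha), using only x >= xmin."""
    tail = [x for x in sizes if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal xmin; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha: float, xmin: float, n: int, seed: int = 0) -> List[float]:
    """Draw synthetic sizes by inverse-transform sampling (for checking the fit)."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Usage: recover a known exponent from synthetic "document sizes".
synthetic_sizes = sample_powerlaw(alpha=2.7, xmin=2000.0, n=20000)
estimated_alpha = powerlaw_alpha(synthetic_sizes, xmin=2000.0)
```

With real data one would feed in the document sizes in bytes in place of the synthetic sample; the estimate is only meaningful for the part of the distribution above the cutoff.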

So...what does this mean? It is fairly obvious that news wire stories are quite short due to the nature of reporting. Sometimes the initial news story is quickly reported by agencies like Reuters/AP. These are at times brief and to the point to allow readers to get a quick gist of its contents.

In contrast, blog posts tend to be much larger than news wire stories. Citizen journalism is full of opinions, thoughts and punditry, which bloats the post. This also goes back to my previous analysis of the blog homepage size vs. Web page size. Indeed, the contribution of blogs has been reported to be 4-5 times that of edited text (like the news wires).

What I had not expected was the similarity in the slopes for email and blogs. One thing to note, however, is that here the emails are aggregated across a number of different users. This is an important distinction. While a single user may receive a few hundred emails, they potentially have access to millions of blogs. Recently, the industry's top usability expert Jakob Nielsen concluded that readers skim through and read at most 20% of the words on a Web page. While there are millions of blog posts every day... there is very little time to read them all in detail. The volume of email is limited by a person's social network, but for blogs the act of prioritizing what to read is left entirely to the user. This essentially necessitates the use of Memetrackers and explains the popularity of filtering tools like digg, techmeme etc. By summarizing popular blog posts and providing blurbs for these, such tools essentially act as a "social news wire service for the blogosphere".

May 08, 2008

"Personal Brand" Monitoring Tools

Dr. Finin pointed to this interesting post on "branding yourself with a blog":

“… Certainly personal branding isn’t a new concept, but the future of personal branding could be in at your fingertips—with a blog. One of the first steps in creating a brand for yourself is to make your blog visible. Post meaningful entries, comment on your industry’s top blogs, or simply gain a regular readership. “Visibility creates opportunities,” says Schawbel, a social media specialist at EMC Corporation. He believes that when you brand yourself, the competition becomes irrelevant. “The goal of personal branding is to be recruited based on your brand, not applying for jobs,” Schawbel says. …”

Many brand monitoring startups are helping big companies keep track of what their (potential) customers have to say about them or their products. While the space of corporate brand monitoring is fiercely contested, one area that is overlooked is that of "personal branding" tools. Most of us are highly interested in knowing what is said about us online. As the TechCareers blog points out:

“You are the chief marketing officer for the brand called you, but what others say about your brand is more impactful than what you say about yourself,” says Schawbel.

Keeping an eye on what others have to say about you is not always easy. I started thinking about these issues and outlined how I try to keep up with this information. Here is my "Personal Brand Monitoring Toolbox":

  1. Search Engines: The typical way for me to keep tabs on this is by setting up Google alerts for my name, projects, organization (University/workplace) etc. In addition, I frequently perform "ego searches" to forage for mentions of my name.
  2. Statistics and Tools: One very interesting tool that I have found useful is Lijit. It provides you stats on who is searching for you, what keywords were used to reach your blog, etc. In addition I use Google Analytics to know more information about my visitors, most visited pages and time they spent on my site. If you are an academic like me, you would like to know who has cited your papers recently (Google Scholar) and the number of downloads, who has linked to your paper (Google link: search) and/or your blog posts (Technorati searches). Yessss! I admit! I have become a total statoholic! :-)
  3. Comments and Scraps: Twitter is another important tool in our arsenal for personal branding and your replies say something interesting about you. Finally, the comments on my blog, Facebook messages, scraps and photos are all part of my "brand" and I take interest in replying to them just like I would to an email.

As our information spaces diversify, monitoring "your brand" becomes a part of everyday online activity. I don't think we have exactly cracked the nut yet -- keeping track of your profile and "your brand" is a highly addictive activity, and I think that the tool(s) that make it fun and exciting will enjoy a great deal of popularity.

May 02, 2008

Leveraging Web and Social Media for Recommendations

Both Amazon and Netflix's business models rely on effective recommendation systems. The recommendations provided by such systems are based on the purchasing habits of millions of customers. As such, these systems are non-trivial and have evolved out of years of research in both academia and industry.

In addition to mining millions of customer transaction records, for many products there is a vast amount of information available online. While I do not have a lot of familiarity with the recommendation systems literature, it seems obvious that the Web and Social Media are a great source of information that could be useful when building such systems. Bloggers' profile pages, wishlists, Netflix queues, book lists and the blog posts themselves are potential clues to learn which two items may be related to each other.

As a simple example, consider the movie "Pulp Fiction". By querying Google for all the inlinks to the IMDB page of Pulp Fiction and counting which other movies are "co-cited" on those pages, I get a list of the five movies that are most likely to be related to "Pulp Fiction".

Most of these look quite relevant. Some critics have claimed similarities between Pulp Fiction and Snatch. One surprise, though, was LOTR. I wouldn't have expected it to be grouped with Pulp Fiction, but I guess I like them both very much -- so it seems reasonable in my case at least.

Just for fun, here is another example with "Sin City", another one of my favorite movies.

Unless you have a large index of the Blogosphere or the Web, it would be quite inefficient to mine for such correlations (by passing queries to search engines) on a large scale. I do not know how much of the search engine information is leveraged in recommendation systems built by Amazon or Netflix.  It might also be worth looking into differences in the recommendations produced on the basis of "how people co-cite two products" vs. "how people purchase two products".
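The co-citation counting used for the movie examples above can be sketched in a few lines, assuming you already have, for each page, the set of items it links to. How that link data is gathered (search engine queries, a crawl, a blog index) is left out; the `co_cited` function and its input format are assumptions for illustration, not the script I actually ran.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def co_cited(
    pages: Dict[str, Set[str]], target: str, top_n: int = 5
) -> List[Tuple[str, int]]:
    """Rank items by the number of pages that link to them together with target."""
    counts: Counter = Counter()
    for links in pages.values():
        if target in links:
            # Every other item on a page that cites the target is co-cited once.
            counts.update(links - {target})
    # Highest count first; break ties by name so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Swapping the page-to-links table for a customer-to-purchases table would give the "how people purchase two products" variant with the same function, which makes the two kinds of recommendation easy to compare.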
