Yesterday, I discussed an idea around the FireEagle geolocation API. I was envisioning an app on your mobile phone that, as you walk down the Mall or any other location, pre-fetches relevant coupons and offers from the local restaurants. As a grad student, I have learned to always look for good pizza deals online, so I decided to use the FireEagle API to develop a pizza coupon finder. It authenticates with FireEagle to access your current location, fetches the coupons from Google Maps, and then parses the output for display on your mobile phone or in a browser. You can try it at http://wikimatix.com/coupon/pizza.php if you already have a FireEagle account. The application first authenticates with FireEagle and requests permission to access your exact or approximate location, and then passes that location to the Google Coupon Finder.
Finally, you have all the coupons you need to order your fresh pizza. The documentation and example walkthrough code in FireEagle's developer area are excellent. It took hardly any time to put together this demo!
I think that the possibilities that this opens up for mobile advertising are exciting. We should also keep an eye on Android -- this space is gonna be fun to watch. [Update: Fixed the broken link. Sorry]
This service is currently in alpha, but thanks to Pranam Kolari I was able to get an invitation to Yahoo!'s FireEagle platform. FireEagle is an easy way to manage and share location information across many applications. Currently, I publish my location information across many different sites and applications, and it is rare that I put in the actual effort to update it everywhere. For example, I use Dopplr to publish my travel plans, Twitter and Brightkite to update my current location, and Facebook to indicate my home address and other details. I was impressed with how easy it was (using OAuth) to allow Dopplr and others to share and access information with FireEagle. If you have a GPS-enabled phone you can even update the geolocation on the go! Damn! That is neat!
One really compelling application is Wikinear.com -- it shows you the nearest places of interest by matching the location information obtained from FireEagle with Wikipedia entries. This is great especially if you are traveling to a new location or a tourist spot and would like to know the places of interest nearby.
Another very cool application is Metosphere. (PS: I wish I had an iPhone!) With this app, you can leave a digital message for a given location, see places and events of interest and even report Graffiti and City Repair! This gives me a reason to believe that the next big thing is going to be mobile advertising. The advantage of having easily available geolocation information specific to a user is immense. This reminds me of a project at the eBiquity research group a few years back, called Agents2go, that explored a very similar concept. Imagine that you were walking around during lunch and the agent on your iPhone automatically collected coupons or found deals at the nearest restaurants as you walked by. The idea that we can have a query-free, geographically relevant search is really exciting. Yahoo! is innovating and pushing hard on the open initiative. With the availability of an API it would be fun to integrate Google Coupons! (OK, here is one more fascinating idea and little time at hand!)
Location is a very sensitive piece of information, and the best part of FireEagle is that you can manage permissions and privacy settings or even temporarily stop sharing your location. You can allow a specific application to access location information only at a certain granularity: exact, zip, neighborhood, state or even country. More at TechCrunch.
Following are some of the books that I highly recommend for anyone interested in the science behind social networks research. I like to call this set of books "A Trilogy of Social Network Research in Four Parts".
Linked: Dr. Albert-László Barabási is a pioneer in social networks research. The concepts of preferential attachment and scale-free networks were first proposed by Dr. Barabási. This has led to our understanding of how human communication works, of fault tolerance in real-world networks, and to the discovery of several algorithms that describe the growth of networks and community formation. Linked is the story of a researcher's quest for answers to complex phenomena, from the spread of viruses to the behavior of hubs. Both Linked and Sync are books that teach us how the simplest explanation is usually the best.
Six Degrees: Dr. Duncan Watts (Ph.D. student of Dr. Strogatz) presents an excellent look into the recent discoveries in network theory. The book is a tribute to all the academic work that went into the discovery of small world phenomena, scale-free networks and the theory behind search in such complex networks. I particularly enjoyed the book because, being in school and working towards a Ph.D., I can really relate to the author's narration of the trials, tribulations and all the excitement (yes!) of grad school.
Sync: Written by Dr. Steven Strogatz, this book was rated as the best of 2003 by Discover magazine. It talks about how synchrony emerges from the seemingly random and chaotic nature of the universe. It's a true science thriller that touches upon complex topics with ease and finesse. It is an inspiring book that truly reflects the passion of someone who is excited about his work and research. Dr. Strogatz has the ability to engage even someone who has very little understanding of the subject and to describe complex theories in really simple terms.
Nexus: This was an interesting read that complemented Six Degrees and Sync quite nicely. Dr. Mark Buchanan talks about how networks that seem random are actually quite closely linked. The book is a journey from the early days of social networks research and Milgram's experiment on "six degrees of separation" to the most recent discoveries in physics, biology and computer science that deal with network theory.
Of these, I am currently reading Sync. I read Linked and Six Degrees simultaneously and really enjoyed how the two books complemented each other and showed how two scientists approach the same problem in very different, and equally exciting, ways. [Update] This post should really have been titled "A Researcher's Guide to Social Networks: A trilogy in five parts" with the inclusion of "The Tipping Point". However, despite it being a great book, I felt that The Tipping Point was not as scientifically in-depth and hence decided to leave it at "A trilogy in four parts". But feel free to include The Tipping Point in this reading list, since it is a book that highlights some important ideas and in many ways has made the subject appealing to a vast audience.
Oh BTW, "Trilogy" here was a reference to the three main underlying themes in these books: scale-free networks, small world phenomena and emergence/synchronization in such systems.
ICWSM was a great hit! And now there are a growing number of conferences (WWW 08 social networks and Web 2.0) and workshops (DEBSM) for social media research. This is fantastic for the community as a whole: as more people get excited about working in this area, we can bet there will be significant advances in research that improve our understanding of online communities and social media content. What I particularly love is that social media research is an exciting interplay of computer science, social science, psychology and other related fields.
I decided to maintain a list of upcoming conference deadlines and venues for social media research. With a little help from the community, I will try and keep this list up-to-date and accurate. Here is an initial list of upcoming venues for the next few months, while I gather and organize all the deadlines in Google Calendar or something.
Please comment below or email me if you know of any other venues and I shall make sure to add them to this list. Hope to see you at the next conference/workshop!
Check this out! It is a really cool video -- a two minute clip of 10 really cool optical illusions.
It is apparently sponsored by Samsung to promote their new Soul phone model. Perhaps they are trying to suggest "look at the kind of cool things you could do if you could take videos with our phone".
In any case it is a really neat idea and is definitely getting a lot of attention in the Blogosphere. If this is the new approach to advertising then I think it is something we will see more of. The user generated advertising trend is on the rise. Check out this other crazy/funky video that could just as well be a Levi's ad.
One thing is for sure - the Samsung video did grab my eyeballs, and it was entertaining enough that I watched it a couple of times! Whoever said ads had to be boring 15 second "hooks" (avg length/increments in which TV advertising is sold)?
Rediscovering the Passion, Beauty, Joy, and Awe: Making Computing Fun Again
The following post is a summary of the talk by Dr. Eric Roberts given at UMBC on April 24, 2008.
Unfortunately, I did not have time to edit the notes and post them sooner. I strongly feel that this is an important subject and that academia, industry and schools need to work together on this issue. I think many readers of this blog are equally passionate about this topic and would be interested in Dr. Roberts's talk. The PPT of the talk is available here.
Here is my attempt to highlight some key points:
Dr. Roberts addresses how we can make undergraduate education fun ... Again. ("In case it is not already" -- Marie DesJardins). During his role as the chair of the ACM education board, Dr. Roberts has been greatly concerned about this topic.
In his talk he mentions that there is a paradox in this field: the computing industry offers the best employment opportunities and salaries, yet enrollments have gone down by almost 50%, and in some schools the decline is probably as high as 80%. The statistics are even more alarming for women and minorities. If you compare computer science with fields like biology, where there is an overproduction of degrees, you find that on average one must do two postdocs before applying for a tenure-track position! In computer science the demand is increasing but the supply is really low, resulting in two out of three people being hired from outside (not outsourced, but hired from non-CS backgrounds).
Dr. Roberts addresses some reasons for this paradox:
1) Fears about long term economic stability of employment. This is primarily due to the perceived lack of jobs and the fear of outsourcing. However, the data on offshoring shows that there are more jobs despite outsourcing. The number of computer programmers being hired is going down, but the number of computer scientists and software engineers is up.
Dr. Roberts suggests an interesting thought experiment: say you are a company that has the option of hiring a $200k/year engineer in the USA or a $75k/year engineer in Bangalore. Both are very talented and have the potential to generate $1M a year -- what do you do?
Most people would think that the reasoning leads to hiring just the $75k/year engineer in Bangalore. However, if you wear the HR hat you would try to hire them both!!! Each has the potential to generate $1M/year, so both hires are profitable.
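The arithmetic behind the thought experiment can be sketched in a few lines. This is only an illustration under a strong simplifying assumption that is not in the talk: salary is the only cost, so overhead, taxes and risk are ignored. The function name `profitable_hires` and the dictionary keys are made up for this example.

```python
def profitable_hires(salaries, revenue_per_engineer):
    """Return {candidate: yearly profit} for every hire that makes money.

    Assumption (for illustration only): salary is the only cost.
    """
    profits = {}
    for candidate, salary in salaries.items():
        profit = revenue_per_engineer - salary
        if profit > 0:
            profits[candidate] = profit
    return profits


if __name__ == "__main__":
    # Figures from the thought experiment: $200k vs. $75k salary, $1M revenue each.
    salaries = {"USA": 200_000, "Bangalore": 75_000}
    for candidate, profit in profitable_hires(salaries, 1_000_000).items():
        print(f"{candidate}: ${profit:,} profit per year")
```

Under that assumption both candidates clear a large profit, which is why the HR answer is to hire both instead of picking the cheaper one.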
2) Kind of exposure to computing at elementary and secondary school levels. The AP test in computer science used to be based on Pascal in '94, C++ in '95 and Java in 2003. The problem is that it is difficult to find good teachers for advanced programming languages. Additionally, computing skills are becoming harder to teach and teachers don't have the resources to keep up. Moreover, the education board views computing courses as "vocational", almost on par with shop courses. Schools are also evaluated in terms of math and science performance alone, so good computer science teachers are being moved to teach math/science classes. Finally, the schools only care about teaching kids tools!! Like... PowerPoint!! WTH?
3) image of work in the field -- no longer seems fun?
"Has anyone considered the possibility that it is not fun anymore?" -- Donald Knuth Oct 11, 2006
What are the reasons? Is it because there is no chance to leave a legacy? Is programming becoming harder? Or is it because startups are too fragile? Dr. Roberts points out that people would rather be Dilbert's boss than Dilbert... perhaps this explains why economics and management branches are most popular.
4) University curriculum is broken somehow. However, enrollment varies over time, and in some cases it seems independent of the curriculum. Students decide not to take CS before they even look at the curriculum! And those who do end up taking the classes love the courses... but decide not to major in CS anyway. This is quite alarming.
Here are a few of Dr. Roberts's suggestions on what we can do:
Realize that the problem is beyond just the university level
Press govt and industry to improve computing education
Emphasize the fact that programming is a key to the field
Restore passion...and make computing Fun!
From my own perspective, this problem is not just here in the US educational system but seems to be a greater worldwide issue. For example, in India, a small Montessori school that my mom helps to run is trying innovative methods to involve children in computer education from early on. However, the lack of resources and of good computer programming teachers has been a great challenge for them. Additionally, when it comes to primary level education, there aren't many educational tools that introduce children to computing. Kids are extremely fast at learning PowerPoint, getting online and playing games, and it would be great to build software that could teach children the basics of programming in a fun and exciting way. These tools need to give children something "tangible" to take back home to show to their parents, and give them a sense of actually building something cool!
Sometimes you learn about a new mathematical technique that is so intriguing that it can only be described as "beautiful". Non-negative matrix factorization (NMF) is one such method that I did not know of until quite recently. The details of the method are available in the paper "Document Clustering Based On Non-negative Matrix Factorization" by Wei Xu, Xin Liu, Yihong Gong.
The basic idea behind this method is that you want to factorize a matrix X into two smaller matrices U and V, whose product approximates X, such that both U and V are non-negative. This is achieved by minimizing the following objective function:
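For reference, the objective in the cited paper is, as far as I can tell, the squared reconstruction error between X and the product of the two factors, minimized subject to non-negativity. The notation below is a reconstruction based on the surrounding description (X approximated by U times V transpose), not a copy of the original figure:

```latex
\min_{U \ge 0,\; V \ge 0} \; J = \frac{1}{2}\,\bigl\lVert X - U V^{T} \bigr\rVert_F^{2}
```

Here the norm is the Frobenius norm, i.e. the sum of squared entries of the residual.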
So if we have a matrix X that represents a Term*Document matrix, it can be factorized into two matrices U and V such that U signifies the Term*ClusterAssociation matrix and V transpose signifies the ClusterAssociation*Document matrix. Now since the two matrices U and V are non-negative, meaning all the elements in them are >= 0, we can identify the cluster to which a document belongs by taking that document's vector in V and picking the dimension with the highest value.
Singular Value Decomposition (SVD) decomposes X into dense matrices that can contain negative elements, and it is not always intuitive what the basis vectors really signify. With NMF, however, the clusters are readily and directly available from the factorization. In addition, the sparsity of the factors makes this technique quite appealing.
In the following example, I have clustered the CLASSIC3 dataset, which is a standard corpus frequently used for evaluating different clustering methods. Notice how the three collections CISI, MEDLINE and CRANFIELD line up nicely along the three different axes.
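To make the factorization and the cluster assignment concrete, here is a minimal NumPy sketch. It uses the classic Lee-Seung multiplicative updates for the squared-error objective, which is an assumption on my part about the solver; the cited paper adds details such as normalizing U that are left out here. The function names `nmf` and `assign_clusters`, the iteration count and the toy matrix are all made up for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative (terms x documents) matrix X as U @ V.T.

    Returns U (terms x k), V (documents x k) and the list of objective
    values 0.5 * ||X - U V^T||_F^2, starting with the random initialization.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    U = rng.random((n_terms, k)) + eps
    V = rng.random((n_docs, k)) + eps
    errors = [0.5 * np.linalg.norm(X - U @ V.T) ** 2]
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        errors.append(0.5 * np.linalg.norm(X - U @ V.T) ** 2)
    return U, V, errors


def assign_clusters(V):
    """Each document goes to the cluster with the largest weight in its row of V."""
    return np.argmax(V, axis=1)


if __name__ == "__main__":
    # Toy term-document counts: 6 terms x 6 documents, three rough topics.
    X = np.array([
        [3, 2, 0, 0, 0, 0],
        [2, 3, 0, 0, 0, 0],
        [0, 0, 4, 3, 0, 0],
        [0, 0, 3, 4, 0, 0],
        [0, 0, 0, 0, 2, 5],
        [0, 0, 0, 0, 5, 2],
    ], dtype=float)
    U, V, errors = nmf(X, k=3)
    print("cluster per document:", assign_clusters(V))
    print("objective: start %.3f, end %.3f" % (errors[0], errors[-1]))
```

Because the result depends on the random initialization, the cluster labels are only meaningful up to a permutation, and different seeds can land in different local minima.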
I like this method for its simplicity and intuition and have been exploring its use in clustering blog/social data.
Just wanted to share a quick pointer to the KQED/NPR radio talk on "The Psychology of Social Networks" (via Meghavini Shah, thanks for the pointer!)
Radio host par excellence Michael Krasny talks to:
B.J. Fogg, director of the Persuasive Technology Laboratory at Stanford University and the author of an upcoming book on the psychology of Facebook
Sam Gosling, assistant professor of psychology at the University of Texas at Austin
They cover a wide range of topics and discuss how social networks are changing the way we interact with each other. It is a really good show and I would highly recommend listening to it if you can.
Over the past few weeks, I have come across many interesting anecdotes about our online and offline behaviors and how social networks have become such an important part of the equation. I thought I would share them in the context of this talk. Here are a few noteworthy examples:
Teenagers in India are socially quite comfortable expressing their relationship status on Facebook/MySpace/Orkut -- but would not reveal this information to their parents or family.
Social networks have provided a socially acceptable setting for "checking out" profiles. Arranged marriages in India are still fairly common and it is not unusual for people to check out the profile, scraps and testimony pages of prospective partners before actually meeting them in person. I guess the same is true for dating in general: people judge you not just by who you are and how you look but also by who your friends are (and I guess even how they look) and what they have to say about you.
Coaches usually "friend" athletes on Facebook so that they can keep tabs on any parties that students have been going to and to check if they have a "red cup" in their hand (indicating that they have been consuming alcohol). Cell phone cameras are the easiest way for such information to leak onto Facebook. So parties these days have a "NO CELL PHONE" policy.
Don't assume that your school teachers or professors don't know what Facebook is! Students found cheating on exams have been completely baffled to see that their profs actually checked their FB profiles to find out whether the students are friends -- despite their claims of innocence and that they don't know each other.
Finally, at SocialDevCamp one really cool trend was that people were exchanging their Twitter ids more frequently than business cards. I am still enjoying the conversations that this community of users is having on Twitter. What would have been a one-off meeting is now a sustained community thanks to the power of social networks.
Footnote: Please consider supporting KQED or your local public broadcasting station, who bring to you such excellent programming.
SocialDevCamp totally rocked! The event was best described as:
SocialDevCamp East is the Unconference for Thought Leaders of the Future Social Web
Where is the social web going? It's going mobile, to geocentric services, and to open platforms. Join a community of like minded developers, social media gurus and thought leaders for an unconference to discuss the future of the social web.
Here is my trip report from this event:
Innovation in Social Media is being fueled by brilliant people who are running some really successful startups.
The "Amtrak Corridor" has a ton of talent. There were some really amazing people I met at this event -- some who came down from NY, Philadelphia and even Boston. There was a strong sense of community and entrepreneurial brotherhood, if you will.
The startup scene on the East coast is quite different from that on the West. There is very little VC funding here since most VCs in this area are super conservative. In fact, of the audience members who attended the "Who Needs VCs" session, perhaps only two companies had taken funding.
Location! Location! Location! Dave Troy gave a great talk on geolocation and his vision for the openlocation initiative. I think we will see a number of startups in this space and it will be exciting to see how devices like iPhone and others change how we find location relevant information.
"Semantic Web" is no longer just a vision. The session "Social Media and Semantic Web", proposed and presented by our very own eBiquity alum Dr. Harry Chen, was one of the most attended sessions. It says something BIG is about to happen when a bunch of really smart entrepreneurs are interested in Semantic Web. Bear from Seesmic shared his thoughts and talked about how they were using Semantic Web technologies.
Amazon EC2 and S3 offer a fantastic alternative for startups. It is the best way to scale up your product and the benefits of using EC2 outweigh the costs.
Twitter is the new business card. I have said this before: business cards have a very short lifespan. SocialDevCampers instead preferred to exchange their Twitter handles and add folks by checking the #socialdevcamp interest on Twitter. And I think we may have finally convinced Harry about the utility of Twitter!
Twitter was the best backchannel at the conference. In all the sessions, we talked about some great sites and shared resources we all knew collectively. The Twitter streams recorded all the highlights of each session for posterity. Even if you did not attend a session now you know a few things that were discussed there.
Lots of iPhones. Photos, Videos, Twitter messages were flying everywhere. There was even a session on the iPhone development (unfortunately I missed that one; it was tough deciding which session to attend when all seemed so good).
Techies know how to party hard! Thanks to the After Party sponsors, geektalk ruled at the local bar Brewers Art. This is a fantastic local microbrewery that serves 7% to 10% alcohol beers. :-) Beer + Free Food + Techies = one hell of a party!
Those who did not make it really missed out on a great event. But fret no more. Soon there will be an announcement for the SocialDevCamp being held in the Fall -- no excuses that time!
A special thanks to the organizers and sponsors for supporting this event and bringing us all together. (Blogs covering this event)
What is the difference in the size distribution of a news wire vs. a blog post vs. an email message?
The three images below compare the size distribution of news wires (Reuters collection), blog posts (from the ICWSM dataset) and email messages (Enron Corpus). The charts show the histograms of the size of the documents in these collections:
The three distributions above (ignoring documents smaller than 2000 bytes) were fitted using the MATLAB scripts for power-law fits (thanks to Aaron Clauset).
The linguistic properties of blogs, email and news stories are quite different, and this has already been highlighted in several research papers. While the three data sets are quite different in many ways, here I am analyzing just the size distributions. The important points to note are:
News wire stories are quite short
Blogs and emails are much longer and have a heavy tail distribution
Power law exponents for blog size distribution and email size distribution are quite similar (around 2.7)
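For readers who want to try a similar fit without MATLAB, here is a minimal Python sketch of the standard maximum-likelihood estimate of a continuous power-law exponent above a fixed cutoff. It is not the script used for the charts above: the function names are hypothetical, the cutoff of 2000 bytes is simply reused from the text, and the data is synthetic because the original size lists are not included in this post.

```python
import math
import random


def fit_power_law_exponent(sizes, xmin):
    """MLE of alpha for p(x) ~ x^(-alpha), using only values with x >= xmin.

    Returns (alpha, standard_error). Assumes a continuous power law.
    """
    tail = [x for x in sizes if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need values strictly above xmin to fit an exponent")
    alpha = 1.0 + len(tail) / log_sum
    stderr = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, stderr


def sample_power_law(n, alpha, xmin, rng):
    """Draw n synthetic sizes from a continuous power law (inverse transform)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic "document sizes" in bytes with a known exponent of 2.7.
    sizes = sample_power_law(20_000, alpha=2.7, xmin=2000.0, rng=rng)
    alpha, stderr = fit_power_law_exponent(sizes, xmin=2000.0)
    print(f"estimated exponent: {alpha:.3f} +/- {stderr:.3f}")
```

With real data you would replace `sizes` with the list of document sizes in bytes. Choosing the cutoff automatically, and testing whether a power law is a good fit at all, takes more machinery than this sketch provides.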
So...what does this mean? It is fairly obvious that news wire stories are quite short due to the nature of reporting. Sometimes the initial news story is quickly reported by agencies like Reuters/AP. These are at times brief and to the point to allow readers to get a quick gist of its contents.
In contrast, blog posts tend to be much longer than news wires. Citizen journalism is full of opinions, thoughts and punditry, thus bloating the post. This also goes back to my previous analysis of the blog homepage size vs. Web page size. Indeed the contribution of blogs has been reported to be 4-5 times that of edited text (like the news wires).
What I had not expected was the similarity in the slopes for email and blogs. One thing to note, however, is that here the emails are aggregated across a number of different users. This is an important distinction. While a single user may receive a few hundred emails, they potentially have access to millions of blogs. Recently, the industry's top usability expert Jakob Nielsen concluded that readers skim through and read at most 20% of the words on a Webpage. While there are millions of blog posts every day... there is very little time to read them all in detail. The volume of email is limited by a person's social network, but for blogs the act of prioritizing what to read is left entirely to the user. This essentially necessitates the use of Memetrackers and explains the popularity of filtering tools like Digg, Techmeme etc. By summarizing popular blog posts and providing blurbs for these, such tools essentially act as a "social news wire service for the blogosphere".