Social Media has changed how we consume and search for information on the Web. All major search engines now index content from feeds and blogs. Traditionally, all a search engine had to do was find the relevant documents and rank the matches (often based on the relevance of the page with respect to the query and the authority of the site). This was fine when content on the Web was "static" for the most part. So how has the presence of Social Media content changed the notion of "relevance" in today's search engines? I think there are four things that need to be considered when ranking Social Media search results:
- Relevance: Clearly this is the most important criterion to optimize. There is a good deal of research, including many years of TREC, on understanding and improving search quality. It is a challenge to get a good tradeoff between precision and recall in any IR system. With Social Media content this tradeoff may additionally depend on when one is searching for a given query. Consider searching for "Obama election results" -- if this is searched for during a primary election season (like today), the user is probably looking for a variety of opinions and might be willing to compromise, to an extent, on precision in favor of higher recall. Whereas if the same query were performed in 2009, it is precision that would matter most.
- Authority: PageRank, HITS and many other techniques measure a site's authority in the context of the entire graph. Traditionally, authority too has been seen as a static measure of a page's importance, given all the links accumulated during the existence of the site. Is this really true in Social Media? Are all blogs equally authoritative -- always? It depends! With plenty of blogs being active, and blogs joining and leaving networks all the time, I think that the authority of a node may also change over time.
- Popularity: The "slashdot effect" (or is it the Techmeme effect now?) can make an interesting post or meme spread rapidly across the Blogosphere and cause it to suddenly rise in popularity. Popularity can be fleeting -- something interesting, funny or just bizarre can gain a lot of links. Links accumulated in such short bursts should not count as much as links (perhaps even fewer of them) that a blog receives through sustained interest from a number of users over longer periods of time. For the most part, graph analysis tools and algorithms tend to ignore this effect when ranking the importance of a node.
- Recency: Blog search engines provide ranked, relevant documents, often sorted by recency. To me the interesting question is whether traditional search will also go this route and start offering ways to find recent documents. Given the proliferation of social media content in search results, this may soon be the case.
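To make the Popularity point a bit more concrete, here is a rough Python sketch of one way a burst of links could be made to count for less than sustained linking. The function name `dampened_popularity`, the one-week bucket size and the square-root dampening are all my own assumptions for illustration; I am not claiming any search engine scores links this way.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def dampened_popularity(link_times, bucket=timedelta(days=7)):
    """Score inbound links so that bursts count less than sustained interest.

    link_times: datetimes at which inbound links were observed.
    bucket: width of each time window (one week is an arbitrary assumption).
    Each window contributes sqrt(number of links in it), so piling many
    links into a single window yields diminishing returns.
    """
    if not link_times:
        return 0.0
    start = min(link_times)
    # Integer index of the window each link falls into.
    per_bucket = Counter((t - start) // bucket for t in link_times)
    return sum(math.sqrt(n) for n in per_bucket.values())


start = datetime(2008, 1, 1)
# 100 links within a couple of hours: a "slashdot effect" style burst.
burst = [start + timedelta(minutes=i) for i in range(100)]
# 40 links, one per week: fewer links, but sustained interest.
sustained = [start + timedelta(weeks=i) for i in range(40)]

burst_score = dampened_popularity(burst)
sustained_score = dampened_popularity(sustained)
```

With these made-up inputs the 40 sustained links end up scoring higher than the 100 burst links, which is the behavior argued for above. Any concave function (a logarithm, say) would have a similar effect; the square root is just easy to reason about.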
IMHO, the search engine that seems to do a good job of giving users final control over this granularity is Ask's blog search. It has a fantastic feature that lets you search based on "Relevance, Recency and Popularity". As we see a rapid increase in the portion of relevant blog and feed data presented in regular search results, I guess Web search engines will find it compelling to rethink the traditional view of how they present search results.
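As a toy illustration of what handing that control to the user could look like, here is a small Python sketch that re-orders the same result set by relevance, recency or popularity. The `Result` fields, the `SORT_KEYS` table and the `rank` function are hypothetical names of mine, and the sample data is invented; this says nothing about how Ask actually implements the feature.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Result:
    url: str
    relevance: float     # query-document match score (assumed precomputed)
    popularity: float    # e.g. a link-based score (assumed precomputed)
    published: datetime  # when the post was published


# One sort key per user-selectable ordering.
SORT_KEYS = {
    "relevance": lambda r: r.relevance,
    "recency": lambda r: r.published,
    "popularity": lambda r: r.popularity,
}


def rank(results, order_by="relevance"):
    """Return results sorted best-first by the ordering the user picked."""
    return sorted(results, key=SORT_KEYS[order_by], reverse=True)


results = [
    Result("example.org/a", relevance=0.9, popularity=5.0,
           published=datetime(2007, 6, 1)),
    Result("example.org/b", relevance=0.6, popularity=40.0,
           published=datetime(2008, 1, 15)),
    Result("example.org/c", relevance=0.7, popularity=12.0,
           published=datetime(2008, 2, 20)),
]

by_relevance = [r.url for r in rank(results, "relevance")]
by_recency = [r.url for r in rank(results, "recency")]
by_popularity = [r.url for r in rank(results, "popularity")]
```

The same three posts come back in a different order for each choice, so which one is "best" really depends on what the user asked to optimize.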