I love del.icio.us! It's a superb resource for sharing bookmarks easily. One neat thing about del.icio.us is that the quality of its data is just fantastic. For most of the popular tags we can find a bunch of very high quality sites that are relevant to the tag/topic. No wonder Yahoo! has been so keen on integrating del.icio.us data into its search. But I just wish they did more with del.icio.us.
For quite some time now I have wanted to build a mashup that combines the social tagging information from del.icio.us with Google's Custom Search engine. This is a neat way to build high quality vertical search engines. (For example, we could use all the URLs under the tag "astronomy" to build an astronomy vertical.)
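Here is a rough sketch of the idea in Python. It does not use the del.icio.us API or the real Google Custom Search configuration format; it just assumes we already have a list of (URL, tags) pairs from somewhere and restricts an ordinary query to the hosts bookmarked under a tag with the standard "site:" operator. The function names and the sample bookmarks are made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical sample data: (url, tags) pairs, as if exported from a bookmarking site.
BOOKMARKS = [
    ("http://apod.nasa.gov/apod/", {"astronomy", "photos"}),
    ("http://www.skyandtelescope.com/", {"astronomy"}),
    ("http://apod.nasa.gov/archive", {"astronomy"}),
    ("http://example.com/recipes", {"cooking"}),
]

def hosts_for_tag(bookmarks, tag):
    """Return distinct hosts bookmarked under `tag`, most-bookmarked first."""
    counts = {}
    for url, tags in bookmarks:
        if tag in tags:
            host = urlparse(url).netloc.lower()
            if host:
                counts[host] = counts.get(host, 0) + 1
    # Sort by descending bookmark count, then alphabetically for stable output.
    return sorted(counts, key=lambda h: (-counts[h], h))

def vertical_query(query, hosts, limit=5):
    """Restrict `query` to the top `limit` hosts using site: operators."""
    sites = " OR ".join(f"site:{h}" for h in hosts[:limit])
    return f"{query} ({sites})" if sites else query

astronomy_hosts = hosts_for_tag(BOOKMARKS, "astronomy")
print(vertical_query("black holes", astronomy_hosts))
```

The bookmark count here is only a crude stand-in for "highly ranked URLs for a tag"; a real vertical would want a better ranking signal and some spam filtering.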
The closest tool I have found to what I describe here is Swicki. It is a really neat widget to build customized search engines. An awesome feature in Swicki is that I just need to list a few URLs and it learns the related URLs automatically.
Once a vertical search engine is built, Swicki lets you search for an appropriate vertical. One thing I think might be interesting is to recommend and combine related verticals. For example, for the vertical search engine "Astronomy" the related verticals could be "Physics", "Cosmology", "Thermodynamics", etc. We could go even further by combining verticals. For example, I could run my query on a vertical that combines "Astronomy" and "Biology" to search for topics related to astrobiology.
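One simple way to sketch this, assuming each vertical is just a set of hosts: treat two verticals as related when their host sets overlap (Jaccard similarity), and combine verticals by taking the union of their hosts. This is my own illustration, not how Swicki works, and the verticals and hosts below are made-up sample data.

```python
# Hypothetical verticals, each represented as a set of hosts.
VERTICALS = {
    "Astronomy": {"nasa.gov", "space.com", "arxiv.org"},
    "Physics": {"arxiv.org", "aps.org"},
    "Cosmology": {"arxiv.org", "nasa.gov"},
    "Biology": {"nih.gov", "nature.com"},
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_verticals(name, verticals, top_n=3):
    """Rank the other verticals by host overlap with `name`, dropping zero overlap."""
    target = verticals[name]
    scored = [
        (jaccard(target, hosts), other)
        for other, hosts in verticals.items()
        if other != name
    ]
    scored = [(score, other) for score, other in scored if score > 0]
    # Highest similarity first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]

def combine_verticals(verticals, names):
    """Build a combined vertical as the union of the named verticals' hosts."""
    combined = set()
    for name in names:
        combined |= verticals[name]
    return combined

print(related_verticals("Astronomy", VERTICALS))
print(sorted(combine_verticals(VERTICALS, ["Astronomy", "Biology"])))
```

Note that plain overlap would never suggest "Biology" for "Astronomy" in this sample, so combining verticals for something like astrobiology would still be the user's choice rather than a recommendation.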
Another early tool was Rollyo, which allowed users to build custom search engines; I believe it was overshadowed by Google Custom Search.
I would love to build something like this for del.icio.us, perhaps as a mashup using del.icio.us and the Yahoo API or Google Custom Search. If only I did not have to cheat and sneak up on the del.icio.us servers to crawl their data to do this! Currently, their API is limited and does not offer a simple way to get the list of highly ranked URLs for a tag. Additionally, they are pretty quick to block bots. I have been gathering some data from other sites like Yahoo myWeb 2.0, Furl, etc., but these are often heavily spammed and frankly not as high quality as del.icio.us.
"For most of the popular tags we can find a bunch of very high quality sites that are relevant to the tag/topic. No wonder Yahoo! has been so keen on integrating del.icio.us data into it's search."
I think the real issue here is coverage. It's my hunch that del.icio.us data is highly skewed, and the evidence is that it's probably too small to impact web search at large. You might be interested in this paper from WSDM: http://heymann.stanford.edu/improvewebsearch.html
Posted by: Jon Elsas | April 09, 2008 at 05:00 PM
Thanks for the reference, Jon! This looks like a really interesting paper. I agree that coverage might well be a big problem. Reading the article you posted, I also found Heymann's conclusion quite interesting: "...a substantial proportion of tags are obvious in context, and many tagged pages would be discovered by a search engine."
Posted by: Akshay Java | April 09, 2008 at 05:32 PM
We're glad you like the swicki tool, Akshay. If you want to do more with swickis or have other questions, you are welcome to join our community at this link: http://shoutouts.swicki.com/. Take care.
Posted by: Jim | April 09, 2008 at 06:39 PM