1. Hadoop: Hadoop is backed by Yahoo! and is quite actively used by many researchers and startups alike. I am so psyched about Hadoop that I have been driving my labmates (and ex-colleagues) crazy talking about it. This weekend Vlad and I managed to set up Hadoop on our servers at ebiquity. Right now we have about six servers in a "mini cluster". For many large-scale applications it is almost essential to process the data in a distributed, parallel manner.
2. HBase: HBase is the Bigtable implementation for Hadoop. I haven't read much about it, but from what I understand it is a column-oriented database.
3. Heritrix: Heritrix is the web crawler from the Internet Archive. Apparently some of the social media startups like PersAI have been using it. At one point I was excited about Nutch, mainly because it is really easy to use and run. Additionally, being a Lucene project, Nutch can also run on Hadoop. However, the sad part is that it comes with its own little Hadoop installation, and integrating it with the latest version of Hadoop seems to be more pain than I wish to get into right now.
4. Hama: I think this is a great project that is trying to port typical linear algebra operations onto Hadoop. This would essentially speed up development time for typical machine learning algorithms; it is like LAPACK for MapReduce. Alas, the project is still under development and it would be wonderful if we could all contribute to it so that we can share the power of such tools.
5. Amazon EC2: Honestly, I did waste about 10 bucks playing with EC2 and getting Nutch up and running on the compute cloud. However, after talking to a few people I realized that it is perhaps not that cost-effective to run crawlers on EC2. It would be damn expensive to have a bunch of Hadoop nodes running 24/7, fetching pages and storing them in S3. I know that Powerset and others use EC2 for some of their processing. (Interestingly, I believe Powerset has also contributed to HBase.) However, I think there is a tradeoff: for smallish jobs it makes more economic sense to run the processes on some of the racks locally, which is the model I have chosen for the time being anyway.
6. SVN: Well, for a CVS guy, it is a slight shift to move to SVN. But it looks much easier to use, and I think I will be moving all my repositories to SVN.
7. LaTeX over PPT: I am so glad that I have discovered the joy of using LaTeX over PPT. The beamer package is simple and easy to use, and the presentations come out looking slick and wonderful.
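To make the Hama idea above a bit more concrete, here is a minimal sketch of how a matrix-vector product y = Ax splits into map and reduce steps. It is plain, single-process Python that only simulates the map, shuffle, and reduce phases; it does not use the Hadoop or Hama APIs, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `mat_vec`) are hypothetical. It also assumes the vector x is small enough to hand to every mapper in full.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# A sparse matrix stored as (row, col, value) triples, the kind of record
# a MapReduce job would read line by line from a distributed file.
Entry = Tuple[int, int, float]


def map_phase(entries: Iterable[Entry], vector: List[float]) -> Iterator[Tuple[int, float]]:
    """Map: for each matrix entry A[i][j], emit the pair (i, A[i][j] * x[j])."""
    for i, j, a_ij in entries:
        yield i, a_ij * vector[j]


def shuffle(pairs: Iterable[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Shuffle: group the intermediate values by key (the row index)."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[int, List[float]]) -> Dict[int, float]:
    """Reduce: sum the partial products for each row to obtain y[i]."""
    return {row: sum(values) for row, values in groups.items()}


def mat_vec(entries: Iterable[Entry], vector: List[float]) -> Dict[int, float]:
    """Chain the three phases to compute y = A x as a {row: value} dict."""
    return reduce_phase(shuffle(map_phase(entries, vector)))


if __name__ == "__main__":
    # A = [[1, 2], [0, 3]], x = [4, 5]  ->  y = [1*4 + 2*5, 3*5] = [14, 15]
    A = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
    x = [4.0, 5.0]
    print(mat_vec(A, x))  # {0: 14.0, 1: 15.0}
```

On a real cluster each mapper would see only a slice of the matrix entries and the framework would perform the shuffle across machines. Rows with no nonzero entries simply never appear in the output.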
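As a rough illustration of the data model behind a Bigtable-style store like HBase, here is a toy in-memory sketch: a sparse map from row key to `family:qualifier` column to timestamp to value, where reads return the newest version and scans walk row keys in sorted order. It is for intuition only; `ToyColumnTable` and its methods are hypothetical and are not the HBase client API. The reversed-domain row key loosely follows the webtable example from the Bigtable paper.

```python
from typing import Dict, Iterator, Optional, Tuple

# column ("family:qualifier") -> timestamp -> value
Row = Dict[str, Dict[int, bytes]]


class ToyColumnTable:
    """Toy in-memory model of a Bigtable-style table (hypothetical, not HBase's API).

    A cell is addressed by (row key, "family:qualifier", timestamp). Rows are
    sparse: each row stores only the columns that were actually written to it.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one version of a cell; older versions are kept alongside it."""
        self._rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it was never written."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row: str, stop_row: str) -> Iterator[Tuple[str, Row]]:
        """Yield (row key, columns) for keys in [start_row, stop_row) in key order.

        A real store keeps rows sorted on disk; here we simply sort on each scan.
        """
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, self._rows[row]


if __name__ == "__main__":
    table = ToyColumnTable()
    table.put("com.example.www", "contents:html", 1, b"<html>v1</html>")
    table.put("com.example.www", "contents:html", 2, b"<html>v2</html>")
    table.put("com.example.www", "anchor:com.other.www", 2, b"Example")
    print(table.get("com.example.www", "contents:html"))  # b'<html>v2</html>'
    print(table.get("com.example.www", "anchor:missing"))  # None
    print([row for row, _ in table.scan("com.a", "com.z")])  # ['com.example.www']
```

The point of the nesting is that a row can carry any number of columns without a fixed schema, and keeping several timestamped versions per cell is part of the model rather than an add-on.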
There are a whole bunch of things I am just waiting to get my hands on, but I'm seriously crunched for time right now. :-(
Any other cool stuff I should watch out for? Which cool new software, apps, or tools are you playing with these days? :-)
Thanks for your review.
> Alas, the project is still under development and it would be wonderful if we could all contribute to it so that we can share the power of such tools.
If you are interested in this proposal, please feel free to leave feedback and join us.
http://wiki.apache.org/incubator/HamaProposal
Thanks. :)
Posted by: Edward | March 26, 2008 at 09:39 PM