Think of all the blogs you have started to date. How many are still active? Yep! We have all been there: started a blog and never kept up with it, got busy with other interests, or simply could not find enough time to keep blogging.
I wanted to see how posts are distributed by the date they were made, so I looked at a random sample of 50,000 feeds extracted from Bloglines feed subscriptions.
Of the feeds in the random sample, about 42K downloaded successfully, with roughly 10 to 15 posts per feed. Of those posts, 66% were made in 2008; the rest are older, some dating back to 2001. This suggests that a sizable share of feeds in a random sample have been updated within the last year.
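For anyone who wants to reproduce this kind of breakdown on their own sample, here is a minimal sketch of the per-year tally. It assumes the post dates have already been parsed into `datetime` objects; the function name `year_distribution` is made up for illustration and is not the code used for the numbers above.

```python
from collections import Counter
from datetime import datetime


def year_distribution(post_dates):
    """Return {year: percentage of posts made in that year}.

    post_dates is assumed to be an iterable of datetime objects,
    one per post, pooled across all downloaded feeds.
    """
    counts = Counter(d.year for d in post_dates)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by year so the result reads chronologically.
    return {year: 100.0 * n / total for year, n in sorted(counts.items())}


if __name__ == "__main__":
    sample = [
        datetime(2008, 1, 15),
        datetime(2008, 6, 2),
        datetime(2007, 11, 30),
        datetime(2001, 3, 3),
    ]
    for year, share in year_distribution(sample).items():
        print(f"{year}: {share:.1f}%")
```

Pooling posts across feeds weights prolific feeds more heavily; counting only the newest post per feed would answer the slightly different question of how many feeds are still active.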
This also ties back to my earlier point about efficient feed crawling strategies. How do you allocate resources to keep a feed index fresh while ensuring good coverage and quality? It would be interesting to see how the following two metrics change as we move from the most subscribed to the least subscribed feeds:
- Mean Time To Update Between Posts: Do highly subscribed feeds generally update more frequently?
- lastBuildDate: When was the feed last (re)published?
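Both metrics can be read straight off a feed. The sketch below is only one way to do it and makes several assumptions: the feed is RSS 2.0, item dates live in `pubDate`, all dates are RFC 822 strings with an explicit numeric offset, and the helper name `feed_freshness` is hypothetical. Atom feeds would need different element names.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta
from email.utils import parsedate_to_datetime


def feed_freshness(rss_xml):
    """Return (mean gap between posts, lastBuildDate) for an RSS 2.0 string.

    Either value is None when the feed does not carry enough information
    (fewer than two dated items, or no lastBuildDate element).
    """
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        raise ValueError("no <channel> element; not an RSS 2.0 document")

    # Collect and sort the publication dates of all items that have one.
    dates = sorted(
        parsedate_to_datetime(text.strip())
        for text in (item.findtext("pubDate") for item in channel.iter("item"))
        if text
    )

    # Gaps between consecutive posts; the mean needs at least one gap.
    gaps = [later - earlier for earlier, later in zip(dates, dates[1:])]
    mean_gap = sum(gaps, timedelta()) / len(gaps) if gaps else None

    last_build_text = channel.findtext("lastBuildDate")
    last_build = (
        parsedate_to_datetime(last_build_text.strip()) if last_build_text else None
    )
    return mean_gap, last_build
```

Running this over feeds ranked by subscriber count would give the two curves described above. Note that many feeds omit `lastBuildDate`, so in practice the newest `pubDate` may have to stand in for it.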