In the State of the Blogosphere report posted by Technorati recently, they describe how the top bloggers update their blogs more frequently than the rest of the population. Given the data I have been gathering on Bloglines subscriptions, I was curious to see how the most subscribed blogs behave. For the top 100K feeds ranked by number of subscribers, I fetched their RSS/Atom files (some fetches failed, and some feeds were filtered out because they had too few posts, etc.). Using this handy Perl module, I parsed all the pubDates (sorry, RSS only for now) and converted them to their corresponding Unix timestamps. For each of these feeds, the Mean Time to Update (i.e. the average number of days between consecutive posts) can then be obtained. Interestingly, some fraction of the posts were updated after they had been published (and hence were out of sequence in the RSS feed). For this analysis I do not consider these updates, though it would be interesting to dig further into them.
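To make the Mean Time to Update computation concrete, here is a minimal sketch in Python (the original analysis used a Perl module, so this is only an illustration, not the script I ran). The function name `mean_time_to_update` is hypothetical, and the way out-of-sequence posts are dropped is an assumption: the feed is taken to be listed newest first, and any entry with a timestamp later than the entry before it is skipped.

```python
from email.utils import parsedate_to_datetime

SECONDS_PER_DAY = 86400.0


def mean_time_to_update(pub_dates):
    """Average gap, in days, between consecutive posts.

    pub_dates: RSS pubDate strings (RFC 822 style) in feed order,
    assumed here to be newest first.
    Returns None when fewer than two usable posts remain.
    """
    # Convert each pubDate to a Unix timestamp.
    timestamps = [parsedate_to_datetime(d).timestamp() for d in pub_dates]

    # Assumption: an entry is "out of sequence" if it is newer than the
    # entry listed before it; such entries are skipped.
    in_sequence = []
    for ts in timestamps:
        if not in_sequence or ts <= in_sequence[-1]:
            in_sequence.append(ts)

    if len(in_sequence) < 2:
        return None

    # Gaps between each post and the next older one.
    gaps = [newer - older for newer, older in zip(in_sequence, in_sequence[1:])]
    return sum(gaps) / len(gaps) / SECONDS_PER_DAY


if __name__ == "__main__":
    sample = [
        "Mon, 06 Aug 2007 12:00:00 GMT",
        "Sun, 05 Aug 2007 12:00:00 GMT",
        "Fri, 03 Aug 2007 12:00:00 GMT",
    ]
    print(mean_time_to_update(sample))
```

Feeds with fewer than two in-sequence posts return `None`, which roughly mirrors filtering out feeds with too few posts.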
The following graph plots the empirical CDF of the Mean Time to Update, where F(x) is defined as the proportion of X values less than or equal to x. You can observe that for the top 10K feeds, 40% have a Mean Time to Update of a day or less. For the rest of the feeds it is 30%. I believe this would keep decreasing as we go lower in the ranking. Interestingly, overall, for the highly subscribed feeds, almost 60% update at least once a week.
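The definition of F(x) above is the empirical CDF, and it is short enough to sketch directly. The following Python snippet is an illustration only (the actual plot came from MATLAB); the helper name `ecdf` and the sample values are made up for the example.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F, where F(x) is the proportion of values <= x."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


if __name__ == "__main__":
    # Hypothetical Mean Time to Update values, in days, for five feeds.
    mttu_days = [0.5, 1.0, 3.0, 7.0, 30.0]
    F = ecdf(mttu_days)
    print(F(1.0))  # share of feeds updating in a day or less
    print(F(7.0))  # share of feeds updating at least once a week
```

Reading the graph at x = 1 and x = 7 corresponds to calling `F(1.0)` and `F(7.0)` on the per-feed Mean Time to Update values.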
For a feed reader, keeping an index fresh and up to date is then a function of how many users subscribe to a feed, how often the feed is updated, when the feed was last requested (by any user in the system), and perhaps even when the last ping was registered for that feed in the ping stream. I believe in Google Reader's case, clicking "show details" displays the last time the feed was refreshed, which I have never found to be more than 3 hours ago for any feed that I subscribe to.
ACK: The above plot was made in MATLAB using this script for cdfplot.