I read this fascinating article (via Techmeme) that indicates that "the average Web page size has tripled since 2003". IMHO, average is still a problematic measure when dealing with Power-law distributions. I like the example that Clay Shirky mentions in the book "Here Comes Everybody":
If Bill Gates walked into a bar ... we'd all be Millionaires ......... ON AVERAGE!
Same holds when we are talking about web pages. Nevertheless, the study is quite interesting and provides a very good analysis of how Web content is changing.
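To make the Bill Gates point concrete, here is a minimal sketch in Python. The net-worth figures are made-up illustrative values (not real data), and the names `patrons`, `newcomer` and `summarize` are just placeholders I picked for the example. It compares the mean and the median before and after one extreme value joins the group.

```python
from statistics import mean, median

# Hypothetical net worths (USD) of ten bar patrons; illustrative numbers only.
patrons = [40_000] * 10

# Hypothetical net worth of one extremely wealthy newcomer; also illustrative.
newcomer = 50_000_000_000


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


before = summarize(patrons)
after = summarize(patrons + [newcomer])

print("before: mean=%.0f median=%.0f" % before)
print("after:  mean=%.0f median=%.0f" % after)
```

The mean jumps into the billions while the median does not move at all, which is why the median (or a percentile) is usually the safer summary for heavily skewed data such as page sizes.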
This made me wonder:
"What is the contribution of Social Media content in tripling the size of an average Web page?"
While not a comprehensive study, I did a very quick back-of-the-envelope experiment to see what this would be like. I fetched the homepages of the top 400 Web sites, as ranked by Alexa. Similarly, I took a set of 400 blogs (from the Buzzmetrics dataset) and cached their homepages as well (wget -p <url>). Following is a graph that compares the sizes of the homepages from the two datasets.
Looks like the size of a blog homepage is, "on average", larger than the size of a regular Web page, suggesting that a good deal of the increase in the size of a Web page could be due to Social Media content.
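For anyone who wants to repeat the comparison, here is a rough sketch of how the sizes could be tallied. It is not the exact script I used. It assumes each homepage was saved by wget -p into its own subdirectory under a root folder; the folder names `alexa` and `blogs` and the function names are hypothetical. It reports both the mean and the median, given the caveat about averages above.

```python
import os
from statistics import mean, median


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def page_sizes(root):
    """Map each immediate subdirectory of root (one per page) to its size."""
    return {
        entry.name: dir_size(entry.path)
        for entry in os.scandir(root)
        if entry.is_dir()
    }


def summarize(sizes):
    """Count, mean and median of the page sizes (needs at least one page)."""
    values = list(sizes.values())
    return {"n": len(values), "mean": mean(values), "median": median(values)}


if __name__ == "__main__":
    # Hypothetical folder names; adjust to wherever the pages were cached.
    for label in ("alexa", "blogs"):
        if os.path.isdir(label):
            print(label, summarize(page_sizes(label)))
```

Note that this measures bytes on disk for the page plus whatever images, scripts and stylesheets wget pulled in, so it is only an approximation of what a browser would actually download.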
I think a more detailed study here would be insightful.
I wonder how much of the blogs' size is from optimization issues (images, widgets and such). Have you tried a couple of different blog sets? I would be curious to see if that changes the times.
Disclosure: I work for Nielsen Online, formerly BuzzMetrics.
Posted by: Stephen | April 29, 2008 at 04:16 PM
Hi Stephen, my guess is you are absolutely right! Most of the size for blog data might be contributed by widgets and images. Most of these are not highly optimized -- see my own blog for example!!
I do want to try out a couple of other datasets and frankly 400 blogs is hardly a dataset! But I am curious to see results in a larger setting. It would also be worthwhile to examine the avg post size, number of HTML elements and JavaScript usage etc.
It also reminds me of the study "Toward a PeopleWeb", by Raghu Ramakrishnan and Andrew Tomkins. According to their analysis, the Social Web contributes 4-5 times more content than professionally edited text.
I hope I can find time to do some more analysis.
Thanks for your comments!
Posted by: Akshay Java | April 29, 2008 at 04:40 PM
Would it be a reasonable approximation to just grab the feed from several blogs? That would avoid all the cruft outside each post, and you'd be dealing with just the text from the blogs. To be fair, if the content included an image, you could grab that image too...
Just a thought
Posted by: fitzgeraldsteele | May 01, 2008 at 06:51 PM
To an extent, I wanted to measure exactly how 'bloated' a blog homepage is. So if it has many widgets and images -- indeed it is going to be bulky. You bring up an interesting point about the size of a post vs. the size of a regular Web page. I can try to run a few scripts and see if I can pull out some numbers from the BuzzMetrics dataset. Thanks for the suggestion, I'm sure it will make another interesting blog post! Gimme a while and I shall try to put out something...
Posted by: Akshay Java | May 01, 2008 at 10:16 PM