An Investigation of Documents from the World Wide Web 
Paper by Woodruff, Aoki, Brewer, Gauthier, and Rowe describing their analysis of over 2.6 million HTML documents collected by the Inktomi Web crawler. The authors examined many characteristics of these documents, including their size and the number and types of tags.
http://www.paulaoki.com/papers/www5-color.pdf