Study: Search Engines Increase Coverage by John Stith
As we go through existence, we find there are great questions in life: What is life’s meaning? What is the airspeed velocity of an unladen swallow? And finally, just how big is the Internet? A recent study attempts to put the Internet question into perspective, and it also examines how much of the Web the big search engines cover.
First, we need to discuss just how big the Internet is. WOW! It is insanely large. Antonio Gulli of the University of Pisa (Università di Pisa) and Alessio Signorini of the University of Iowa have been working on this topic for a while and have published an abstract of their findings.
They state that previous estimates of the size of the Web are now obsolete. Earlier studies from 1997 and 1998 found a much smaller Web: around 200 million pages in 1997 and around 800 million pages in 1998. Those figures are not even close anymore. The current study estimates the indexable Web at around 11.5 billion pages. That’s a lot of pages. The authors also break that total down by search engine coverage.
Break It Down
One point they make is that the search engines underestimate their own coverage. Google, the largest engine, claims to cover 8.1 billion pages; the study puts the figure at 8.8 billion pages, a coverage rate of 76.2%. Google’s own claim was the closest to the study’s estimate.
Yahoo ranked second and severely underestimated its coverage. Yahoo claims 4.2 billion pages, but the study puts it closer to 8 billion, or 69.3% coverage.
MSN claimed it covered 5 billion pages, and the study showed 7.1 billion, or 61.9% coverage. That is with MSN’s search still in beta, so it could give Google a run in the future. Ask Jeeves/Teoma ranked fourth, claiming 2.5 billion pages, while the study estimated 6.6 billion, or 57.6%. Taken together, the indexed Web (the pages covered by the search engines combined) reached about 9.4 billion pages, or 81.4% of the total.
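The coverage percentages and page counts above are tied together by simple arithmetic: an engine’s estimated index size is its coverage rate multiplied by the 11.5-billion-page total. The short Python sketch below uses only the rounded figures reported in this article (in billions of pages) to reproduce the study’s page counts and compare them with each engine’s own claim. It illustrates that arithmetic only; it is not the authors’ estimation method, and the names `ENGINES` and `estimated_pages` are hypothetical.

```python
# Figures as reported in the article: page counts in billions,
# coverage as a fraction of the estimated indexable Web.
TOTAL_INDEXABLE_WEB = 11.5

ENGINES = {
    # name: (self-reported pages, study's coverage estimate)
    "Google": (8.1, 0.762),
    "Yahoo": (4.2, 0.693),
    "MSN": (5.0, 0.619),
    "Ask Jeeves/Teoma": (2.5, 0.576),
}

# Share of the indexable Web covered by the engines combined.
INDEXED_WEB_COVERAGE = 0.814


def estimated_pages(coverage, total=TOTAL_INDEXABLE_WEB):
    """Index size (billions of pages) implied by a coverage rate."""
    return coverage * total


for name, (claimed, coverage) in ENGINES.items():
    estimate = estimated_pages(coverage)
    # Ratio > 1 means the engine's own claim understates its coverage.
    print(f"{name:<17} claimed {claimed:4.1f}B  "
          f"estimated {estimate:4.1f}B  ({estimate / claimed:.1f}x the claim)")

print(f"Indexed Web: {estimated_pages(INDEXED_WEB_COVERAGE):.1f}B pages")
```

Running it shows Google’s estimate sitting only slightly above its own claim, while Ask Jeeves/Teoma’s estimate is more than two and a half times its claim, which is the pattern of underestimation the study describes.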
The method these gentlemen used was based on the 1997 study. The original study used 35,000 queries in English; the new study used 438,141 queries in 75 languages. This gives the study an international scope rather than just a U.S. spin, and it yields a much stronger estimate than the ones produced back in the 1990s.
The primary limitation of this study is that it covers only the accessible pages that search engines can find. There are billions of pages in various systems that search engines haven’t grabbed. According to Search Engine Watch, some estimates put the total at over 500 billion pages.
Keep in mind, too, that even though Google returns 9 million hits, fewer than 50,000 talk about her battle with Hepatitis C, and even then, how many of those pages would actually be significant? Relevancy is still the key to it all. All those sites are pointless if they don’t give good, reliable information.
The uses for this information are myriad. It gives some sense of what we’re dealing with on the Internet and how big it might be. It could also help with marketing, especially as companies prepare to move into other markets, such as China. With Google gaining access to the mainland in Shanghai and owning a stake in Baidu, and with Yahoo already in China, this study becomes even more relevant.
About the Author:
John Stith is a staff writer for WebProNews covering technology and business.