April 14, 2008
Google looks for 550B more web pages

A search engine's work is never done. While engineers at Google spend much of their time refining search, a handful of employees have been working on a way to search the invisible web, a vast swath of sites and forums that search engines have not indexed and that therefore never show up in search results.

"In the past few months, we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google," Jayant Madhavan and Alon Halevy wrote on Google's webmaster blog. "By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience."

Google said it would not index pages or sites webmasters intend to keep hidden from public view.
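
To make the idea of form-based crawling concrete, here is a minimal sketch in Python. It is not Google's implementation, which the blog post does not describe in detail. It assumes a form that submits via GET, and the helper names (`candidate_urls`, `allowed_urls`), the `example.com` URLs, the field values, and the `ExampleBot` user agent are all hypothetical. The sketch builds one URL per combination of form values, then keeps only those that a site's robots.txt permits.

```python
from itertools import product
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser


def candidate_urls(action_url, field_values):
    """Yield one GET URL per combination of form field values (hypothetical helper)."""
    names = sorted(field_values)
    for combo in product(*(field_values[name] for name in names)):
        yield action_url + "?" + urlencode(list(zip(names, combo)))


def allowed_urls(urls, robots_txt, user_agent):
    """Keep only the URLs that the given robots.txt text lets user_agent fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in urls if rules.can_fetch(user_agent, url)]


# Illustrative robots.txt: everything under /private/ is off limits.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

public = list(candidate_urls(
    "https://example.com/search",
    {"state": ["CA", "NY"], "type": ["car"]},
))
private = list(candidate_urls(
    "https://example.com/private/search",
    {"q": ["a"]},
))

# Only the public form URLs survive the robots.txt check.
crawlable = allowed_urls(public + private, ROBOTS, "ExampleBot")
for url in crawlable:
    print(url)
```

A real crawler would also have to decide which field values are worth trying and would fetch robots.txt from the site itself; both steps are left out here to keep the sketch self-contained.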

According to Google Operating System, a blog not affiliated with the search giant, the so-called invisible web contained an estimated 550 billion documents in 2001. But given the nature of that portion of the web, the true number remains something of a mystery.