Finding information by crawling
We use software known as “web crawlers” to discover publicly available webpages.
The most well-known crawler is called “Googlebot.” Crawlers look at webpages and
follow links on those pages, much like you would if you were browsing content on
the web. They go from link to link and bring data about those webpages back to
Google’s servers.
The crawl process begins with a list of web addresses from past crawls and sitemaps
provided by website owners. As our crawlers visit these websites, they look for
links to other pages to visit. The software pays special attention to new sites,
changes to existing sites and dead links.
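As a rough illustration of the crawl loop described above, the following Python sketch starts from a list of seed addresses, follows links breadth-first, and records dead links. It is a toy model, not how Googlebot is implemented: the `LINK_GRAPH` dictionary, the `crawl` function and the `example.com` URLs are all hypothetical, and an in-memory dictionary stands in for fetching real pages.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page URL -> links on that page.
# A URL that is linked to but has no entry here models a dead link.
LINK_GRAPH = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/gone"],
    "https://example.com/b": ["https://example.com/"],
}


def crawl(seeds, link_graph, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    Returns (visited, dead): pages fetched, in visit order, and URLs that
    were discovered but could not be fetched.
    """
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so each is visited once
    visited, dead = [], []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        links = link_graph.get(url)
        if links is None:     # nothing to fetch: treat as a dead link
            dead.append(url)
            continue
        visited.append(url)
        for link in links:    # queue newly discovered links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited, dead


visited, dead = crawl(["https://example.com/"], LINK_GRAPH)
print(visited)  # the three reachable pages, in visit order
print(dead)     # ['https://example.com/gone']
```

The `max_pages` cap is a simple stand-in for the idea that programs decide how many pages to fetch; a real crawler would also schedule revisits to pick up changes to existing pages.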
Computer programs determine which sites to crawl, how often, and how many pages to
fetch from each site. Google doesn't accept payment to crawl a site more frequently
for our web search results. We care more about having the best possible results
because in the long run that’s what’s best for users and, therefore, our business.
Choice for website owners
Most websites don’t need to set up restrictions for crawling, indexing or serving,
so their pages are eligible to appear in search results without having to do any
extra work. That said, site owners have many choices about how Google crawls and
indexes their sites through Webmaster Tools and a file called “robots.txt”.
With the robots.txt file, site owners can choose not to be crawled by Googlebot, or
they can provide more specific instructions about how to process pages on their
sites.
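To make the two robots.txt choices concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The two robots.txt bodies, the helper `can_googlebot_fetch` and the `example.com` URLs are made up for illustration. Python's parser is a general-purpose implementation of the robots.txt rules, so it shows how such a file is read rather than how Googlebot itself processes it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies for two different site-owner choices.
BLOCK_ALL = "User-agent: Googlebot\nDisallow: /\n"            # opt out entirely
BLOCK_DRAFTS = "User-agent: Googlebot\nDisallow: /drafts/\n"  # block one section


def can_googlebot_fetch(robots_txt, url):
    """Return True if the given robots.txt text lets 'Googlebot' fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network
    return parser.can_fetch("Googlebot", url)


print(can_googlebot_fetch(BLOCK_ALL, "https://example.com/"))
print(can_googlebot_fetch(BLOCK_DRAFTS, "https://example.com/drafts/post.html"))
print(can_googlebot_fetch(BLOCK_DRAFTS, "https://example.com/about.html"))
```

The first two checks are refused and the third is allowed, because only paths under `/drafts/` are disallowed in the second file.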
Site owners have granular choices and can choose how content is indexed on a
page-by-page basis. For example, they can opt to have their pages appear without a
snippet (the summary of the page shown below the title in search results) or a
cached version (an alternate version stored on Google’s servers in case the live
page is unavailable). Webmasters can also choose to integrate search into their own
pages with Custom Search.
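The text above does not name the mechanism for these page-level choices. One documented way is a robots meta tag in the page's HTML, where the `nosnippet` and `noarchive` directives correspond to the snippet and cached-version options. The sketch below, using Python's standard `html.parser`, shows how a program might read those directives from a page. The class `RobotsMetaParser`, the function `page_directives` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for item in attr_map.get("content", "").split(","):
                item = item.strip().lower()
                if item:
                    self.directives.add(item)


def page_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that opts out of snippets and cached copies.
PAGE = (
    '<html><head><meta name="robots" content="nosnippet, noarchive">'
    "</head><body>Hello</body></html>"
)
directives = page_directives(PAGE)
print("nosnippet" in directives)  # True
print("noarchive" in directives)  # True
```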
Organizing information by indexing
The web is like an ever-growing public library with billions of books and no
central filing system. Google essentially gathers the pages during the crawl
process and then creates an index, so we know exactly how to look things up. Much
like the index in the back of a book, the Google index includes information about
words and their locations. When you search, at the most basic level, our algorithms
look up your search terms in the index to find the appropriate pages.
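The book-index analogy corresponds to a data structure usually called an inverted index: a map from each word to the pages, and positions within them, where it occurs. The Python sketch below is a minimal version under simple assumptions (lowercased text, words split on non-alphanumeric characters, exact word matching). The function names and sample pages are hypothetical, and a production index is far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {page_id: [word positions on that page]}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(page_id, []).append(position)
    return index


def lookup(index, query):
    """Return the sorted ids of pages that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    page_sets = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*page_sets))


# Hypothetical pages gathered during a crawl.
PAGES = {
    "page1": "Dogs are loyal. Dogs love walks.",
    "page2": "A list of dog breeds and cat breeds.",
    "page3": "Cats and dogs can share a home.",
}
index = build_index(PAGES)
print(index["dogs"])                   # {'page1': [0, 3], 'page3': [2]}
print(lookup(index, "dogs"))           # ['page1', 'page3']
print(lookup(index, "cats and dogs"))  # ['page3']
```

Note that `page2` is not returned for “dogs” because it only contains “dog”; exact keyword lookup like this is only the most basic level of search.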
The search process gets much more complex from there. When you search
for “dogs” you don’t want a page with the word “dogs” on it hundreds of
times. You probably want pictures, videos or a list of breeds. Google’s indexing
systems note many different aspects of pages, such as when they were published,
whether they contain pictures and videos, and much more. With the Knowledge Graph,
we’re continuing to go beyond keyword matching to better understand the people,
places and things you care about.
To learn about the tools and resources available to site owners, visit Webmaster Central.