ARDURA Lab

Crawl Budget

What is crawl budget?

Crawl budget is the number of pages Googlebot can and wants to crawl on your website within a given period. Google's resources are finite and it cannot crawl the entire web at once, so it allocates a specific crawl budget to each site.

Crawl budget matters most for large websites (thousands or millions of pages). For small sites (up to a few hundred pages) it typically isn't a concern — Google can crawl the whole site without difficulty.

How does crawl budget work?

Crawl budget consists of two components:

Crawl capacity limit

This is the maximum number of simultaneous connections Googlebot can establish with your server without overloading it. Google automatically reduces crawling intensity when:

  • The server responds slowly (high TTFB)
  • The server returns 5xx errors
  • The site owner has limited the crawl rate (note that Googlebot ignores the Crawl-delay directive in robots.txt; Google's crawl rate was historically adjusted through a Search Console setting)
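The back-off behavior described above can be sketched as a simple rule: slow responses or server errors shrink the connection limit, a healthy server lets it grow. This is an illustrative model only — the thresholds below are invented, not Google's actual algorithm:

```python
def adjust_crawl_capacity(current_limit: int, avg_ttfb_ms: float,
                          error_5xx_rate: float) -> int:
    """Illustrative back-off: an unhealthy server halves the number of
    parallel connections the crawler allows itself; a fast, error-free
    server lets the crawler probe upward."""
    if avg_ttfb_ms > 1000 or error_5xx_rate > 0.05:
        return max(1, current_limit // 2)   # back off on a struggling server
    if avg_ttfb_ms < 200 and error_5xx_rate < 0.01:
        return current_limit + 1            # healthy server: crawl a bit harder
    return current_limit                    # otherwise hold steady
```

For instance, with 8 parallel connections and an average TTFB of 1,500 ms, `adjust_crawl_capacity(8, 1500, 0.0)` halves the limit to 4.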

Crawl demand

This is Google's level of interest in crawling your website, which depends on:

  • URL popularity — pages with more backlinks and traffic are crawled more frequently
  • Freshness — regularly updated pages have higher priority
  • Discovery — new URLs found in the sitemap or via links are crawled with priority

Why is crawl budget important?

If Googlebot exhausts the crawl budget on unimportant pages (duplicates, URL parameters, error pages), it may never reach your most important content — articles, service pages, new products.

Consequences of crawl budget problems:

  • Delayed indexing of new content — blog articles appear in Google after days or weeks instead of hours (see how to speed up Google indexing)
  • Outdated data in the index — changes to existing pages aren't reflected in search results
  • Unindexed pages — parts of the site may never be crawled at all

When is crawl budget a problem?

  • The site has more than 10,000 pages
  • The site generates many duplicates (URL parameters, filters, sorting)
  • The server is slow or unstable
  • The site has deep architecture — pages accessible only after many clicks
  • A large portion of the site returns 404 or 5xx errors
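The "deep architecture" point can be checked directly: a breadth-first walk over the internal-link graph gives the minimum number of clicks from the homepage to each page. A minimal sketch — the link graph here is a hand-built dict with placeholder URLs, where a real audit would feed in crawler output:

```python
from collections import deque

def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """BFS over the internal-link graph: minimum clicks from the homepage."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:           # first visit = shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

site = {  # toy link graph; URLs are placeholders
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/products": ["/products/shoes"],
    "/products/shoes": ["/products/shoes/red-sneaker"],
}
print(click_depths(site, "/"))
```

Any page whose depth comes out above 3 is a candidate for better internal linking.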

How to optimize crawl budget?

Eliminate budget waste

  • Remove or block duplicates — URL parameters, sorting, and filters should have canonical or noindex (more techniques in the technical SEO checklist)
  • Fix 404 and 5xx errors — every request to an error page is a wasted crawl
  • Limit pagination — hundreds of /page/2, /page/3 pages waste budget
  • Block unimportant resources — admin pages, internal search results, shopping cart
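The blocking rules above might look like this in robots.txt — the paths (`/cart`, `/search`, `/admin`) are placeholders to adapt to your own URL structure:

```
User-agent: *
Disallow: /cart
Disallow: /search
Disallow: /admin
```

For parameterized duplicates, a canonical tag on the filtered page, such as `<link rel="canonical" href="https://example.com/category/shoes">`, points Google at the clean version instead of blocking it outright.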

Improve crawling efficiency

  • Update sitemap.xml — submit an up-to-date sitemap in Google Search Console
  • Optimize robots.txt — block crawling of sections that shouldn't be indexed
  • Use flat architecture — every important page should be accessible within max 3 clicks from the homepage
  • Link internally — new pages should be connected to existing content
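An up-to-date sitemap can be generated with a few lines of the standard library. This is a minimal sketch — a production version would add `<lastmod>` dates and split into multiple files at the protocol's 50,000-URL limit:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls: list[str]) -> str:
    """Build a minimal sitemap.xml document from a list of page URLs."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(urlset, encoding="unicode"))

# Example URLs are placeholders for your own pages.
print(build_sitemap(["https://example.com/", "https://example.com/blog/"]))
```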

Improve server speed

  • Optimize TTFB — aim for a server response time below 200 ms
  • Consider static site generation (SSG) — static files are served almost instantly
  • Use a CDN — shortens response times for crawlers
  • Monitor server logs — analyze how Googlebot crawls your site

How to monitor crawl budget?

  • Google Search Console — the "Crawl Stats" report shows the number of crawled pages, response time, and errors
  • Server logs — direct analysis of Googlebot requests (user-agent: Googlebot)
  • Screaming Frog — crawl simulation and problem identification
  • Robots.txt tester — verification that blocking rules are correct
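The server-log analysis from the list above can start as simply as filtering Googlebot requests and tallying status codes. A sketch for the common combined log format — note that verifying an IP really belongs to Google (via reverse DNS) is a separate step, since the user-agent string can be spoofed:

```python
import re
from collections import Counter

# Combined log format: ip - - [time] "METHOD path HTTP/x" status size "ref" "ua"
LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$'
)

def googlebot_status_counts(lines):
    """Count HTTP status codes for requests whose user-agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m.group("ua"):
            counts[m.group("status")] += 1
    return counts

sample = [  # invented log lines for illustration
    '66.249.66.1 - - [01/Jan/2025:10:00:00 +0000] "GET /products/shoe HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [01/Jan/2025:10:00:01 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [01/Jan/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
print(googlebot_status_counts(sample))  # only the two Googlebot lines are counted
```

A rising share of 404/5xx responses in this tally is a direct signal of wasted crawl budget.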

Example

An e-commerce store has 5,000 products but generates 50,000 URLs through filtering parameters (color, size, price, sorting). Googlebot primarily crawls filter pages, ignoring new products. After implementing canonical URLs on parameterized pages, noindex on filter pages, and updating the sitemap, Google begins indexing new products within 24 hours instead of 2 weeks.
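The URL explosion in this example is plain arithmetic — and independent filter parameters multiply, so a few innocent-looking filters escalate fast (the filter counts below are invented for illustration):

```python
# Numbers from the store example above.
products = 5_000
total_urls = 50_000

waste_share = (total_urls - products) / total_urls
print(f"{waste_share:.0%} of crawlable URLs are parameter duplicates")  # → 90%

# Independent parameters multiply: 8 colors x 5 sizes x 4 sort orders
# can mint 160 crawlable variants of a single category page.
colors, sizes, sort_orders = 8, 5, 4
print(f"{colors * sizes * sort_orders} filtered variants per category page")
```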

Related terms

  • Indexing — the process of recording a page in Google's index, dependent on crawl budget
  • Crawlability — the ability of crawlers to traverse a website
  • Robots.txt — a file controlling crawler access
  • Sitemap — a site map facilitating crawling
  • Canonical URL — a tag preventing crawl budget waste on duplicates

Need help?

Comprehensive search engine optimization — check out our offer.