Sitemap XML
What is an XML sitemap?
An XML sitemap is a file that lists the important pages on your website that search engines should index. It tells crawlers (e.g., Googlebot) about the site structure and last modification dates, and can carry optional priority hints, which helps them scan the site efficiently and discover new content.
A sitemap is particularly important for large websites (1,000+ pages), new sites without many backlinks, and sites with dynamically generated content.
Why does it matter?
- Faster indexing — Google discovers new and updated pages more quickly
- Efficient crawling — the crawler knows which pages are most important
- Diagnostics — Google Search Console reports indexing errors per URL from the sitemap
- Large websites — without a sitemap, Google may not discover pages deep in the site structure
How does a sitemap work?
A sitemap is an XML file typically located at /sitemap.xml. It contains:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
</urlset>
Key elements:
- loc — full page URL (required)
- lastmod — date of last modification (recommended — but must be accurate!)
- changefreq — frequency of changes (ignored by Google)
- priority — page priority (ignored by Google)
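As an illustration of the elements above, here is a minimal Python sketch that builds such a file using only the standard library. The function name build_sitemap and the (url, last_modified) input shape are assumptions made for this example, not part of any framework. It writes only loc and lastmod, since changefreq and priority are ignored by Google.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (bytes) for an iterable of (url, last_modified) pairs.

    build_sitemap is a hypothetical helper for this example.
    last_modified is a datetime.date, or None to omit <lastmod>.
    """
    # Setting xmlns as a plain attribute declares the default namespace on output.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        # loc is required; ElementTree escapes characters such as & for us.
        ET.SubElement(entry, "loc").text = url
        if last_modified is not None:
            # lastmod as a plain date in YYYY-MM-DD form.
            ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Usage with two example pages; the second has no known modification date.
sitemap_xml = build_sitemap([
    ("https://example.com/page", date(2026, 3, 1)),
    ("https://example.com/search?q=a&lang=en", None),
])
print(sitemap_xml.decode("utf-8"))
```

Omitting lastmod when the date is unknown is a deliberate choice here: a missing date is safer than an inaccurate one.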
Best practices
- Include the sitemap in robots.txt by adding the line Sitemap: https://yourdomain.com/sitemap.xml
- Submit in Google Search Console: verification and monitoring of indexing status
- Accurate lastmod dates — change the date ONLY when content actually changes (not on every build!)
- Canonical URLs only — do not add pages with noindex, redirects, or duplicates
- 50,000 URL limit (and 50 MB uncompressed) per sitemap file: for larger sites, split the URLs across several files and list them in a sitemap index
- Automatic generation: frameworks like Next.js and Astro can generate sitemaps at build time (Next.js through its sitemap file convention, Astro through the @astrojs/sitemap integration)
Details in our technical SEO checklist.
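The 50,000 URL limit and the sitemap index mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper names chunk_urls and build_sitemap_index, the page-N URLs, and the sitemap-N.xml file naming are all hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (bytes) that points at the child sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Usage: 120,000 made-up URLs need three child sitemaps.
urls = [f"https://example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)

# Hypothetical naming scheme for the child files: sitemap-1.xml, sitemap-2.xml, ...
child_sitemaps = [
    f"https://example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_sitemaps)
```

Each chunk would then be written out as its own urlset file, and only the index URL needs to be listed in robots.txt or submitted in Google Search Console.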
Related terms
- Robots.txt — a file controlling crawler access
- Crawlability — the ability of a page to be scanned
- Indexing — the process of adding pages to Google's index
- SEO — search engine optimization
- Canonical URL — specifying the canonical version of a page