ARDURA Lab

Sitemap.xml — how to submit to Google and speed up crawl [2026]

Tags: sitemap, Google Search Console, indexing, technical SEO, crawl
Marcin Godula

CEO & Founder, ARDURA Lab

SEO, GEO, and web development specialist with over 15 years of experience. Helps B2B companies build visibility in classic and AI search engines.

Sitemap.xml is a map of your site's URLs for Googlebot. It helps with discovery, but does NOT guarantee indexing. This tutorial walks through 4 submission methods (GSC UI, robots.txt, the deprecated ping endpoint, IndexNow), the lastmod format, sitemap-index for large sites, and the 7 most common technical errors.

TL;DR — Sitemap.xml in Google in 6 points

| # | What | Details |
|---|------|---------|
| 1 | Format | XML, max 50k URLs or 50 MB (uncompressed); gzip OK |
| 2 | Location | Most often /sitemap.xml at root; any path works if declared in robots.txt |
| 3 | Submission | GSC → Sitemaps → enter URL; second option: Sitemap: directive in robots.txt |
| 4 | Lastmod | ISO 8601 (2026-05-03T20:14:00Z); freshness signal that Google uses |
| 5 | Sitemap-index | For >50k URLs or modularity; the index links to sub-sitemaps (an index cannot list another index) |
| 6 | Limits | Only canonical URLs that return 200 OK and carry no noindex; DON'T include 4xx, redirects, noindex |

Get an SEO audit → if your sitemap contains 100+ URLs but Google indexes <30% of them.


What is sitemap.xml

Sitemap is an XML file containing a list of site URLs that the owner wants Googlebot to discover. It's the first layer of communication with search engines — saying "here is my site, here is its structure".

What sitemap does NOT do

| Myth | Reality |
|------|---------|
| "Sitemap will index my URLs" | Sitemap helps with discovery, NOT indexing |
| "More URLs in sitemap is better for SEO" | 100 valuable URLs is better than 1000 of mixed quality |
| "Sitemap increases ranking" | Sitemap is a neutral signal, NOT a ranking factor |
| "After adding URL to sitemap, Google crawls instantly" | 1-30 days typically, depending on domain authority |

What sitemap DOES do

  1. URL discovery: Google doesn't need to crawl the entire site through internal links, because the sitemap provides URLs directly
  2. Freshness signal: lastmod informs Google that the URL has been updated
  3. Priority (declarative): priority is a declared value (0.0 to 1.0); it is only a suggestion, and Google ignores it in practice
  4. Hreflang for multilingual: a sitemap with xhtml:link for each language version is the cleanest way to declare alternates
  5. Diagnostics in GSC: the Sitemaps report shows the fetch status of each sitemap and how many URLs Google discovered in it

4 sitemap submission methods

Method 1: Google Search Console UI (most common)

  1. Open search.google.com/search-console
  2. Select property
  3. Left menu: Sitemaps
  4. Enter sitemap URL (most often /sitemap.xml) and click Submit
  5. Wait 1-7 days, check status: "Success" or "Couldn't fetch"

Pros: simplest. Cons: manual, one-time. After first submission, Google returns to sitemap on its own.

Method 2: Robots.txt directive (backup)

In robots.txt at domain root:

User-Agent: *
Allow: /

Sitemap: https://arduralab.com/sitemap.xml
Sitemap: https://arduralab.com/blog-sitemap.xml

Pros: auto-discovery. Any bot (not just Googlebot) can find it — Bingbot, Applebot, ChatGPT bots, etc. Cons: the directive itself isn't "submission" — Google still needs to notice robots.txt changes.
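
To confirm that the Sitemap: lines are readable by a standards-following bot, a minimal sketch with Python's standard-library robots.txt parser can help. The helper name declared_sitemaps is an illustration, and the robots.txt body is the example from above rather than a live fetch.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Allow: /

Sitemap: https://arduralab.com/sitemap.xml
Sitemap: https://arduralab.com/blog-sitemap.xml
"""


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the Sitemap: URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


print(declared_sitemaps(ROBOTS_TXT))
```

If the list comes back empty for your real robots.txt, the directive is probably misspelled or the file is not served at the domain root.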

Method 3: Ping endpoint (DEPRECATED — DON'T use!)

Classic method before 2023:

https://www.google.com/ping?sitemap=https://your-domain.com/sitemap.xml

Status: DEPRECATED. Google announced the deprecation in June 2023, and the endpoint stopped working by the end of that year. You'll still see it in old tutorials; ignore them and use the GSC UI or the robots.txt directive.

Method 4: IndexNow (Bing/Yandex, Google testing)

IndexNow is a push-based protocol:

POST https://api.indexnow.org/indexnow
Content-Type: application/json

{
  "host": "your-domain.com",
  "key": "abc123def456",
  "urlList": [
    "https://your-domain.com/blog/post-1",
    "https://your-domain.com/blog/post-2"
  ]
}

Requires file https://your-domain.com/abc123def456.txt with content abc123def456 (file-based authorization).

Status: Bing and Yandex support it fully. Cloudflare has a native integration (Crawler Hints). Google said it would test the protocol but, as of 2026, has not declared support. In practice, use it alongside the sitemap, mainly for Bing.
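
The request above can be sketched in Python with only the standard library. The function names build_indexnow_payload and submit_indexnow are hypothetical, the host and key are the placeholder values from the example, and the same-host check is an extra safety net assumed here, not something the article prescribes.

```python
import json
import urllib.request
from urllib.parse import urlsplit

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """Build the JSON body, rejecting URLs that belong to another host."""
    foreign = [u for u in urls if urlsplit(u).hostname != host]
    if foreign:
        raise ValueError(f"URLs outside {host}: {foreign}")
    return {"host": host, "key": key, "urlList": list(urls)}


def submit_indexnow(payload: dict) -> int:
    """POST the payload and return the HTTP status code."""
    request = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_indexnow_payload(
    "your-domain.com",
    "abc123def456",
    ["https://your-domain.com/blog/post-1", "https://your-domain.com/blog/post-2"],
)
print(json.dumps(payload, indent=2))
# submit_indexnow(payload)  # run only after the key file is live on the host
```

Building the payload separately from sending it keeps the part you can test offline apart from the network call.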


Sitemap.xml format — example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://arduralab.com/en/blog/seo-roadmap-2026-2027</loc>
    <lastmod>2026-05-01T10:30:00+02:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
    <xhtml:link rel="alternate" hreflang="pl" href="https://arduralab.com/blog/seo-roadmap-2026-2027"/>
    <xhtml:link rel="alternate" hreflang="en" href="https://arduralab.com/en/blog/seo-roadmap-2026-2027"/>
  </url>
</urlset>

Fields — what counts for Google

| Field | Used? | Note |
|-------|-------|------|
| loc | ✅ ALWAYS | Canonical URL, 200 OK |
| lastmod | ✅ YES | Google uses it for crawl scheduling when the values are consistently accurate |
| changefreq | ❌ Ignored | Google ignores the value |
| priority | ❌ Ignored | Google ignores since 2017 (officially) |
| xhtml:link | ✅ YES | Hreflang, the cleanest method |

Practical conclusion: loc + lastmod + optionally xhtml:link for hreflang. Skip changefreq and priority — they only generate noise.


Sitemap-index for large sites

Standard sitemap limits

  • 50,000 URLs max
  • 50 MB max, measured uncompressed (gzip shrinks the transfer size but does not raise the limit)

Sitemap-index structure

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://arduralab.com/sitemap-blog.xml</loc>
    <lastmod>2026-05-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://arduralab.com/sitemap-services.xml</loc>
    <lastmod>2026-05-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://arduralab.com/sitemap-glossary.xml</loc>
    <lastmod>2026-04-29</lastmod>
  </sitemap>
</sitemapindex>

When to use sitemap-index

  • ✅ >50k URLs (forced)
  • ✅ Multi-section sites (blog, store, glossary) — modularity facilitates debugging
  • ✅ E-commerce with many categories — separate sitemap per category
  • ❌ Small site <500 URLs — unnecessary abstraction

Sitemap-index in Next.js / Astro

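Most frameworks can generate the sitemap, and split it into sub-sitemaps, at build time:

  • Next.js 16: built-in sitemap.ts; exporting generateSitemaps splits the output into multiple sub-sitemaps
  • Astro 5: the @astrojs/sitemap integration writes sitemap-index.xml plus numbered sub-sitemaps, split according to its entryLimit option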
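
For stacks without such an integration, the splitting logic is small enough to sketch by hand. The following is a minimal illustration, not a framework API: the file names sitemap-1.xml, sitemap-2.xml and so on, the function names, and the generated URL list are all assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # protocol limit per sitemap file


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split the URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(entries: list[tuple[str, str]]) -> str:
    """Serialize (sub-sitemap URL, lastmod) pairs into a sitemap index."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in entries:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical site with 120,000 URLs.
urls = [f"https://your-domain.com/p/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
entries = [
    (f"https://your-domain.com/sitemap-{i}.xml", "2026-05-03")
    for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(entries)
print(len(chunks), [len(c) for c in chunks])
```

With 120,000 URLs this yields three chunks (50,000, 50,000, and 20,000 entries), so the index lists three sub-sitemaps. In a real build each chunk would get its own lastmod instead of one shared date.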

7 most common errors (and how to fix)

1. Noindex URLs in sitemap

Problem: Sitemap says "important", noindex says "don't index". Contradictory signal, Google ignores.

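Fix: Filter in the sitemap generator: exclude URLs that carry noindex in <meta name="robots"> or in the X-Robots-Tag response header. Check the meta tag via curl -s URL | grep "noindex" and the header via curl -sI URL | grep -i "x-robots-tag".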
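
A plain grep also matches the word "noindex" in body text, so a generator-side filter is safer with a real HTML parser. This is a minimal sketch using the standard library; the names RobotsMetaParser and is_noindex are hypothetical, and it only inspects <meta name="robots"> plus an optional X-Robots-Tag header value that you pass in yourself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.directives.append((attributes.get("content") or "").lower())


def is_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the meta robots tag or the X-Robots-Tag value says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    sources = parser.directives + [x_robots_tag.lower()]
    return any("noindex" in value for value in sources)


print(is_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
print(is_noindex("<p>This article explains noindex.</p>"))
```

The sketch ignores bot-specific tags such as <meta name="googlebot"> and the "none" directive; extend the check if your site uses them.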

2. 4xx/5xx URLs in sitemap

Problem: Sitemap contains URLs returning 404 (deleted), 410 (gone), or 500.

Fix: Pre-deploy validator — checks HTTP status of each URL in sitemap. Script:

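# Print the HTTP status of every <loc> URL in the sitemap (bash; grep -P needs GNU grep).
# Redirects are not followed, so a 301/302 is reported as such.
while IFS= read -r url; do
  status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
  echo "$status $url"
done < <(curl -s https://your-domain.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+')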

3. Stale lastmod (always the same)

Problem: lastmod is updated at every deploy to the same timestamp for all URLs, even though only 5 were changed.

Fix: Per-URL lastmod with real time of last content change. Frameworks: Next.js → git log --format=%aI -1 -- file.mdx, Astro → import.meta.glob with eager + filesystem mtime.
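
Where git history or file mtimes are not available (for example, content coming from a CMS), one alternative is to keep a small manifest of content hashes and bump lastmod only when a hash changes. This is a sketch of that swapped-in technique, not the method named above; the function update_lastmod and the manifest layout are assumptions, and persisting the manifest between builds (for instance as a JSON file) is left out.

```python
import hashlib
from datetime import datetime, timezone


def update_lastmod(manifest: dict, url: str, content: str, now: datetime) -> str:
    """Return lastmod for `url`, bumping it only when the content hash changes."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = manifest.get(url)
    if previous and previous["hash"] == digest:
        return previous["lastmod"]  # unchanged content keeps its old date
    manifest[url] = {"hash": digest, "lastmod": now.isoformat()}
    return manifest[url]["lastmod"]


manifest: dict = {}
first_deploy = datetime(2026, 5, 1, 10, 30, tzinfo=timezone.utc)
second_deploy = datetime(2026, 5, 3, 20, 14, tzinfo=timezone.utc)

update_lastmod(manifest, "/blog/post-1", "original text", first_deploy)
update_lastmod(manifest, "/blog/post-2", "original text", first_deploy)

# Second deploy: only post-2 was edited.
post_1 = update_lastmod(manifest, "/blog/post-1", "original text", second_deploy)
post_2 = update_lastmod(manifest, "/blog/post-2", "edited text", second_deploy)
print(post_1)  # 2026-05-01T10:30:00+00:00
print(post_2)  # 2026-05-03T20:14:00+00:00
```

Hashing the rendered main content rather than the whole page avoids bumping lastmod when only the footer or a template changes.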

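4. Sitemap over 50 MB (uncompressed)

Problem: The file exceeds the size limit, so Google refuses to parse it.

Fix: Split it into sub-sitemaps, each under 50 MB uncompressed and under 50k URLs, and link them from a sitemap-index. Serving sitemap.xml.gz (or letting Cloudflare/CDN compress in transit) only reduces transfer size; the 50 MB limit applies to the uncompressed file.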

5. No hreflang in multilingual sitemap

Problem: Site is PL+EN, but each URL has its own sitemap entry without xhtml:link — Google doesn't know they are versions of each other.

Fix: Each URL in sitemap must have xhtml:link for each language version (including self-reference). Sitemap generator must do this.
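
The rule above (every language version gets its own entry, and every entry lists all versions including itself) can be sketched as a small generator. This is an illustration with the standard library; build_hreflang_urlset and the "cluster" dictionary shape are assumptions, and the URLs are the ones from the format example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_urlset(clusters: list[dict[str, str]]) -> str:
    """Each cluster maps a language code to the URL of that language version."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for cluster in clusters:
        for loc in cluster.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
            # Every entry lists all versions of the page, including itself.
            for lang, href in cluster.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


clusters = [
    {
        "pl": "https://arduralab.com/blog/seo-roadmap-2026-2027",
        "en": "https://arduralab.com/en/blog/seo-roadmap-2026-2027",
    }
]
sitemap_xml = build_hreflang_urlset(clusters)
print(sitemap_xml)
```

One PL+EN page therefore produces two url entries with two xhtml:link elements each, which keeps the alternates reciprocal by construction.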

6. Canonical links contradicting sitemap

Problem: Sitemap contains URL_A, but the page has <link rel="canonical" href="URL_B">. Google trusts canonical, ignores sitemap entry.

Fix: Sitemap always contains canonical URL — not variant. Audit: curl -s URL | grep canonical — compare with URL in sitemap.
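
The same audit can be automated across the whole sitemap. Below is a minimal sketch that compares a sitemap URL with the canonical declared in the page's HTML; the names CanonicalParser and canonical_mismatch are hypothetical, fetching the HTML is left to the caller, and the comparison is an exact string match with no trailing-slash or case normalization.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Capture the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_mismatch(sitemap_url: str, html: str) -> Optional[str]:
    """Return the declared canonical when it differs from the sitemap URL."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical and parser.canonical != sitemap_url:
        return parser.canonical
    return None


page = '<head><link rel="canonical" href="https://your-domain.com/b"></head>'
print(canonical_mismatch("https://your-domain.com/a", page))
print(canonical_mismatch("https://your-domain.com/b", page))
```

A non-None result means the sitemap lists a variant; replace that entry with the returned canonical URL.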

7. Sitemap not refreshing on deploy

Problem: Sitemap is cached 24h+ by CDN. After deploying new blogs, Google sees old sitemap.

Fix: Cache control header for sitemap = max-age=300 (5 min). Cloudflare: Page Rules → URL pattern */sitemap* → Cache Level: Bypass.


How to monitor sitemap effectiveness

In GSC Sitemaps tab

  • Status: whether Google could fetch and parse the sitemap ("Success", "Couldn't fetch", "Has errors"); the first read typically takes 1-7 days
  • Discovered pages: how many URLs Google read from the sitemap
  • Indexed (separately in Coverage): how many URLs from the sitemap are actually indexed

Hard truth: for fresh domains, typically 10-30% of URLs from sitemap get indexed in the first quarter. For domains with DR 30+ — 60-80%.

In GSC Coverage Report

  • Submitted and indexed: the sitemap is performing
  • Crawled, currently not indexed: Google crawled the URL but decided not to index it (most common state for new domains)
  • Discovered, currently not indexed: Google found the URL but hasn't crawled it yet (crawl budget issue)

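If >50% of URLs fall into "Discovered, not indexed", you likely have a crawl budget problem. The Indexing API is sometimes used as a force-crawl, but Google officially supports it only for pages with JobPosting or BroadcastEvent markup, so treat it as a last resort.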
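
The 50% threshold is easy to check from the counts shown in the report. This is a small worked example with made-up numbers; the state keys and the function state_shares are illustrative, not GSC export fields.

```python
def state_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-state URL counts into percentages of all submitted URLs."""
    total = sum(counts.values())
    if total == 0:
        return {state: 0.0 for state in counts}
    return {state: round(100 * n / total, 1) for state, n in counts.items()}


# Hypothetical counts read off the GSC report for one sitemap.
counts = {"indexed": 120, "discovered_not_indexed": 260, "other_not_indexed": 20}
shares = state_shares(counts)
print(shares)
if shares["discovered_not_indexed"] > 50:
    print("More than half of the URLs are waiting to be crawled")
```

Here 260 of 400 URLs (65%) are discovered but not crawled, which is above the threshold described above.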


FAQ — most common questions

Can I have multiple sitemaps for the same property?

Yes. You can submit multiple sitemaps independently: sitemap.xml, blog-sitemap.xml, news-sitemap.xml. The easiest option, however, is a sitemap-index linking to all of them. Limit per property in GSC: 500 sitemaps.

What about <image:image> and <video:video>?

Special sitemap extensions for images and video. Use if you have lots of unique image content (e-commerce, gallery sites) or video. For typical blog — optional, small impact.

Does llms.txt replace sitemap.xml for AI crawlers?

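No. llms.txt and llms-full.txt are a proposed convention for presenting content to LLMs, and whether AI crawlers (GPTBot, ClaudeBot, PerplexityBot) actually read them is not publicly confirmed. Treat sitemap.xml and llms.txt as complementary: the first for search engines, the second for LLMs.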

How long to keep 301 redirects in sitemap?

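Briefly (1-3 months), and only during a migration. This is a temporary exception to the "no redirects in sitemap" rule: after deploying new URLs, add the 301 source URLs to the sitemap so Google quickly notices the redirects, then remove them after 1-3 months, because stale entries only add clutter.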

Does sitemap need to be static or can it be dynamically generated?

It can be dynamic (e.g., Next.js sitemap.ts, Astro plugin), but it must be cacheable and stable. Don't generate different content on every request (Google detects and ignores).

What is priority and should I use it?

priority in sitemap (0.0-1.0) is a declarative value. Google announced in 2017 that it ignores this field. Leave default (0.5) or remove entirely.


What's next

  1. Check your sitemap now: curl -s https://your-domain.com/sitemap.xml | head -20 and count URLs: curl -s ... | grep -c "<loc>"
  2. Audit 7 errors — go through the checklist above and fix each found
  3. Submit via GSC if not yet submitted