Sitemap.xml — how to submit to Google and speed up crawl [2026]
Sitemap.xml is a map of your site's URLs for Googlebot. It helps with discovery, but does NOT guarantee indexing. This tutorial walks through 4 submission methods (GSC UI, robots.txt, the deprecated ping endpoint, IndexNow), the lastmod format, sitemap-index for large sites, and the 7 most common technical errors.
TL;DR — Sitemap.xml in Google in 6 points
| # | What | Details | Notes |
|---|---|---|---|
| 1 | Format | XML, max 50k URLs or 50 MB (uncompressed) | gzip OK |
| 2 | Location | Most often /sitemap.xml at root | any path, declared in robots.txt |
| 3 | Submission | GSC → Sitemaps → enter URL | second: Sitemap: directive in robots.txt |
| 4 | Lastmod | ISO 8601 (2026-05-03T20:14:00Z) | freshness signal; Google uses it |
| 5 | Sitemap-index | For >50k URLs or modularity | index links to sub-sitemaps (one level only: an index cannot list another index) |
| 6 | Limits | Only canonical URLs, 200 OK, no noindex | DON'T include 4xx, redirects, noindex |
Get an SEO audit → if your sitemap contains 100+ URLs but Google indexes <30% of them.
What is sitemap.xml
A sitemap is an XML file listing the site URLs that the owner wants Googlebot to discover. It is the first layer of communication with search engines, saying "here is my site, here is its structure".
What sitemap does NOT do
| Myth | Reality |
|---|---|
| "Sitemap will index my URLs" | Sitemap helps with discovery, NOT indexing |
| "More URLs in sitemap is better for SEO" | 100 valuable URLs is better than 1000 mixed quality |
| "Sitemap increases ranking" | Sitemap is a neutral signal, NOT a ranking factor |
| "After adding URL to sitemap, Google crawls instantly" | 1-30 days typically, depending on domain authority |
What sitemap DOES do
- URL discovery: Google doesn't need to crawl the entire site through internal links, because the sitemap provides URLs directly
- Freshness signal: lastmod informs Google that the URL has been updated
- Priority (declarative): priority is a suggestion (0.0 to 1.0), but Google ignores it
- Hreflang for multilingual: a sitemap with xhtml:link for each language version is the cleanest way to declare alternates
- Diagnostics in GSC: Sitemaps → sitemap status shows submitted vs indexed URL counts
4 sitemap submission methods
Method 1: Google Search Console UI (most common)
- Open search.google.com/search-console
- Select property
- Left menu: Sitemaps
- Enter sitemap URL (most often /sitemap.xml) and click Submit
- Wait 1-7 days, check status: "Success" or "Couldn't fetch"
Pros: simplest. Cons: manual, one-time. After first submission, Google returns to sitemap on its own.
Method 2: Robots.txt directive (backup)
In robots.txt at domain root:
User-Agent: *
Allow: /
Sitemap: https://arduralab.com/sitemap.xml
Sitemap: https://arduralab.com/blog-sitemap.xml
Pros: auto-discovery. Any bot (not just Googlebot) can find it — Bingbot, Applebot, ChatGPT bots, etc. Cons: the directive itself isn't "submission" — Google still needs to notice robots.txt changes.
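Auto-discovery boils down to reading the Sitemap: lines out of robots.txt. The sketch below shows roughly how a crawler might do that, assuming the robots.txt body has already been downloaded. The helper name extract_sitemaps is hypothetical and not part of any library.

```python
def extract_sitemaps(robots_txt: str) -> list[str]:
    """Return the URLs declared by Sitemap: lines in a robots.txt body."""
    sitemaps = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, because the URL itself contains colons.
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


robots = """User-Agent: *
Allow: /
Sitemap: https://arduralab.com/sitemap.xml
Sitemap: https://arduralab.com/blog-sitemap.xml
"""
print(extract_sitemaps(robots))
```

Because the directive is independent of any User-Agent group, it can sit anywhere in the file.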
Method 3: Ping endpoint (DEPRECATED — DON'T use!)
Classic method before 2023:
https://www.google.com/ping?sitemap=https://your-domain.com/sitemap.xml
Status: DEPRECATED in June 2023. Google announced the ping endpoint has been turned off. You'll still see this in old tutorials — ignore. Use GSC UI or robots.txt directive.
Method 4: IndexNow (Bing/Yandex, Google testing)
IndexNow is a push-based protocol:
POST https://api.indexnow.org/indexnow
Content-Type: application/json

{
  "host": "your-domain.com",
  "key": "abc123def456",
  "urlList": [
    "https://your-domain.com/blog/post-1",
    "https://your-domain.com/blog/post-2"
  ]
}
Requires file https://your-domain.com/abc123def456.txt with content abc123def456 (file-based authorization).
Status: Bing + Yandex full support. Cloudflare has native integration (Crawler Hints). Google testing, status as of 2026: undeclared support. In practice, use it alongside the sitemap, mainly for Bing.
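A minimal sketch of assembling the IndexNow request body before sending it. The function name build_indexnow_request is hypothetical, and the host check is an assumption about what you would want to validate locally. keyLocation is the optional IndexNow field that points at the key file. The actual POST is left out so the example runs without network access.

```python
import json
from urllib.parse import urlsplit


def build_indexnow_request(host: str, key: str, urls: list[str]) -> dict:
    """Build the JSON body for an IndexNow POST; rejects URLs on other hosts."""
    foreign = [u for u in urls if urlsplit(u).hostname != host]
    if foreign:
        raise ValueError(f"URLs not on {host}: {foreign}")
    return {
        "host": host,
        "key": key,
        # The key file must be served here, with the key as its content.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }


body = build_indexnow_request(
    "your-domain.com",
    "abc123def456",
    ["https://your-domain.com/blog/post-1", "https://your-domain.com/blog/post-2"],
)
print(json.dumps(body, indent=2))
```

The resulting JSON would then be sent to https://api.indexnow.org/indexnow with Content-Type: application/json.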
Sitemap.xml format — example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://arduralab.com/en/blog/seo-roadmap-2026-2027</loc>
    <lastmod>2026-05-01T10:30:00+02:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
    <xhtml:link rel="alternate" hreflang="pl" href="https://arduralab.com/blog/seo-roadmap-2026-2027"/>
    <xhtml:link rel="alternate" hreflang="en" href="https://arduralab.com/en/blog/seo-roadmap-2026-2027"/>
  </url>
</urlset>
Fields — what counts for Google
| Field | Used? | Note |
|---|---|---|
| loc | ✅ ALWAYS | Canonical URL, 200 OK |
| lastmod | ✅ YES | Google uses it for crawl scheduling when it is consistently accurate |
| changefreq | ❌ NO | Google ignores it |
| priority | ❌ NO | Google ignores since 2017 (officially) |
| xhtml:link | ✅ YES | Hreflang, cleanest method |
Practical conclusion: loc + lastmod + optionally xhtml:link for hreflang. Skip changefreq and priority — they only generate noise.
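The loc + lastmod + xhtml:link combination can be generated with Python's standard library. This is a sketch, not a production generator. The build_sitemap name and the shape of the pages list are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize sitemap tags without a prefix and hreflang links as xhtml:link.
ET.register_namespace("", SM_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(pages: list[dict]) -> str:
    """pages: [{"loc": url, "lastmod": iso_date, "alternates": {lang: url}}]"""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = page["loc"]
        ET.SubElement(url, f"{{{SM_NS}}}lastmod").text = page["lastmod"]
        # One xhtml:link per language version, including the page itself.
        for lang, href in page.get("alternates", {}).items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    raw = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return raw.decode("utf-8")


print(build_sitemap([{
    "loc": "https://arduralab.com/en/blog/seo-roadmap-2026-2027",
    "lastmod": "2026-05-01T10:30:00+02:00",
    "alternates": {
        "pl": "https://arduralab.com/blog/seo-roadmap-2026-2027",
        "en": "https://arduralab.com/en/blog/seo-roadmap-2026-2027",
    },
}]))
```

changefreq and priority are deliberately left out of the output.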
Sitemap-index for large sites
Standard sitemap limits
- 50,000 URLs max
- 50 MB max, measured uncompressed (gzip reduces transfer size but does not raise the limit)
Sitemap-index structure
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://arduralab.com/sitemap-blog.xml</loc>
    <lastmod>2026-05-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://arduralab.com/sitemap-services.xml</loc>
    <lastmod>2026-05-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://arduralab.com/sitemap-glossary.xml</loc>
    <lastmod>2026-04-29</lastmod>
  </sitemap>
</sitemapindex>
When to use sitemap-index
- ✅ >50k URLs (forced)
- ✅ Multi-section sites (blog, store, glossary) — modularity facilitates debugging
- ✅ E-commerce with many categories — separate sitemap per category
- ❌ Small site <500 URLs — unnecessary abstraction
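For the forced case (>50k URLs), splitting is mostly bookkeeping. A rough sketch, assuming sub-sitemaps are named sitemap-0.xml, sitemap-1.xml and so on. That naming scheme and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit
ET.register_namespace("", SM_NS)


def chunk_urls(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split the URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(base_url: str, chunk_count: int, lastmod: str) -> str:
    """Build an index that lists one sub-sitemap per chunk."""
    index = ET.Element(f"{{{SM_NS}}}sitemapindex")
    for n in range(chunk_count):
        entry = ET.SubElement(index, f"{{{SM_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SM_NS}}}loc").text = f"{base_url}/sitemap-{n}.xml"
        ET.SubElement(entry, f"{{{SM_NS}}}lastmod").text = lastmod
    raw = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return raw.decode("utf-8")


urls = [f"https://your-domain.com/p/{i}" for i in range(120_001)]
chunks = chunk_urls(urls)
print([len(c) for c in chunks])
print(build_sitemap_index("https://your-domain.com", len(chunks), "2026-05-03"))
```

In a real generator each sub-sitemap would carry its own lastmod, taken from the newest URL it contains, rather than one shared date.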
Sitemap-index in Next.js / Astro
Most frameworks can generate split sitemaps at build time:
- Next.js 16: built-in sitemap.ts with a generateSitemaps export → multiple sub-sitemaps
- Astro 5: the @astrojs/sitemap integration emits sitemap-index.xml plus numbered sub-sitemaps, split at its entryLimit (45,000 URLs by default)
7 most common errors (and how to fix)
1. Noindex URLs in sitemap
Problem: Sitemap says "important", noindex says "don't index". The signals contradict each other, and Google follows the noindex.
Fix: Filter in sitemap generator: exclude URLs with noindex in <meta name="robots">. Check via curl -s URL | grep -i "noindex", and check the X-Robots-Tag response header with curl -sI URL.
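A plain grep also matches the word noindex in body text, so a generator-side filter is safer when it parses the markup. A minimal sketch, assuming the HTML and the X-Robots-Tag header value have already been fetched. The names RobotsMetaParser and is_noindex are hypothetical, and per-bot header prefixes such as "googlebot: noindex" are not handled.

```python
from html.parser import HTMLParser

BLOCKING = {"noindex", "none"}  # "none" is shorthand for noindex, nofollow


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the meta tags or the X-Robots-Tag header value block indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",")}
    return bool(BLOCKING & (parser.directives | header))


print(is_noindex('<meta name="robots" content="noindex, follow">'))
print(is_noindex("<p>How to use noindex</p>"))
```

URLs for which is_noindex returns True would be dropped before the sitemap is written.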
2. 4xx/5xx URLs in sitemap
Problem: Sitemap contains URLs returning 404 (deleted), 410 (gone), or 500.
Fix: Pre-deploy validator — checks HTTP status of each URL in sitemap. Script:
# Print the HTTP status of every <loc> URL (needs bash and GNU grep for -P)
while IFS= read -r url; do
  status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
  echo "$status $url"
done < <(curl -s https://your-domain.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+')
3. Stale lastmod (always the same)
Problem: lastmod is updated at every deploy to the same timestamp for all URLs, even though only 5 were changed.
Fix: Per-URL lastmod with real time of last content change. Frameworks: Next.js → git log --format=%aI -1 -- file.mdx, Astro → import.meta.glob with eager + filesystem mtime.
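A sketch of the filesystem-mtime variant, assuming each URL maps to one source file. format_lastmod and lastmod_for_file are hypothetical helper names. One caveat: a fresh git checkout in CI resets mtimes to the checkout time, so in that setup the git log approach is usually the more reliable source.

```python
import os
from datetime import datetime, timezone


def format_lastmod(epoch_seconds: float) -> str:
    """Render a Unix timestamp as a UTC datetime in the form YYYY-MM-DDThh:mm:ssZ."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def lastmod_for_file(path: str) -> str:
    """Use the file's modification time as the lastmod value for its URL."""
    return format_lastmod(os.path.getmtime(path))


# Each URL gets the timestamp of its own source file, not the deploy time.
print(lastmod_for_file(os.curdir))
```

Feeding every URL its own source file gives distinct lastmod values, instead of one deploy timestamp shared across the whole sitemap.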
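4. Sitemap over 50 MB or 50k URLs
Problem: The file exceeds the limit and Google refuses to parse it.
Fix: Split it into sub-sitemaps of under 50k URLs and 50 MB each, linked from a sitemap-index. Serving sitemap.xml.gz (Cloudflare/CDN can compress automatically) saves bandwidth, but the 50 MB limit applies to the uncompressed file.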
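A pre-deploy check for both limits might look like the sketch below. The size is measured after decompression, because the 50 MB limit applies to the uncompressed file. check_limits is a hypothetical helper, and counting <url> tags is a simplification that assumes a well-formed sitemap.

```python
import gzip

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, uncompressed


def check_limits(payload: bytes, max_urls: int = MAX_URLS,
                 max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of limit violations for a sitemap file (plain or gzipped)."""
    # Gzip files start with the magic bytes 1f 8b; limits apply after decompression.
    xml = gzip.decompress(payload) if payload[:2] == b"\x1f\x8b" else payload
    problems = []
    url_count = xml.count(b"<url>")
    if url_count > max_urls:
        problems.append(f"{url_count} URLs exceeds the limit of {max_urls}")
    if len(xml) > max_bytes:
        problems.append(f"{len(xml)} bytes uncompressed exceeds the limit of {max_bytes}")
    return problems


sample = b"<urlset>" + b"<url><loc>https://your-domain.com/a</loc></url>" * 3 + b"</urlset>"
print(check_limits(sample))
print(check_limits(gzip.compress(sample)))
```

An empty list means the file is within both limits; anything else is a signal to split it into sub-sitemaps.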
5. No hreflang in multilingual sitemap
Problem: Site is PL+EN, but each URL has its own sitemap entry without xhtml:link — Google doesn't know they are versions of each other.
Fix: Each URL in sitemap must have xhtml:link for each language version (including self-reference). Sitemap generator must do this.
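The self-reference and link-back rules can be checked mechanically. A sketch using the standard library, assuming the whole sitemap fits in memory. hreflang_problems is a hypothetical name, and the messages are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def hreflang_problems(sitemap_xml) -> list[str]:
    """List URLs whose xhtml:link set lacks a self-reference or a return link."""
    root = ET.fromstring(sitemap_xml)
    alternates = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        alternates[loc] = {link.get("href") for link in url.findall("xhtml:link", NS)}

    problems = []
    for loc, hrefs in alternates.items():
        if loc not in hrefs:
            problems.append(f"{loc}: missing self-referencing xhtml:link")
        # Every alternate must list this URL back in its own entry.
        for href in sorted(hrefs - {loc}):
            if loc not in alternates.get(href, set()):
                problems.append(f"{loc}: {href} does not link back")
    return problems


decl = ('xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        'xmlns:xhtml="http://www.w3.org/1999/xhtml"')
print(hreflang_problems(f"<urlset {decl}><url><loc>https://a.test/pl</loc></url></urlset>"))
```

An alternate that is not listed in the sitemap at all is also reported as not linking back.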
6. Canonical links contradicting sitemap
Problem: Sitemap contains URL_A, but the page has <link rel="canonical" href="URL_B">. Google trusts canonical, ignores sitemap entry.
Fix: Sitemap always contains canonical URL — not variant. Audit: curl -s URL | grep canonical — compare with URL in sitemap.
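The canonical audit can be automated the same way. The sketch below assumes the page HTML has already been fetched and compares URLs as exact strings after resolving relative hrefs. CanonicalParser and canonical_conflict are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> in the page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_conflict(sitemap_url: str, html: str):
    """Return the page's canonical URL if it differs from sitemap_url, else None."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return None
    canonical = urljoin(sitemap_url, parser.canonical)  # resolve relative hrefs
    return canonical if canonical != sitemap_url else None


page = '<head><link rel="canonical" href="https://your-domain.com/b"></head>'
print(canonical_conflict("https://your-domain.com/a", page))
```

Any non-None result means the sitemap entry should be replaced with the canonical URL it points to.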
7. Sitemap not refreshing on deploy
Problem: Sitemap is cached 24h+ by CDN. After deploying new blog posts, Google sees the old sitemap.
Fix: Cache control header for sitemap = max-age=300 (5 min). Cloudflare: Page Rules → URL pattern */sitemap* → Cache Level: Bypass.
How to monitor sitemap effectiveness
In GSC Sitemaps tab
- Discovered — Google noticed the sitemap (time: 1-7 days)
- Submitted — Google accepted URLs from sitemap to crawl queue
- Indexed (separately in Coverage) — how many URLs from sitemap actually indexed
Hard truth: for fresh domains, typically 10-30% of URLs from sitemap get indexed in the first quarter. For domains with DR 30+ — 60-80%.
In GSC Coverage Report
- Submitted and indexed: sitemap is performing
- Crawled, currently not indexed: Google crawled the URL but decided not to index it (most common state for new domains)
- Discovered, currently not indexed: Google found the URL but hasn't crawled it yet (crawl budget issue)
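If >50% of URLs fall into "Discovered, not indexed", you have a crawl budget problem. Consider the Indexing API as a force-crawl, keeping in mind that Google officially supports it only for job-posting and livestream (BroadcastEvent) pages.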
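The 50% rule is simple arithmetic over the report counts. A worked sketch with made-up numbers: the dictionary keys and counts below are hypothetical and would come from your own GSC export.

```python
def discovered_share(counts: dict[str, int]) -> float:
    """Share of sitemap URLs still waiting in 'Discovered, currently not indexed'."""
    total = sum(counts.values())
    return counts.get("discovered_not_indexed", 0) / total if total else 0.0


# Hypothetical counts for a 400-URL sitemap.
counts = {"indexed": 120, "other_not_indexed": 60, "discovered_not_indexed": 220}
share = discovered_share(counts)
print(f"{share:.0%} discovered but not crawled")
print("crawl budget problem" if share > 0.5 else "within normal range")
```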
FAQ — most common questions
Can I have multiple sitemaps for the same property?
Yes. You can submit multiple sitemaps independently: sitemap.xml, blog-sitemap.xml, news-sitemap.xml. The easiest option, however, is a single sitemap-index linking to all of them. Limit per property in GSC: 500 sitemaps.
What about <image:image> and <video:video>?
Special sitemap extensions for images and video. Use if you have lots of unique image content (e-commerce, gallery sites) or video. For typical blog — optional, small impact.
Does llms.txt replace sitemap.xml for AI crawlers?
No. llms.txt and llms-full.txt are a proposed convention that acts as a "sitemap" for AI crawlers (GPTBot, ClaudeBot, PerplexityBot), and support for it varies by crawler. Sitemap.xml + llms.txt are complementary: the first for search engines, the second for LLMs.
How long to keep 301 redirects in sitemap?
Briefly (1-3 months). This is a temporary exception to the no-redirects rule: after deploying new URLs, add the 301 source URLs to the sitemap so Google notices the redirects quickly. After 1-3 months remove them, because stale entries only add clutter.
Does sitemap need to be static or can it be dynamically generated?
Can be dynamic (e.g., Next.js sitemap.ts, Astro plugin), but important: must be cacheable + stable. Don't generate different content on every request (Google detects and ignores).
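"Cacheable + stable" can be read as: the same set of entries always renders to the same bytes. A minimal sketch of that idea, assuming entries arrive as a URL-to-lastmod mapping. render_sitemap_response is a hypothetical handler helper, and the ETag scheme is just one possible choice.

```python
import hashlib
from xml.sax.saxutils import escape


def render_sitemap_response(entries: dict[str, str]) -> tuple[bytes, dict[str, str]]:
    """Render URL -> lastmod entries into a body that depends only on the entries."""
    # Sorting removes any dependence on database or filesystem ordering.
    rows = "".join(
        f"<url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></url>"
        for loc, lastmod in sorted(entries.items())
    )
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">{rows}</urlset>'
    ).encode("utf-8")
    headers = {
        "Content-Type": "application/xml",
        "Cache-Control": "max-age=300",  # short cache, 5 minutes
        # The ETag changes only when the rendered content changes.
        "ETag": '"' + hashlib.sha256(body).hexdigest()[:16] + '"',
    }
    return body, headers


body, headers = render_sitemap_response({
    "https://your-domain.com/b": "2026-05-01",
    "https://your-domain.com/a": "2026-05-03",
})
print(headers["ETag"])
```

No request-time value such as the current time goes into the body, so repeated requests between content changes return identical output.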
What is priority and should I use it?
priority in sitemap (0.0-1.0) is a declarative value. Google announced in 2017 that it ignores this field. Leave default (0.5) or remove entirely.
What's next
- Check your sitemap now: curl -s https://your-domain.com/sitemap.xml | head -20 and count URLs: curl -s ... | grep -o "<loc>" | wc -l
- Audit 7 errors: go through the checklist above and fix each one you find
- Submit via GSC if not yet submitted
Get an SEO audit
Sitemap contains 100+ URLs but Google indexes <30% of them? Get an SEO audit → — we check 7 most common causes (canonical, noindex, thin content, crawl budget) and implement a force-indexing strategy.