Sitemap and Robots.txt Configuration¶
This guide explains the dynamic sitemap generation and robots.txt configuration for search engines and AI crawlers.
Table of Contents¶
- Sitemap Implementation
- Robots.txt Implementation
- AI Crawler Configuration
- Testing and Validation
- Best Practices
Sitemap Implementation¶
What is a Sitemap?¶
An XML file listing all pages on your site, helping search engines:
- Discover all pages
- Understand site structure
- Determine crawl priority
- Know update frequency
File Location¶
app/sitemap.ts
Basic Implementation¶
// app/sitemap.ts
import { MetadataRoute } from "next";
import { siteConfig } from "@/lib/config";

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const baseUrl = siteConfig.url;

  return [
    {
      url: baseUrl,
      lastModified: new Date(),
      changeFrequency: "weekly",
      priority: 1,
    },
    {
      url: `${baseUrl}/projects`,
      lastModified: new Date(),
      changeFrequency: "weekly",
      priority: 0.9,
    },
  ];
}
Dynamic Routes¶
Include all projects:
// app/sitemap.ts
import { MetadataRoute } from "next";
import { siteConfig } from "@/lib/config";
import { getProjects } from "@/lib/projects";

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const baseUrl = siteConfig.url;
  const projects = await getProjects();

  const projectRoutes = projects.map((project) => ({
    url: `${baseUrl}/projects/${project.slug}`,
    lastModified: new Date(),
    changeFrequency: "monthly" as const,
    priority: 0.8,
  }));

  return [
    // Static routes
    {
      url: baseUrl,
      lastModified: new Date(),
      changeFrequency: "weekly",
      priority: 1,
    },
    // Dynamic project routes
    ...projectRoutes,
  ];
}
Priority Guidelines¶
| Page Type | Priority | Change Frequency |
|---|---|---|
| Homepage | 1.0 | weekly |
| Main sections | 0.9 | weekly |
| Project details | 0.8 | monthly |
| About sections | 0.7 | monthly |
| Downloads | 0.6 | monthly |
| AI endpoints | 0.5 | monthly |
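To keep these values consistent across routes, they can be centralized in a small lookup that sitemap.ts reads from. A minimal sketch (the lib/sitemap-meta.ts path, the ROUTE_META name, and its keys are illustrative, not part of the existing code):

// lib/sitemap-meta.ts (hypothetical helper module)
import type { MetadataRoute } from "next";

type RouteMeta = {
  changeFrequency: MetadataRoute.Sitemap[number]["changeFrequency"];
  priority: number;
};

// Values mirror the table above
export const ROUTE_META: Record<string, RouteMeta> = {
  homepage: { changeFrequency: "weekly", priority: 1.0 },
  section: { changeFrequency: "weekly", priority: 0.9 },
  project: { changeFrequency: "monthly", priority: 0.8 },
  about: { changeFrequency: "monthly", priority: 0.7 },
  download: { changeFrequency: "monthly", priority: 0.6 },
  ai: { changeFrequency: "monthly", priority: 0.5 },
};

sitemap.ts can then spread ROUTE_META.project into each project entry instead of repeating the literals.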
Access¶
URL: https://your-domain.com/sitemap.xml
Output format:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://simon.stijnen.be</loc>
    <lastmod>2026-02-09T12:00:00.000Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://simon.stijnen.be/projects</loc>
    <lastmod>2026-02-09T12:00:00.000Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.9</priority>
  </url>
</urlset>
Robots.txt Implementation¶
What is robots.txt?¶
A file that tells search engines:
- Which pages to crawl
- Which pages to skip
- Where the sitemap is located
- Crawl rate limits (optional; see the crawl-delay sketch at the end of this section)
File Location¶
app/robots.ts
Basic Implementation¶
// app/robots.ts
import type { MetadataRoute } from "next";
import { siteConfig } from "@/lib/config";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      disallow: "/private/",
    },
    sitemap: `${siteConfig.url}/sitemap.xml`,
    host: siteConfig.url,
  };
}
Multiple Rules¶
export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/admin/", "/api/private/"],
      },
      {
        userAgent: "GPTBot",
        allow: ["/llms.txt", "/"],
      },
    ],
    sitemap: `${siteConfig.url}/sitemap.xml`,
    host: siteConfig.url,
  };
}
Access¶
URL: https://your-domain.com/robots.txt
Output format:
User-Agent: *
Allow: /
Disallow: /admin/
User-Agent: GPTBot
Allow: /llms.txt
Allow: /
Sitemap: https://simon.stijnen.be/sitemap.xml
Host: https://simon.stijnen.be
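The optional crawl rate limit mentioned earlier can also be expressed here. A minimal sketch using Next.js's crawlDelay field, which is emitted as a Crawl-delay directive (advisory only; some crawlers ignore it). The 10-second value is just an example:

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      // Ask crawlers to wait ~10 seconds between requests (advisory only)
      crawlDelay: 10,
    },
    sitemap: `${siteConfig.url}/sitemap.xml`,
    host: siteConfig.url,
  };
}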
AI Crawler Configuration¶
Supported AI Crawlers¶
List of known AI crawlers:
| Crawler | Owner | Purpose |
|---|---|---|
| GPTBot | OpenAI | ChatGPT training |
| ChatGPT-User | OpenAI | ChatGPT browsing |
| Claude-Web | Anthropic | Claude web search |
| Anthropic-AI | Anthropic | AI training |
| Google-Extended | Google | Bard/Gemini training |
| PerplexityBot | Perplexity | Perplexity search |
| Bytespider | ByteDance | TikTok AI |
| CCBot | Common Crawl | Web archive for AI |
Allow AI Crawlers¶
export default function robots(): MetadataRoute.Robots {
  const aiCrawlers = [
    "GPTBot",
    "ChatGPT-User",
    "Claude-Web",
    "Anthropic-AI",
    "Google-Extended",
    "PerplexityBot",
  ];

  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
      },
      {
        userAgent: aiCrawlers,
        allow: ["/", "/llms.txt"],
      },
    ],
    sitemap: `${siteConfig.url}/sitemap.xml`,
  };
}
Block Specific Crawlers¶
export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
      },
      {
        // Block specific crawler
        userAgent: "BadBot",
        disallow: "/",
      },
    ],
    sitemap: `${siteConfig.url}/sitemap.xml`,
  };
}
Testing and Validation¶
Test Sitemap¶
1. Visit directly: https://your-domain.com/sitemap.xml
2. Validate the XML: confirm the file is well-formed (a browser or any XML validator will flag syntax errors)
3. Check in Google Search Console:
- Go to the Sitemaps section
- Enter sitemap.xml
- Click Submit
- Check for errors
Test Robots.txt¶
1. Visit directly: https://your-domain.com/robots.txt
2. Test with Google:
- Google Search Console → robots.txt Tester
- Enter your domain
- Test specific URLs
3. Verify crawlers can access key pages (see the sketch below):
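One way to spot-check both files is a small script run against a deployed or locally running site. A sketch, assuming Node 18+ (global fetch) and a hypothetical scripts/check-seo-files.ts location:

// scripts/check-seo-files.ts (hypothetical location)
import { siteConfig } from "@/lib/config";

async function main() {
  const robots = await fetch(`${siteConfig.url}/robots.txt`).then((res) => res.text());
  if (!robots.includes("Sitemap:")) {
    throw new Error("robots.txt is missing the Sitemap directive");
  }

  const sitemap = await fetch(`${siteConfig.url}/sitemap.xml`).then((res) => res.text());
  if (!sitemap.includes("<urlset")) {
    throw new Error("sitemap.xml does not look like a valid urlset");
  }

  console.log("robots.txt and sitemap.xml look correct");
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});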
Best Practices¶
Sitemap Best Practices¶
- Keep it updated - Regenerate on content changes and prefer real modification dates (see the sketch after this list)
- Include all pages - Don't miss important pages
- Accurate priorities - Homepage highest, details lower
- Realistic change frequency - Match actual update schedule
- Submit to Search Console - Help Google discover faster
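The examples above pass new Date() as lastModified, which marks every page as freshly modified on each build. Where the data includes a real modification date, prefer it; a sketch assuming a hypothetical updatedAt field on each project:

const projectRoutes = projects.map((project) => ({
  url: `${baseUrl}/projects/${project.slug}`,
  // `updatedAt` is a hypothetical field; fall back to the build date if absent
  lastModified: project.updatedAt ? new Date(project.updatedAt) : new Date(),
  changeFrequency: "monthly" as const,
  priority: 0.8,
}));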
Robots.txt Best Practices¶
- Allow by default - Only block what's necessary
- Don't block CSS/JS - Search engines need them
- Include sitemap - Help crawlers find it
- Test thoroughly - Ensure no accidental blocks
- Monitor crawl stats - Watch for issues
Common Mistakes¶
❌ Blocking important resources:
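For example, a rule like this (an illustrative anti-pattern in the same app/robots.ts style) hides the bundles Next.js serves from /_next/:

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      // ❌ Blocks the CSS/JS assets served under /_next/
      disallow: "/_next/",
    },
  };
}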
This prevents search engines from rendering pages properly.
❌ Forgetting sitemap:
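For example (illustrative):

export default function robots(): MetadataRoute.Robots {
  return {
    rules: { userAgent: "*", allow: "/" },
    // ❌ No `sitemap` entry
  };
}

Without the Sitemap directive, crawlers can only find pages by following links.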
❌ Blocking all AI crawlers:
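For example (illustrative; the crawler list mirrors the table above):

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: "*", allow: "/" },
      {
        // ❌ Blanket ban on every known AI crawler
        userAgent: ["GPTBot", "ChatGPT-User", "Claude-Web", "Anthropic-AI", "PerplexityBot"],
        disallow: "/",
      },
    ],
    sitemap: `${siteConfig.url}/sitemap.xml`,
  };
}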
Prevents AI discovery and recommendations.
See Also¶
- SEO Guide - Complete SEO implementation
- Structured Data - JSON-LD schemas
- LLMs.txt - AI agent discovery
- Analytics - Google Analytics & GTM
Next Steps¶
- Implement sitemap.ts
- Implement robots.ts
- Submit sitemap to Search Console
- Test with robots.txt tester
- Monitor crawl statistics
Last Updated: February 2026
Maintainers: Simon Stijnen
Questions? Open an issue on GitHub