Sitemap and Robots.txt Configuration¶
This guide explains the dynamic sitemap generation and robots.txt configuration for search engines and AI crawlers.
Table of Contents¶
- Sitemap Implementation
- Robots.txt Implementation
- AI Crawler Configuration
- Testing and Validation
- Best Practices
Sitemap Implementation¶
What is a Sitemap?¶
An XML file listing all pages on your site, helping search engines:
- Discover all pages
- Understand site structure
- Determine crawl priority
- Know update frequency
File Location¶
app/sitemap.ts
Basic Implementation¶
// app/sitemap.ts
import { MetadataRoute } from "next";
import { siteConfig } from "@/lib/config";

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const baseUrl = siteConfig.url;

  return [
    {
      url: baseUrl,
      lastModified: new Date(),
      changeFrequency: "weekly",
      priority: 1,
    },
    {
      url: `${baseUrl}/projects`,
      lastModified: new Date(),
      changeFrequency: "weekly",
      priority: 0.9,
    },
  ];
}
Dynamic Routes¶
Include all projects:
// app/sitemap.ts
import { MetadataRoute } from "next";
import { siteConfig } from "@/lib/config";
import { getProjects } from "@/lib/projects";

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const baseUrl = siteConfig.url;
  const projects = await getProjects();

  const projectRoutes = projects.map((project) => ({
    url: `${baseUrl}/projects/${project.slug}`,
    lastModified: new Date(),
    changeFrequency: "monthly" as const,
    priority: 0.8,
  }));

  return [
    // Static routes
    {
      url: baseUrl,
      lastModified: new Date(),
      changeFrequency: "weekly",
      priority: 1,
    },
    // Dynamic project routes
    ...projectRoutes,
  ];
}
Priority Guidelines¶
| Page Type | Priority | Change Frequency |
|---|---|---|
| Homepage | 1.0 | weekly |
| Main sections | 0.9 | weekly |
| Project details | 0.8 | monthly |
| About sections | 0.7 | monthly |
| Downloads | 0.6 | monthly |
| AI endpoints | 0.5 | monthly |
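To keep these values consistent across routes, they can be centralized in a small lookup that sitemap.ts reads from. A minimal sketch (the lib/sitemap-meta.ts path, the ROUTE_META name, and its keys are illustrative, not part of the existing code):

// lib/sitemap-meta.ts (hypothetical helper module)
import type { MetadataRoute } from "next";

type RouteMeta = {
  changeFrequency: MetadataRoute.Sitemap[number]["changeFrequency"];
  priority: number;
};

// Values mirror the table above
export const ROUTE_META: Record<string, RouteMeta> = {
  homepage: { changeFrequency: "weekly", priority: 1.0 },
  section: { changeFrequency: "weekly", priority: 0.9 },
  project: { changeFrequency: "monthly", priority: 0.8 },
  about: { changeFrequency: "monthly", priority: 0.7 },
  download: { changeFrequency: "monthly", priority: 0.6 },
  ai: { changeFrequency: "monthly", priority: 0.5 },
};

sitemap.ts can then spread ROUTE_META.project into each project entry instead of repeating the literals.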
Access¶
URL: https://your-domain.com/sitemap.xml
Output format:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://simon.stijnen.be</loc>
    <lastmod>2026-02-09T12:00:00.000Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://simon.stijnen.be/projects</loc>
    <lastmod>2026-02-09T12:00:00.000Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.9</priority>
  </url>
</urlset>
Robots.txt Implementation¶
What is robots.txt?¶
A file that tells search engines:
- Which pages to crawl
- Which pages to skip
- Where the sitemap is located
- Crawl rate limits (optional; see the crawl-delay sketch at the end of this section)
File Location¶
app/robots.ts
Basic Implementation¶
// app/robots.ts
import type { MetadataRoute } from "next";
import { siteConfig } from "@/lib/config";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      disallow: "/private/",
    },
    sitemap: `${siteConfig.url}/sitemap.xml`,
    host: siteConfig.url,
  };
}
Multiple Rules¶
export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/admin/", "/api/private/"],
      },
      {
        userAgent: "GPTBot",
        allow: ["/llms.txt", "/"],
      },
    ],
    sitemap: `${siteConfig.url}/sitemap.xml`,
    host: siteConfig.url,
  };
}
Access¶
URL: https://your-domain.com/robots.txt
Output format:
User-Agent: *
Allow: /
Disallow: /admin/
User-Agent: GPTBot
Allow: /llms.txt
Allow: /
Sitemap: https://simon.stijnen.be/sitemap.xml
Host: https://simon.stijnen.be
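The optional crawl rate limit mentioned earlier can also be expressed here. A minimal sketch using Next.js's crawlDelay field, which is emitted as a Crawl-delay directive (advisory only; some crawlers ignore it). The 10-second value is just an example:

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      // Ask crawlers to wait ~10 seconds between requests (advisory only)
      crawlDelay: 10,
    },
    sitemap: `${siteConfig.url}/sitemap.xml`,
    host: siteConfig.url,
  };
}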
AI Crawler Configuration¶
Supported AI Crawlers¶
List of known AI crawlers:
| Crawler | Owner | Purpose |
|---|---|---|
| GPTBot | OpenAI | ChatGPT training |
| ChatGPT-User | OpenAI | ChatGPT browsing |
| Claude-Web | Anthropic | Claude web search |
| Anthropic-AI | Anthropic | AI training |
| Google-Extended | Google | Bard/Gemini training |
| PerplexityBot | Perplexity | Perplexity search |
| Bytespider | ByteDance | TikTok AI |
| CCBot | Common Crawl | Web archive for AI |
Allow AI Crawlers¶
export default function robots(): MetadataRoute.Robots {
  const aiCrawlers = [
    "GPTBot",
    "ChatGPT-User",
    "Claude-Web",
    "Anthropic-AI",
    "Google-Extended",
    "PerplexityBot",
  ];

  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
      },
      {
        userAgent: aiCrawlers,
        allow: ["/", "/llms.txt"],
      },
    ],
    sitemap: `${siteConfig.url}/sitemap.xml`,
  };
}
Block Specific Crawlers¶
export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
      },
      {
        // Block specific crawler
        userAgent: "BadBot",
        disallow: "/",
      },
    ],
    sitemap: `${siteConfig.url}/sitemap.xml`,
  };
}
Testing and Validation¶
Test Sitemap¶
1. Visit directly: https://your-domain.com/sitemap.xml
2. Validate the XML: confirm the file is well-formed (a browser or any XML validator will flag syntax errors)
3. Check in Google Search Console:
- Go to the Sitemaps section
- Enter sitemap.xml
- Click Submit
- Check for errors
Test Robots.txt¶
1. Visit directly: https://your-domain.com/robots.txt
2. Test with Google:
- Google Search Console → robots.txt Tester
- Enter your domain
- Test specific URLs
3. Verify crawlers can access key pages (see the sketch below):
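One way to spot-check both files is a small script run against a deployed or locally running site. A sketch, assuming Node 18+ (global fetch) and a hypothetical scripts/check-seo-files.ts location:

// scripts/check-seo-files.ts (hypothetical location)
import { siteConfig } from "@/lib/config";

async function main() {
  const robots = await fetch(`${siteConfig.url}/robots.txt`).then((res) => res.text());
  if (!robots.includes("Sitemap:")) {
    throw new Error("robots.txt is missing the Sitemap directive");
  }

  const sitemap = await fetch(`${siteConfig.url}/sitemap.xml`).then((res) => res.text());
  if (!sitemap.includes("<urlset")) {
    throw new Error("sitemap.xml does not look like a valid urlset");
  }

  console.log("robots.txt and sitemap.xml look correct");
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});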
Best Practices¶
Sitemap Best Practices¶
- Keep it updated - Regenerate on content changes and prefer real modification dates (see the sketch after this list)
- Include all pages - Don't miss important pages
- Accurate priorities - Homepage highest, details lower
- Realistic change frequency - Match actual update schedule
- Submit to Search Console - Help Google discover faster
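The examples above pass new Date() as lastModified, which marks every page as freshly modified on each build. Where the data includes a real modification date, prefer it; a sketch assuming a hypothetical updatedAt field on each project:

const projectRoutes = projects.map((project) => ({
  url: `${baseUrl}/projects/${project.slug}`,
  // `updatedAt` is a hypothetical field; fall back to the build date if absent
  lastModified: project.updatedAt ? new Date(project.updatedAt) : new Date(),
  changeFrequency: "monthly" as const,
  priority: 0.8,
}));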
Robots.txt Best Practices¶
- Allow by default - Only block what's necessary
- Don't block CSS/JS - Search engines need them
- Include sitemap - Help crawlers find it
- Test thoroughly - Ensure no accidental blocks
- Monitor crawl stats - Watch for issues
Common Mistakes¶
❌ Blocking important resources:
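For example, a rule like this (an illustrative anti-pattern in the same app/robots.ts style) hides the bundles Next.js serves from /_next/:

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      // ❌ Blocks the CSS/JS assets served under /_next/
      disallow: "/_next/",
    },
  };
}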
This prevents search engines from rendering pages properly.
❌ Forgetting sitemap:
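For example (illustrative):

export default function robots(): MetadataRoute.Robots {
  return {
    rules: { userAgent: "*", allow: "/" },
    // ❌ No `sitemap` entry
  };
}

Without the Sitemap directive, crawlers can only find pages by following links.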
❌ Blocking all AI crawlers:
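For example (illustrative; the crawler list mirrors the table above):

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: "*", allow: "/" },
      {
        // ❌ Blanket ban on every known AI crawler
        userAgent: ["GPTBot", "ChatGPT-User", "Claude-Web", "Anthropic-AI", "PerplexityBot"],
        disallow: "/",
      },
    ],
    sitemap: `${siteConfig.url}/sitemap.xml`,
  };
}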
Prevents AI discovery and recommendations.
See Also¶
- SEO Guide - Complete SEO implementation
- Structured Data - JSON-LD schemas
- LLMs.txt - AI agent discovery
- Analytics - Google Analytics & GTM
Next Steps¶
- Implement sitemap.ts
- Implement robots.ts
- Submit sitemap to Search Console
- Test with robots.txt tester
- Monitor crawl statistics
Last Updated: February 2026
Maintainers: Simon Stijnen
Questions? Open an issue on GitHub