llms.txt Guide: The AI-Era Discovery File for Your Website

llms.txt is a proposed convention for a Markdown file at the root of a website that tells AI clients which pages matter, what they are about, and how they fit together. Think of it as a curated table of contents written for language models instead of search crawlers.

The file is not a standard yet in the W3C sense, but it has enough adoption from AI clients and content publishers that it is worth implementing. This guide covers what the file is, the format, what AI crawlers actually do with it, and a worked example you can adapt.

What llms.txt is (and is not)

llms.txt is a single Markdown file served at /llms.txt at the root of your domain. It contains:

A site name and one-line description.
An optional longer summary.
One or more sections, each with a heading and a list of links to pages on your site.
Each link includes a title and an optional one-line description.

It is not a replacement for sitemap.xml. Sitemap.xml lists every indexable URL for search crawlers. llms.txt lists the URLs you want AI clients to focus on, organized in a way the model can reason about.

It is not a robots directive. Use robots.txt with the AI-crawler user-agent tokens (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) to allow or block crawling. llms.txt is downstream of that decision: it presumes the crawler is already allowed in.

It is not a Google ranking signal. It is read by AI clients during retrieval, summarization, and citation tasks.
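Because llms.txt only helps crawlers that robots.txt already admits, it can be worth checking that decision before publishing the file. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt body against the crawler tokens named above. The robots.txt content and the `example.com` URL are hypothetical, and `crawler_allowed` is a helper name invented for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt body lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot"):
        allowed = crawler_allowed(ROBOTS_TXT, agent, "https://example.com/llms.txt")
        print(agent, "allowed" if allowed else "blocked")
```

With this policy, a crawler identifying as GPTBot never reaches /llms.txt, so nothing in the file affects it.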

The format

A minimal llms.txt looks like this.

```markdown
# Site Name

One-line description of what the site is and who it is for.

Optional longer paragraph with context: the company, the focus areas, the audience.

## Section name

- [Page title](https://example.com/page): one-line description of the page
- [Another page](https://example.com/another-page): one-line description

## Another section

- [Page title](https://example.com/third-page): description
```

That is the whole format. Headings are sections. Each section has a list. Each list item is a Markdown link with an optional description after a colon.
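To make the structure concrete, here is a minimal parser sketch. It assumes list items take the form `- [Title](url): description`, with `#` for the site name and `##` for sections. The class and function names are invented for this illustration, and real files may deviate from this shape.

```python
import re
from dataclasses import dataclass, field

# Assumed item shape: "- [Title](url): optional description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


@dataclass
class LlmsTxt:
    name: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Collect the site name and the links under each '## ' section heading."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# "):
            doc.name = line[2:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                desc = (match["desc"] or "").strip()
                doc.sections[current].append(Link(match["title"], match["url"], desc))
    return doc


SAMPLE = """\
# Example Site

One-line description.

## Docs

- [Quickstart](https://example.com/quickstart): Install and run the first example
- [API reference](https://example.com/api)
"""

if __name__ == "__main__":
    parsed = parse_llms_txt(SAMPLE)
    for section, links in parsed.sections.items():
        print(section, [link.title for link in links])
```

Summary paragraphs and any other non-link lines are skipped, which keeps the sketch short but means it is not a validator.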

What AI crawlers actually do with it

This is where most write-ups get vague. Concretely, AI clients use llms.txt in several ways.

Discovery. When the client decides to read a site for a query, it can fetch llms.txt first to learn which pages are likely relevant before crawling the full site. This is much cheaper than parsing sitemap.xml plus 200 HTML pages.

Citation prioritization. When the model has multiple candidate pages, the description in llms.txt influences which one it cites. A clean, accurate description means the right page gets the citation.

Section understanding. The headings in llms.txt give the model a map of how the site is organized. "Solutions / Expertise / Blog / Case Studies" tells the model what kind of content lives where.

Companion file (llms-full.txt). Many sites also publish a /llms-full.txt that contains the full text of the linked pages concatenated, formatted for model consumption. This lets a client read the entire content surface in one fetch. If you publish llms-full.txt, keep it under a few megabytes; a client with a fetch or context limit may truncate an oversized file.
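One way to produce the companion file is to concatenate page bodies at build time and fail the build when the result grows too large. The sketch below is an assumption-heavy illustration: the per-page layout (`## title`, then a `Source:` line) is not mandated anywhere, the 2 MiB budget is an arbitrary stand-in for "a few megabytes", and `build_llms_full` is a hypothetical helper.

```python
MAX_BYTES = 2 * 1024 * 1024  # assumed budget; the guide only says "a few megabytes"


def build_llms_full(
    site_name: str,
    pages: list[tuple[str, str, str]],
    max_bytes: int = MAX_BYTES,
) -> str:
    """Concatenate (title, url, body_markdown) tuples into one document.

    Raises ValueError when the UTF-8 encoded result exceeds max_bytes,
    so an oversized file is caught at build time rather than truncated later.
    """
    parts = [f"# {site_name}\n"]
    for title, url, body in pages:
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}\n")
    text = "\n".join(parts)
    size = len(text.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"llms-full.txt is {size} bytes, over the {max_bytes}-byte budget")
    return text


if __name__ == "__main__":
    demo = build_llms_full(
        "Example Site",
        [("Quickstart", "https://example.com/quickstart", "Install it.\n")],
    )
    print(demo)
```

Measuring the encoded byte length rather than the character count matters for sites with non-ASCII content, where the two can differ substantially.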

A worked example

Here is a real example from CloudNSite. It is intentionally short and curated, not a dump of every page.

```markdown
# CloudNSite

CloudNSite builds and operates custom AI agents, RAG systems, MCP servers, and workflow automation for regulated businesses.

We work with healthcare, legal, financial services, and operations teams that need AI that ships, audits cleanly, and stays running after launch.

## Expertise

- MCP server development: Transport, identity, tool design, and ops for production Model Context Protocol servers.
- AI governance framework: NIST AI RMF and ISO 42001 implementation for regulated AI deployments.
- Generative engine optimization: Make your site visible and cite-worthy to AI search clients.

## Solutions

- Custom AI agents: Workflow-execution agents with tools, evaluation, and governance.
- RAG implementation: Production retrieval-augmented generation with hybrid retrieval and citation enforcement.
- AI voice agents: Outbound and inbound voice agents for scheduling, qualification, and follow-up.
- AI for accounts payable: Invoice ingestion, GL coding, PO matching, and approval routing.

## Approach

- Custom AI builds: How CloudNSite scopes, builds, and operates custom AI systems.

## Discovery

- llms-full.txt: Full text of priority pages, formatted for AI clients.
- ai-search.json: Structured Q&A index for AI search retrieval.
```

A few things to notice. The descriptions are direct, not marketing copy. Each link gets one line, not a paragraph. The sections match the site's actual information architecture. The discovery section points to the companion files.

How to write good descriptions

The description after each link is the lever. A few rules.

State what the page is, not what it sells. "Workflow-execution agents with tools, evaluation, and governance" beats "Transform your business with cutting-edge AI."

Use the actual terms users search. If users ask "what is an MCP server," the description should contain those words.

Stay under 15 words. The model has many pages to choose from. A long description gets summarized away.

No em dashes. Use a colon or two sentences. Em dashes are an AI-slop tell that some clients filter against.
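The rules above are mechanical enough to check in a build step. The sketch below flags descriptions that reach 15 words, contain an em dash, or use marketing phrases. The phrase list is an illustrative assumption drawn from the examples in this guide, the substring match is deliberately naive, and `lint_description` is a hypothetical helper name.

```python
# Illustrative phrase list taken from this guide's examples; extend as needed.
BUZZWORDS = {"cutting-edge", "industry-leading", "ai-powered", "end-to-end"}
MAX_WORDS = 15
EM_DASH = "\u2014"


def lint_description(description: str) -> list[str]:
    """Return a list of rule violations; an empty list means the text passes."""
    problems = []
    words = description.split()
    if len(words) >= MAX_WORDS:
        problems.append(f"{len(words)} words; keep it under {MAX_WORDS}")
    if EM_DASH in description:
        problems.append("contains an em dash; use a colon or two sentences")
    lowered = description.lower()
    hits = sorted(word for word in BUZZWORDS if word in lowered)  # naive substring check
    if hits:
        problems.append("marketing language: " + ", ".join(hits))
    return problems


if __name__ == "__main__":
    for text in (
        "Workflow-execution agents with tools, evaluation, and governance",
        "Transform your business with cutting-edge AI.",
    ):
        print(text, "->", lint_description(text) or "ok")
```

The linter cannot judge whether a description uses the terms people actually search for; that rule still needs a human or query-log review.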

Common mistakes

A few patterns we see go wrong.

Listing every page. llms.txt is curated. If you list 400 URLs, the file becomes noise. Pick the 30 to 60 pages you actually want AI clients to focus on.

Marketing-voice descriptions. "Industry-leading, AI-powered, end-to-end transformation" tells the model nothing. The model is choosing between concrete sources. Be concrete.

Stale URLs. If the URLs in llms.txt return 404, your AI presence quietly degrades. Treat llms.txt like a manifest and regenerate it from your content source on every deploy.

Mismatched headings. If your site has Services, Expertise, and Insights, do not call them Products, Articles, and About in llms.txt. The model trusts the file to describe the site accurately.
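Two of these mistakes, over-listing and stale URLs, can be caught automatically. The sketch below assumes items use the `- [Title](url)` form and that the set of live URLs comes from your own sitemap or content source. `audit_llms_txt` is a hypothetical helper, and the 60-link ceiling simply reuses the upper end of the range suggested above.

```python
import re

# Capture the URL of every "- [Title](url)" list item in the file.
LINK_URL_RE = re.compile(r"^-\s*\[[^\]]+\]\(([^)\s]+)\)", re.MULTILINE)
MAX_LINKS = 60  # upper end of the 30-to-60 range suggested above


def audit_llms_txt(text: str, live_urls: set[str]) -> list[str]:
    """Report an oversized link list and any URL missing from live_urls."""
    urls = LINK_URL_RE.findall(text)
    problems = []
    if len(urls) > MAX_LINKS:
        problems.append(f"{len(urls)} links listed; curate down to {MAX_LINKS} or fewer")
    for url in urls:
        if url not in live_urls:
            problems.append(f"stale link: {url}")
    return problems


if __name__ == "__main__":
    sample = (
        "# Site\n\n## Docs\n\n"
        "- [A](https://example.com/a): first\n"
        "- [B](https://example.com/b): second\n"
    )
    print(audit_llms_txt(sample, {"https://example.com/a"}))
```

Comparing against a known URL set avoids network calls in the build; a separate job that actually requests each URL would catch pages that exist in the content source but fail in production.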

How to maintain it

The cleanest pattern is to generate llms.txt from the same content source that produces your sitemap. Every deploy regenerates the file. URLs stay fresh. Descriptions stay aligned with the actual page metadata.

If your stack is Next.js, Vite, Astro, or any static site generator, this is a single build step. If your stack is a CMS, it is a small export script. Either way, do not maintain llms.txt by hand. It will drift.
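As a rough picture of that build step, the sketch below renders llms.txt from a list of page records. The `Page` fields and the `render_llms_txt` function are assumptions made for illustration; in practice the records would come from whatever source already feeds your sitemap.

```python
from dataclasses import dataclass


@dataclass
class Page:
    section: str
    title: str
    url: str
    description: str


def render_llms_txt(name: str, summary: str, pages: list[Page]) -> str:
    """Render the site name, summary, and one '## ' section per page group."""
    sections: dict[str, list[Page]] = {}
    for page in pages:  # dicts keep insertion order, so sections appear as first seen
        sections.setdefault(page.section, []).append(page)
    lines = [f"# {name}", "", summary, ""]
    for section, items in sections.items():
        lines += [f"## {section}", ""]
        lines += [f"- [{p.title}]({p.url}): {p.description}" for p in items]
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    pages = [
        Page("Docs", "Quickstart", "https://example.com/quickstart",
             "Install and run the first example"),
        Page("Docs", "API reference", "https://example.com/api",
             "Endpoints, parameters, and error codes"),
    ]
    print(render_llms_txt("Example Site", "Docs for the Example tool.", pages))
```

Writing the returned string to the site's public root as llms.txt during each deploy keeps URLs and descriptions in step with the content source.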

Where to go next

The full pillar on AI-era discovery (llms.txt, llms-full.txt, ai-search.json, structured data, citation hooks, measurement) is on the generative engine optimization page. The CloudNSite production file is at /llms.txt if you want to see the full version.

If you want CloudNSite to audit your current GEO surface and ship the file alongside the rest of the discovery layer, the engagement starts with a content inventory and ends with a regeneration pipeline you keep using.
