    llms.txt Guide: What It Is, How to Write One, and Why It Matters

    llms.txt is a Markdown discovery file that tells AI clients which pages on your site matter and how they fit together. Here is the format and a real example.

    CloudNSite Team
    May 26, 2026

    llms.txt is a proposed convention for a Markdown file at the root of a website that tells AI clients which pages matter, what they are about, and how they fit together. Think of it as a curated table of contents written for language models instead of search crawlers.

    The file is not yet a standard in the W3C sense, but enough AI clients and content publishers have adopted it that it is worth implementing. This guide covers what the file is, its format, what AI crawlers actually do with it, and a worked example you can adapt.

    What llms.txt is (and is not)

    llms.txt is a single Markdown file served at /llms.txt at the root of your domain. It contains:

    • A site name and one-line description.
    • An optional longer summary.
    • One or more sections, each with a heading and a list of links to pages on your site.
    • Each link includes a title and an optional one-line description.

    It is not a replacement for sitemap.xml. The sitemap lists every indexable URL for search crawlers. llms.txt lists only the URLs you want AI clients to focus on, organized in a way the model can reason about.

    It is not a robots directive. Use robots.txt and the AI crawler-specific user agents (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) to allow or block crawling. llms.txt is downstream of that decision; it presumes the crawler is allowed in.

    It is not a Google ranking signal. AI clients read it during retrieval, summarization, and citation tasks.
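
    Because llms.txt only helps crawlers that robots.txt already lets in, it can be worth checking that decision programmatically. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are made up for illustration, and `crawl_permissions` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot entirely, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# The AI crawler user agents named in this article.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def crawl_permissions(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per AI user agent, whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


permissions = crawl_permissions(ROBOTS_TXT, "https://example.com/llms.txt")
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

    With this sample policy, GPTBot is blocked and the other three agents fall through to the wildcard rule, so publishing llms.txt would do nothing for GPTBot until the robots.txt rule changes.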

    The format

    A minimal llms.txt looks like this.

    ```markdown
    # Site Name

    One-line description of what the site is and who it is for.

    Optional longer paragraph with context: the company, the focus areas, the audience.

    ## Section name

    - [Page title](https://example.com/page): description

    ## Another section

    - [Page title](https://example.com/another-page): description
    ```

    That is the whole format. Headings are sections. Each section has a list. Each list item is a Markdown link with an optional description after a colon.
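
    To make the structure concrete, here is a minimal sketch of how a client might parse the file into sections and links. It assumes sections are `##` headings and list items look like `- [Title](url): description`. The function name `parse_llms_txt`, the regular expression, and the example.com URLs are illustrative, not part of any official tooling.

```python
import re

# Matches "- [Title](url)" with an optional ": description" tail.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Parse llms.txt text into a site name and a {section: [links]} map."""
    site_name = None
    sections: dict[str, list[dict]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # A second-level heading opens a new section.
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# ") and site_name is None:
            # The first top-level heading is the site name.
            site_name = line[2:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append({
                    "title": match["title"],
                    "url": match["url"],
                    "description": (match["desc"] or "").strip(),
                })
    return {"site_name": site_name, "sections": sections}


# Hypothetical sample file.
SAMPLE = """\
# Example Site

A short summary.

## Docs

- [Quickstart](https://example.com/quickstart): Install and run the first example.
- [Changelog](https://example.com/changelog)
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed["site_name"], list(parsed["sections"]))
```

    The parser ignores anything it does not recognize, which is a deliberate choice for a sketch: summary paragraphs and stray text are skipped rather than treated as errors.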

    What AI crawlers actually do with it

    This is where most write-ups get vague. Concretely, AI clients use llms.txt in several ways.

    Discovery. When the client decides to read a site for a query, it can fetch llms.txt first to learn which pages are likely relevant before crawling the full site. This is much cheaper than parsing sitemap.xml plus 200 HTML pages.

    Citation prioritization. When the model has multiple candidate pages, the description in llms.txt influences which one it cites. A clean, accurate description means the right page gets the citation.

    Section understanding. The headings in llms.txt give the model a map of how the site is organized. "Solutions / Expertise / Blog / Case Studies" tells the model what kind of content lives where.

    Companion file (llms-full.txt). Many sites also publish a /llms-full.txt that contains the full text of the linked pages concatenated and formatted for model consumption. This lets a client read the entire content surface in one fetch. If you publish llms-full.txt, keep it under a few megabytes; oversized files can be truncated.
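
    One way to respect that size budget is to enforce it while building the companion file. The sketch below concatenates pages until a byte limit would be exceeded. The 2 MB figure, the per-page layout (heading, source line, body), and the name `build_llms_full` are all assumptions made for illustration.

```python
# Assumed budget; "a few megabytes" is a guideline, not a fixed limit.
MAX_BYTES = 2 * 1024 * 1024


def build_llms_full(
    pages: list[tuple[str, str, str]], max_bytes: int = MAX_BYTES
) -> str:
    """Concatenate (title, url, body) pages, stopping before the byte budget is exceeded."""
    parts: list[str] = []
    size = 0
    for title, url, body in pages:
        chunk = f"# {title}\n\nSource: {url}\n\n{body.strip()}\n\n"
        chunk_size = len(chunk.encode("utf-8"))
        if size + chunk_size > max_bytes:
            break  # drop whole pages rather than cut one mid-sentence
        parts.append(chunk)
        size += chunk_size
    return "".join(parts)


# Hypothetical pages, listed in priority order.
PAGES = [
    ("Page A", "https://example.com/a", "Alpha body text."),
    ("Page B", "https://example.com/b", "Beta body text."),
]

full_text = build_llms_full(PAGES)
tiny_text = build_llms_full(PAGES, max_bytes=60)  # only the first page fits
```

    Passing pages in priority order means the budget cuts the least important content first.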

    A worked example

    Here is a real example from CloudNSite. It is intentionally short and curated, not a dump of every page.

    ```markdown
    # CloudNSite

    CloudNSite builds and operates custom AI agents, RAG systems, MCP servers, and workflow automation for regulated businesses.

    We work with healthcare, legal, financial services, and operations teams that need AI that ships, audits cleanly, and stays running after launch.

    ## Expertise

    ## Solutions

    - Custom AI agents: Workflow-execution agents with tools, evaluation, and governance.
    - RAG implementation: Production retrieval-augmented generation with hybrid retrieval and citation enforcement.
    - AI voice agents: Outbound and inbound voice agents for scheduling, qualification, and follow-up.
    - AI for accounts payable: Invoice ingestion, GL coding, PO matching, and approval routing.

    ## Approach

    - Custom AI builds: How CloudNSite scopes, builds, and operates custom AI systems.

    ## Discovery

    - llms-full.txt: Full text of priority pages, formatted for AI clients.
    - ai-search.json: Structured Q&A index for AI search retrieval.
    ```

    A few things to notice. The descriptions are direct, not marketing copy. Each link gets one line, not a paragraph. The sections match the site's actual information architecture. The discovery section points to the companion files.

    How to write good descriptions

    The description after each link is the lever. A few rules.

    State what the page is, not what it sells. "Workflow-execution agents with tools, evaluation, and governance" beats "Transform your business with cutting-edge AI."

    Use the actual terms users search. If users ask "what is an MCP server," the description should contain those words.

    Stay under 15 words. The model has many pages to choose from. A long description gets summarized away.

    No em dashes. Use a colon or two sentences. Em dashes are an AI-slop tell that some clients filter against.
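
    These rules are mechanical enough to check in a build step. The following is a rough sketch of a description linter. The buzzword list is illustrative and drawn from the phrases this article warns against, and `lint_description` is a hypothetical helper, not an existing tool.

```python
WORD_LIMIT = 15  # descriptions should stay under this many words
EM_DASH = "\u2014"
# Illustrative list only; extend it with the phrases your own copy tends to use.
BUZZWORDS = (
    "industry-leading",
    "cutting-edge",
    "ai-powered",
    "end-to-end",
    "transform your business",
)


def lint_description(description: str) -> list[str]:
    """Return a list of rule violations for one link description."""
    problems = []
    words = description.split()
    if len(words) >= WORD_LIMIT:
        problems.append(f"too long: {len(words)} words (limit is under {WORD_LIMIT})")
    if EM_DASH in description:
        problems.append("contains an em dash")
    lowered = description.lower()
    for phrase in BUZZWORDS:
        if phrase in lowered:
            problems.append(f"marketing phrase: {phrase!r}")
    return problems


good = lint_description("Workflow-execution agents with tools, evaluation, and governance.")
bad = lint_description("Transform your business with cutting-edge AI.")
print(good)
print(bad)
```

    The first description passes cleanly, while the second is flagged for two marketing phrases. The "use the terms users search" rule is harder to automate and is left to human review here.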

    Common mistakes

    A few patterns we see go wrong.

    Listing every page. llms.txt is curated. If you list 400 URLs, the file becomes noise. Pick the 30 to 60 pages you actually want AI clients to focus on.

    Marketing-voice descriptions. "Industry-leading, AI-powered, end-to-end transformation" tells the model nothing. The model is choosing between concrete sources. Be concrete.

    Stale URLs. If the URLs in llms.txt 404, your AI presence quietly degrades. Treat llms.txt like a manifest and regenerate it from your content source on every deploy.

    Mismatched headings. If your site has Services, Expertise, and Insights, do not call them Products, Articles, and About in llms.txt. The model trusts the file to describe the site accurately.
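
    Of these mistakes, stale URLs are the easiest to catch automatically. Below is a minimal sketch of a link check that could run on deploy. It assumes a HEAD request is enough to tell whether a page exists, which not every server honors. `find_stale_urls` and `http_status` are hypothetical names, and the status lookup is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a HEAD request, or 0 if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as exceptions
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def find_stale_urls(
    urls: list[str], get_status: Callable[[str], int] = http_status
) -> list[str]:
    """Return the URLs whose final status is not 2xx."""
    return [url for url in urls if not 200 <= get_status(url) < 300]


# Offline demonstration with a fake status table (hypothetical URLs).
FAKE_STATUSES = {
    "https://example.com/solutions": 200,
    "https://example.com/old-page": 404,
}
stale = find_stale_urls(list(FAKE_STATUSES), FAKE_STATUSES.get)
print(stale)
```

    In a real pipeline you would call `find_stale_urls(urls)` with the default `http_status` and fail the build if the returned list is not empty.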

    How to maintain it

    The cleanest pattern is to generate llms.txt from the same content source that produces your sitemap. Every deploy regenerates the file. URLs stay fresh. Descriptions stay aligned with the actual page metadata.

    If your stack is Next.js, Vite, Astro, or any static site generator, this is a single build step. If your stack is a CMS, it is a small export script. Either way, do not maintain llms.txt by hand. It will drift.
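
    As a rough illustration of that build step, the sketch below renders llms.txt from structured page records. The record shape (`title`, `url`, `description`), the function name `render_llms_txt`, and the example.com data are assumptions; in practice the records would come from whatever content source feeds your sitemap.

```python
def render_llms_txt(
    site_name: str, summary: str, sections: dict[str, list[dict]]
) -> str:
    """Render llms.txt from structured page records (title, url, description)."""
    lines = [f"# {site_name}", "", summary, ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            item = f"- [{page['title']}]({page['url']})"
            if page.get("description"):
                item += f": {page['description']}"
            lines.append(item)
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content records.
SECTIONS = {
    "Docs": [
        {
            "title": "Quickstart",
            "url": "https://example.com/quickstart",
            "description": "Install and run the first example.",
        },
        {"title": "Changelog", "url": "https://example.com/changelog"},
    ],
}

output = render_llms_txt("Example Co", "Example Co builds example software.", SECTIONS)
print(output)
```

    Writing `output` to the site's public root as `llms.txt` during each deploy keeps the file in step with the content source.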

    Where to go next

    The full pillar on AI-era discovery (llms.txt, llms-full.txt, ai-search.json, structured data, citation hooks, measurement) is in generative engine optimization. The CloudNSite production file is at /llms.txt if you want to see the full version.

    If you want CloudNSite to audit your current GEO surface and ship the file alongside the rest of the discovery layer, the engagement starts with a content inventory and ends with a regeneration pipeline you keep using.
