What llms.txt is for
llms.txt is an emerging convention for publishing machine-readable guidance about how large language models, answer engines, crawlers, and automated tools should interpret a website. It is not a security layer, and on its own it is not a legal contract. It cannot block a request or enforce a license. Its value is communication: it gives crawler operators, product teams, and readers one clear place to find your preferences.
For a publisher, documentation site, blog, course archive, or small content business, the first useful version can be simple. It can state that public pages may be read and summarized, that short excerpts are acceptable when the source is credited, and that full article republication or removal of author names is not allowed. That clarity helps humans too, because editors, partners, and developers can quickly understand how you want the site referenced.
What to include
A starter llms.txt file should include the site URL, allowed uses, disallowed uses, preferred attribution, and a contact path for permission requests. Keep the language concrete. Instead of broad claims about protection, describe the behavior you want: link to the original page, preserve author names, avoid republishing full articles, and do not misrepresent summaries as original work.
The file should be easy to maintain. If you run a blog, you may keep one policy at the root of the domain. If you run a documentation site with permissive reuse, your policy may say that summaries and short code examples are allowed with source links. If you publish paid course material, your policy can explain that previews and summaries are different from full lesson reproduction.
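The starter fields described in this section can be sketched as a small generator. This is a minimal illustration, not a standard format: the `SitePolicy` class, the `render_policy` function, the section labels, and the example.com values are all assumptions made for the example, and the convention itself does not require this layout.

```python
from dataclasses import dataclass


@dataclass
class SitePolicy:
    """Hypothetical container for the starter fields; not part of any standard."""
    site_url: str
    allowed: list[str]
    disallowed: list[str]
    attribution: str
    contact: str


def render_policy(policy: SitePolicy) -> str:
    """Render the policy as plain text, one labeled section per field."""
    lines = [f"Site: {policy.site_url}", "", "Allowed uses:"]
    lines += [f"- {item}" for item in policy.allowed]
    lines += ["", "Disallowed uses:"]
    lines += [f"- {item}" for item in policy.disallowed]
    lines += [
        "",
        f"Preferred attribution: {policy.attribution}",
        f"Permission requests: {policy.contact}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
example = SitePolicy(
    site_url="https://example.com",
    allowed=[
        "Read and summarize public pages",
        "Quote short excerpts with a link to the original page",
    ],
    disallowed=[
        "Republish full articles",
        "Remove author names",
    ],
    attribution="Link to the original page and preserve the author name",
    contact="permissions@example.com",
)

policy_text = render_policy(example)
print(policy_text)
```

Keeping the policy in one small structure like this makes later edits a matter of changing a list entry and regenerating the file, which fits the goal of a file that is easy to maintain.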
How it works with robots.txt
robots.txt and llms.txt serve different purposes. robots.txt is a crawler access file that many bots check before fetching URLs. llms.txt is better understood as policy context for AI systems and teams that want to know how a site prefers to be cited or summarized. You can use both: robots.txt for crawl access preferences, llms.txt for attribution and reuse preferences.
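To make the crawl-access half of that split concrete, the sketch below uses Python's standard `urllib.robotparser` to evaluate a small robots.txt. The bot name `ExampleAIBot`, the `/courses/` path, and the `may_fetch` helper are hypothetical and chosen only for illustration. Real crawler names vary and should be taken from each operator's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is kept out of paid course pages,
# every other crawler may fetch anything.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /courses/

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = may_fetch(ROBOTS_TXT, "ExampleAIBot", "https://example.com/courses/lesson-1")
allowed_blog = may_fetch(ROBOTS_TXT, "ExampleAIBot", "https://example.com/blog/post")
allowed_other = may_fetch(ROBOTS_TXT, "OtherBot", "https://example.com/courses/lesson-1")
print(blocked, allowed_blog, allowed_other)
```

Nothing in this check expresses attribution or reuse preferences. robots.txt answers only the question of whether a URL may be fetched, which is why the reuse policy lives in a separate file.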
Because the convention is still developing, do not assume every crawler will read or obey your file. Treat it as one signal among several: visible attribution text, copyright notices, canonical links, structured metadata, and clear terms pages all reinforce the same message. The stronger your public signals are, the easier it is for responsible systems to credit the right source.
A practical publishing workflow
Generate a first draft, review it against your publishing goals, and place it at the root of your site as /llms.txt. Add a short attribution statement to your footer or article template so humans can also see your preferences. Then bring your robots.txt and terms pages in line with the same policy, changing only what conflicts with it. Consistency matters more than legal-sounding complexity.
Review the file monthly or quarterly. AI crawler names, answer engine behavior, and content citation conventions change quickly. A short, current policy is usually more useful than a dense policy that nobody maintains. Start with the template, publish the minimum useful version, and improve it as your site grows.
Before you publish
Read the generated file as if you were a crawler operator, a search product manager, and a reader. Each audience should be able to understand the same basic point: public pages can be summarized in limited ways, source links should remain visible, and full republication is outside the intended use. If a sentence sounds impressive but does not guide behavior, simplify it.
After publishing, test the URL directly in a browser and with a plain text fetch. The file should load without redirects to a login page, should use ordinary UTF-8 text, and should not depend on JavaScript. Add a link to the policy from a human-readable page so editors and partners can find the same guidance. When your site adds new sections, revisit the file and make sure the policy still describes your actual content.