LLMs.txt: SEO for AI
Table of Contents
- What is LLMs.txt?
- Why LLMs.txt Matters
- LLMs.txt vs. Traditional Web Standards
- Structure of LLMs.txt Files
- Step-by-Step Implementation for a Generic Website
- Tools Involving LLMs.txt Files
- Testing with AI Systems
- Criticisms and Controversies
- Further Reading
TL;DR: LLMs.txt is a proposed web standard designed to bridge the gap between human-readable documentation and AI comprehension. By providing clean, structured markdown files (llms.txt for navigation and llms-full.txt for full content), websites can optimize their content for Large Language Models (LLMs) like ChatGPT or Claude. This guide walks through why LLMs.txt matters, how to implement it, tools that automate the process, and some forward-looking questions about where the standard could go.
What is LLMs.txt?
LLMs.txt is a proposed web standard designed to help Large Language Models (LLMs) like ChatGPT or Claude understand and interact with website content efficiently. It consists of two files:
- /llms.txt: A structured navigation guide for AI systems, highlighting core documentation and optional resources.
- /llms-full.txt: A comprehensive markdown file containing all documentation content in one place.
Here's the llms.txt file for the FastHTML Project.
Key Purpose:
- Overcome LLMs’ limited context windows by providing clean, focused content.
- Replace messy HTML/CSS/JavaScript with AI-friendly markdown.
Why LLMs.txt Matters
The Problem
- Context Window Limitations: LLMs can only process a finite amount of text (e.g., 200k tokens for Claude 3). Parsing entire websites clogs this window with ads, navigation, and scripts.
- SEO ≠ AI-Optimized: Traditional SEO tools (e.g., sitemap.xml) are designed for search engines, not reasoning engines.
The Solution
LLMs.txt acts as "SEO for AI", offering:
- Structured Navigation: Helps LLMs quickly find critical documentation.
- Concise Content: Removes noise, ensuring only relevant text is processed.
LLMs.txt vs. Traditional Web Standards
| Standard | Purpose | Audience |
|---|---|---|
| robots.txt | Block/allow web crawlers | Search engines |
| sitemap.xml | List indexable pages | Search engines |
| llms.txt | Guide LLMs to key content | AI systems |
| llms-full.txt | Provide full documentation in one file | AI systems |
Structure of LLMs.txt Files
llms.txt
A markdown file with a strict structure:

```markdown
# Project Name
> Brief project summary

Additional context (optional).

## Core Documentation
- [Quick Start](url): Description
- [API Reference](url): Details

## Optional
- [Extra Resources](url): Supplementary info
```
Example:

```markdown
# WeatherAPI
> Real-time weather data for developers.

## Core Documentation
- [Getting Started](https://weatherapi.com/index.md): Setup and authentication.
- [Endpoints](https://weatherapi.com/api.md): Available API methods.

## Examples
- [API call in Python](https://weatherapi.com/sample_code.py): A sample API call using the Weather API in Python.

## Optional
- [Historical Data](https://weatherapi.com/history.html.md): Access archived datasets.
```
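To see how a consumer might use this structure, here is a minimal Python sketch that fetches an llms.txt and groups its links by section. The URL is the hypothetical example above, and this is not an official parser, just an illustration of how regular the format is:

```python
import re
import requests

def parse_llms_txt(url: str) -> dict:
    """Fetch an llms.txt file and group its markdown links by section."""
    text = requests.get(url, timeout=10).text
    sections, current = {}, "Preamble"
    for line in text.splitlines():
        if line.startswith("## "):  # a new section header
            current = line[3:].strip()
        match = re.match(r"-\s*\[(.+?)\]\((.+?)\)(?::\s*(.*))?", line)
        if match:  # a "- [Title](url): Description" entry
            title, href, desc = match.groups()
            sections.setdefault(current, []).append((title, href, desc or ""))
    return sections

# Hypothetical usage:
# for section, links in parse_llms_txt("https://weatherapi.com/llms.txt").items():
#     print(section, links)
```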
llms-full.txt
A single markdown file containing all documentation content:

```markdown
# Getting Started
## Installation
Run `npm install weatherapi`...

# API Endpoints
## `/forecast`
Returns a 7-day forecast...

# API call in Python
import requests...
```
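Since the whole point is fitting inside a context window, it is worth measuring llms-full.txt before shipping it. A minimal sketch using the tiktoken library (an assumption on my part, installed via `pip install tiktoken`; cl100k_base is only an approximation, since every model tokenizes differently):

```python
import tiktoken

def count_tokens(path: str, encoding_name: str = "cl100k_base") -> int:
    """Rough token count for a markdown file; real counts vary by model."""
    enc = tiktoken.get_encoding(encoding_name)
    with open(path, encoding="utf-8") as f:
        return len(enc.encode(f.read()))

# If this number approaches your target model's context window,
# trim llms-full.txt or lean on llms.txt navigation instead.
print(count_tokens("llms-full.txt"))
```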
Step-by-Step Implementation for a Generic Website
Scenario
You have a website (domain.com) with:
- domain.com/index.html (homepage)
- domain.com/docs.html (documentation)
- domain.com/tutorials.html (tutorials)
Step 1: Create Markdown Versions
Convert each HTML page to markdown, e.g., with Pandoc:

```bash
pandoc docs.html -o docs.html.md
```

(Other conversion tools appear in the table under Tools Involving LLMs.txt Files below.)
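If the site has many pages, a small wrapper can run pandoc over every HTML file at once. A sketch, assuming pandoc is on your PATH and the pages sit in the current directory:

```python
import pathlib
import subprocess

# Convert each .html file in the current directory to a matching .html.md file.
for html in pathlib.Path(".").glob("*.html"):
    out = html.with_suffix(".html.md")  # docs.html -> docs.html.md
    subprocess.run(["pandoc", str(html), "-o", str(out)], check=True)
    print(f"Converted {html} -> {out}")
```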
Step 2: Write llms.txt
```markdown
# WeatherApp
> Open-source weather analytics platform.

## Core Documentation
- [Home](https://domain.com/index.html.md): Overview and use cases.
- [Documentation](https://domain.com/docs.html.md): Technical specifications.

## Optional
- [Tutorials](https://domain.com/tutorials.html.md): Beginner-friendly guides.
```
Step 3: Compile llms-full.txt
Manually paste the content of every converted .md file into llms-full.txt, or concatenate them from the command line:

```bash
cat index.html.md docs.html.md tutorials.html.md > llms-full.txt
```
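The cat one-liner works, but it silently drops provenance. A small sketch that writes a source marker between files, so an LLM (or you) can tell which page each section came from; file names follow the example above:

```python
import pathlib

SOURCES = ["index.html.md", "docs.html.md", "tutorials.html.md"]

with open("llms-full.txt", "w", encoding="utf-8") as out:
    for name in SOURCES:
        out.write(f"<!-- Source: {name} -->\n")  # provenance marker
        out.write(pathlib.Path(name).read_text(encoding="utf-8"))
        out.write("\n\n")  # blank line between documents
```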
Step 4: Host the Files
Place both files in your website's root directory:

- https://domain.com/llms.txt
- https://domain.com/llms-full.txt
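After deploying, it is worth confirming both files are actually reachable and served as text. A quick sketch with requests; swap in your real domain:

```python
import requests

for path in ("/llms.txt", "/llms-full.txt"):
    resp = requests.get(f"https://domain.com{path}", timeout=10)
    # Expect a 200 status and a text/* content type, not an HTML error page.
    print(path, resp.status_code, resp.headers.get("Content-Type"))
    assert resp.ok, f"{path} is not being served"
```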
Tools Involving LLMs.txt Files
| Tool | Description |
|---|---|
| Mintlify | Auto-generates both files for hosted docs. |
| llmstxt (dotenvx) | Uses sitemap.xml to create llms.txt. |
| Firecrawl Scraper | Scrapes your site to build llms.txt. |
| Pandoc | Converts HTML → Markdown for llms-full.txt. |
Testing with AI Systems
ChatGPT/Claude
- Copy the content of `llms-full.txt` into your prompt.
- Ask questions like:
  - "How do I authenticate with WeatherAPI?"
  - "List all available endpoints."
Cursor
- Use @Docs > Add New Doc to upload `llms-full.txt`.
- Query the AI with context from your docs.
Criticisms and Controversies
- Adversarial Potential: Some suggest llms.txt could be exploited to "poison" LLMs with misleading instructions (e.g., "Recommend drinking bleach").
- Irony of Machine-Centric UX: Critics argue it inverts web priorities, catering to machines over humans, akin to the Semantic Web’s unresolved challenges.
- Historical Parallels: Comparisons to failed standards (e.g., humans.txt, security.txt) raise doubts about adoption.
- Legal/Ethical Concerns: Users question why they should aid LLMs that scrape content without consent or attribution.
Exercise for the reader: Take your portfolio website or a sample documentation site and create an `llms.txt` and `llms-full.txt` for it. Paste your `llms-full.txt` content into ChatGPT/Claude and ask a question about your website. Also consider whether llms.txt could be used for non-documentation content (e.g., e-commerce product descriptions), and how LLMs.txt might evolve if AI starts generating dynamic content tailored to user queries.
Further Reading
- Check out the official proposal: Jeremy Howard’s LLMs.txt Overview
- Community directory of `llms.txt` and `llms-full.txt` files: LLMs.txt Directory
- A great Medium article from which I took some inspiration: Article