Auto-generate llms.txt from sidebars + frontmatter #1203
Open
dkijania wants to merge 1 commit into
Conversation
The hand-maintained static/llms.txt has been drifting since May 5: it predates PR #1192's exchange/node-operator restructure, references moved URLs, and is missing operator-facing facts that integrators search for. AI agents that fetch llms.txt as a discovery layer are seeing a stale view of the docs.

Add scripts/generate-llms-index.mjs, which produces llms.txt the same way scripts/generate-llms-txt.mjs produces llms-full.txt: auto-built from canonical sources on every build. Wire it into the build, the generate-llms-* npm scripts, and the check-llms-txt CI gate so neither file can drift again.

The generator:

- Walks sidebars.js for hierarchy (the canonical TOC source of truth)
- Reads each .mdx file's frontmatter for title and description
- Skips pages with no `description` and prints a warning listing them so authors can fill in what's missing (219 pages today, mostly the auto-generated o1js-reference subpages, which don't belong in a discovery-layer index anyway)
- Groups output by top-level sidebar category, i.e. by audience: Network Upgrades / zkApp Developers / Mina Protocol / Node Operators / Exchange Operators / Developer Tools / Mina Security
- Skips the top-level "Participate" category, since it is community/process content not actionable for AI agents
- Appends an "Operator-facing facts" section that surfaces the high-signal exchange-FAQ specifics (mempool 3000, account creation fee 1 MINA, 15-block confirmation, GraphQL port 3085), which are buried inside FAQ pages the model otherwise misses

Output is 24 KB / 123 pages / 8 sections: slightly above the recommended llms.txt budget, but workable. A follow-up could trim deep tutorial subpages (the Berkeley archive migration walkthrough has ~12 entries that probably belong in llms-full.txt only).

The new check-llms-txt now gates both files: any drift in either fails CI, mirroring the existing gate for llms-full.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
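The steps above can be sketched in miniature. This is a hypothetical shape, not the actual contents of scripts/generate-llms-index.mjs: the frontmatter parser, the sidebar-walking helper, and the `# Mina Docs` heading and link wording are all illustrative assumptions, and the sketch operates on in-memory data rather than the real filesystem.

```javascript
// Minimal frontmatter parser (illustrative): pulls `title:` and
// `description:` from an MDX file's leading `---` block.
function parseFrontmatter(mdx) {
  const m = mdx.match(/^---\n([\s\S]*?)\n---/);
  if (!m) return {};
  const fields = {};
  for (const line of m[1].split('\n')) {
    const kv = line.match(/^(\w+):\s*(.*)$/);
    if (kv) fields[kv[1]] = kv[2].replace(/^['"]|['"]$/g, '');
  }
  return fields;
}

// Flatten a Docusaurus-style sidebar category into its doc ids:
// items are either string ids or nested { label, items } categories.
function collectDocIds(items) {
  return items.flatMap((it) =>
    typeof it === 'string' ? [it] : collectDocIds(it.items ?? [])
  );
}

// Build llms.txt sections from a sidebar plus a docId -> mdx source map,
// skipping (and recording) pages that have no description.
function generateLlmsIndex(sidebar, docs, baseUrl) {
  const skipped = [];
  const lines = ['# Mina Docs', ''];
  for (const category of sidebar) {
    const entries = [];
    for (const id of collectDocIds(category.items)) {
      const { title, description } = parseFrontmatter(docs[id] ?? '');
      if (!description) { skipped.push(id); continue; }
      entries.push(`- [${title}](${baseUrl}/${id}): ${description}`);
    }
    if (entries.length) lines.push(`## ${category.label}`, ...entries, '');
  }
  return { text: lines.join('\n'), skipped };
}
```

The returned `skipped` list is what would feed the warning to authors about pages missing a `description`; the real script would read `sidebars.js` and the `.mdx` files from disk instead of taking them as arguments.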
Summary
`static/llms.txt` has been hand-maintained and drifting since May 5. PR #1192 restructured the docs (exchange-operators → node-operators, new Rosetta layout, signer page consolidation), but `llms.txt` still reflects the pre-#1192 hierarchy. AI agents that fetch `llms.txt` as a discovery layer (per llmstxt.org) are seeing a stale view.
This was directly visible in the AI benchmark from #1202: the `llms` source scored 84.3% with several specific gaps (mempool size, transaction confirmation count, default GraphQL port) that do exist in the docs but didn't make it into the hand-curated index.
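Those gaps motivate the "Operator-facing facts" appendix mentioned in the description. A minimal sketch of how such an appendix could be attached to the generated index (hypothetical: the function name and the exact fact wording are illustrative, not the generator's actual output):

```javascript
// High-signal exchange-FAQ specifics surfaced directly in llms.txt so
// agents don't have to find them inside FAQ pages. Wording is
// illustrative only; the real generator may phrase these differently.
const OPERATOR_FACTS = [
  'Mempool size: 3000 transactions',
  'Account creation fee: 1 MINA',
  'Recommended transaction confirmations: 15 blocks',
  'Default GraphQL port: 3085',
];

// Append the facts as a trailing section of the generated index text.
function appendOperatorFacts(indexText) {
  return [
    indexText.trimEnd(),
    '',
    '## Operator-facing facts',
    ...OPERATOR_FACTS.map((f) => `- ${f}`),
    '',
  ].join('\n');
}
```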
Why this matters for AI discoverability (#1195)
`llms.txt` is the first thing AI agents fetch when they want to answer a Mina question. A stale or sparse `llms.txt` means agents miss real docs and fall back to training data — which is fine for evergreen facts but breaks down for anything operator-facing or recently changed. Auto-generating it removes the staleness class of failure entirely.
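The drift gate that enforces this can be sketched as a pure comparison step. This is a hypothetical shape for a check-llms-txt style script, not its actual implementation: it diffs the committed file against a freshly regenerated copy, normalizing line endings, and reports the first differing lines so a CI failure is actionable.

```javascript
// Compare the committed llms.txt against a regenerated copy.
// Returns up to the first five differing lines; an empty array
// means no drift and CI passes. (Illustrative sketch.)
function findDrift(committed, generated) {
  const a = committed.replace(/\r\n/g, '\n').split('\n');
  const b = generated.replace(/\r\n/g, '\n').split('\n');
  const diffs = [];
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len && diffs.length < 5; i++) {
    if (a[i] !== b[i]) {
      diffs.push({
        line: i + 1,
        committed: a[i] ?? '<missing>',
        generated: b[i] ?? '<missing>',
      });
    }
  }
  return diffs;
}
```

In CI the caller would read static/llms.txt from disk, run the generator in-memory, and exit nonzero when `findDrift` returns a non-empty array; the same check applied to llms-full.txt gives the "gates both files" behavior.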
Test plan
🤖 Generated with Claude Code