
Auto-generate llms.txt from sidebars + frontmatter #1203

Open
dkijania wants to merge 1 commit into main from dkijania/auto-generate-llms-index

Conversation

@dkijania (Member)

Summary

`static/llms.txt` has been hand-maintained and drifting since May 5. PR #1192 restructured the docs (exchange-operators → node-operators, new Rosetta layout, signer page consolidation), but `llms.txt` still reflects the pre-#1192 hierarchy. AI agents that fetch `llms.txt` as a discovery layer (per llmstxt.org) are seeing a stale view.

This was directly visible in the AI benchmark from #1202: the `llms` source scored 84.3%, with several specific misses (mempool size, transaction confirmation count, default GraphQL port), all facts that do exist in the docs but didn't make it into the hand-curated index.

What this changes

  • New `scripts/generate-llms-index.mjs` produces `static/llms.txt` from the same source of truth the docs site uses (core loop sketched below):
    • Hierarchy ← `sidebars.js`
    • Title ← `frontmatter.title`
    • Description ← `frontmatter.description`
  • Wired into `build`, `generate-llms-*`, and `check-llms-txt` npm scripts so neither file can drift again
  • The CI gate now checks both `llms.txt` and `llms-full.txt`
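For reference, a minimal sketch of what that core loop can look like. This is illustrative only: the helper names, the `docs/<id>.mdx` path mapping, and the gray-matter dependency are assumptions, not lifted from the actual script.

```js
// Sketch of the core loop in scripts/generate-llms-index.mjs.
// Assumptions (not verified against the actual script): doc IDs map 1:1 to
// docs/<id>.mdx, and gray-matter is available for frontmatter parsing.
import fs from "node:fs";
import matter from "gray-matter";

import sidebars from "../sidebars.js";

// Flatten a Docusaurus sidebar item into an ordered list of doc IDs.
function collectDocIds(item) {
  if (typeof item === "string") return [item];
  if (item.type === "doc") return [item.id];
  if (item.type === "category") return (item.items ?? []).flatMap(collectDocIds);
  return []; // link/html items carry no doc page
}

const entries = [];
const skipped = [];

for (const items of Object.values(sidebars)) {
  for (const top of items) {
    // Top-level sidebar categories become the audience-grouped sections.
    const section = top.type === "category" ? top.label : "Misc";
    for (const id of collectDocIds(top)) {
      const { data } = matter(fs.readFileSync(`docs/${id}.mdx`, "utf8"));
      if (!data.description) {
        skipped.push(id); // quality gate: warn instead of indexing
        continue;
      }
      entries.push({ section, id, title: data.title ?? id, description: data.description });
    }
  }
}

console.warn(`Skipped ${skipped.length} pages with no description frontmatter:`);
for (const id of skipped) console.warn(`  - ${id}`);
```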

Generator design

| Decision | Reason |
| --- | --- |
| Skip pages with no `description` frontmatter (219 today) | Quality gate — mostly o1js-reference auto-generated subpages that don't belong in a discovery-layer index. Generator prints them as warnings so authors can fill in what's worth surfacing |
| Audience-grouped top-level sections | Mirrors how AI agents triage queries (zkApp dev vs node operator vs exchange) |
| Skip "Participate" top-level entry | Community / process content, not actionable for AI agents |
| Append "Operator-facing facts" section | Hard-coded callouts for the high-signal exchange-FAQ specifics (mempool 3000, account creation fee 1 MINA, 15-block confirmation, GraphQL port 3085) that AI agents miss because they're buried inside FAQ anchor sections |

Output stats

  • 123 pages indexed across 8 sections
  • 24 KB — slightly above the typical 5-10 KB llms.txt budget. A follow-up could trim deep tutorial subpages (Berkeley archive migration has ~12 entries that arguably belong in llms-full.txt only)
  • 219 pages skipped, listed as warnings during build

Why this matters for AI discoverability (#1195)

`llms.txt` is the first thing AI agents fetch when they want to answer a Mina question. A stale or sparse `llms.txt` means agents miss real docs and fall back to training data — which is fine for evergreen facts but breaks down for anything operator-facing or recently changed. Auto-generating it removes the staleness class of failure entirely.

Test plan

  • `npm run generate-llms-index` produces a clean diff
  • `npm run check-llms-txt` passes locally
  • Spot-check that 5 random URLs in the new `llms.txt` resolve with HTTP 200 on docs.minaprotocol.com
  • After merge: rerun the AI benchmark (`gh workflow run benchmark-llms-docs.yml --repo MinaProtocol/docs2`) and confirm the `llms` source score moves on the affected questions (f2 confirmation, f4 mempool, f9 port, all missing from the old llms.txt)
  • Confirm CI's existing `check-llms-txt` gate fails as expected when a doc's frontmatter changes without regeneration
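The drift gate itself can be as simple as regenerating into a scratch file and comparing byte-for-byte with the committed copy. A hedged sketch follows; the `--out` flag is an assumption, and the real `check-llms-txt` script may work differently.

```js
// Hedged sketch of a check-llms-txt style gate: regenerate, then fail
// if the committed static/llms.txt differs from the fresh output.
import { execSync } from "node:child_process";
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

const fresh = path.join(os.tmpdir(), "llms.txt");
// The --out flag is illustrative; the real generator may write in place.
execSync(`node scripts/generate-llms-index.mjs --out ${fresh}`, { stdio: "inherit" });

if (fs.readFileSync("static/llms.txt", "utf8") !== fs.readFileSync(fresh, "utf8")) {
  console.error("static/llms.txt is stale; run `npm run generate-llms-index` and commit the result");
  process.exit(1);
}
```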

🤖 Generated with Claude Code

Commit message:

The hand-maintained static/llms.txt has been drifting since May 5: it
predates PR #1192's exchange/node-operator restructure, references
moved URLs, and is missing operator-facing facts that integrators
search for. AI agents that fetch llms.txt as a discovery layer are
seeing a stale view of the docs.

Add scripts/generate-llms-index.mjs that produces llms.txt the same
way scripts/generate-llms-txt.mjs produces llms-full.txt — auto-built
from canonical sources on every build. Wire it into the build, the
generate-llms-* npm scripts, and the check-llms-txt CI gate so neither
file can drift again.

The generator:

- Walks sidebars.js for hierarchy (canonical TOC source of truth)
- Reads each .mdx's frontmatter for title and description
- Skips pages with no `description`, prints a warning listing them so
  authors can fill in what's missing (219 pages today, mostly the
  auto-generated o1js-reference subpages — those don't belong in a
  discovery-layer index anyway)
- Groups output by top-level sidebar category (audience-grouped:
  Network Upgrades / zkApp Developers / Mina Protocol / Node Operators
  / Exchange Operators / Developer Tools / Mina Security)
- Skips top-level "Participate" since it's community / process content
  not actionable for AI agents
- Appends an "Operator-facing facts" section that surfaces the
  high-signal exchange-FAQ specifics (mempool 3000, account creation
  fee 1 MINA, 15-block confirmation, GraphQL port 3085) which are
  buried inside FAQ pages that the model otherwise misses

Output is 24 KB / 123 pages / 8 sections — slightly above the
recommended llms.txt budget but workable. A follow-up could trim
deep tutorial subpages (Berkeley archive migration walkthrough has
~12 entries that probably belong in llms-full.txt only).

The new check-llms-txt now gates both files: any drift in either
file fails CI, mirroring the existing gate for llms-full.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented May 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs2 | Ready | Preview, Comment | May 10, 2026 10:40am |
