How to¶
Integration with gp-sphinx¶
sphinx_gp_llms ships in DEFAULT_EXTENSIONS,
so projects that build through merge_sphinx_config()
load it automatically. Passing docs_url= to that function auto-derives
the URL input the extension needs:
| Auto-derived | Source |
|---|---|
| site_url | docs_url |
When site_url is unset, the extension logs a message at INFO level and
skips all output, so a missing URL never breaks the build.
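The skip behavior amounts to a simple guard. The following is a minimal sketch that uses a hypothetical should_generate helper, which is not part of the extension's API. It uses the standard-library logging module so it runs outside Sphinx. A real extension would more typically log through sphinx.util.logging.getLogger.

```python
import logging
from typing import Optional

# Hypothetical logger name, used for illustration only.
logger = logging.getLogger("sphinx_gp_llms.sketch")


def should_generate(site_url: Optional[str]) -> bool:
    """Return True only when a base URL is configured (hypothetical helper).

    Mirrors the documented behavior: with no site_url, log at INFO
    and skip output instead of failing the build.
    """
    if not site_url:
        logger.info("site_url is unset; skipping LLM-oriented outputs")
        return False
    return True
```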
Output formats¶
llms.txt¶
Structured Markdown index following the llmstxt.org specification (Jeremy Howard, Answer.AI). The file gives LLM agents a curated entry point to the site’s content:
H1: project name
Blockquote: first paragraph of the root document
H2 sections: one per {toctree} directive with a :caption: option; pages not in any captioned toctree fall into a "Documentation" section
Bulleted links: [Page Title](full URL): first-paragraph description, one per page
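The layout above can be rendered with a few lines of string assembly. This is a minimal sketch, not the extension's code. The Page record and the render_llms_txt function are hypothetical. Ordering sections by first appearance is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical per-page record; field names are illustrative only.
    title: str
    url: str          # full URL of the HTML page
    description: str  # first paragraph of the page
    section: str      # caption of the containing toctree, or "Documentation"


def render_llms_txt(project: str, summary: str, pages: list[Page]) -> str:
    # H1 project name, then the root document's first paragraph as a blockquote.
    lines = [f"# {project}", "", f"> {summary}", ""]
    # Group pages by section, keeping sections in first-seen order.
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)
    for name, members in sections.items():
        lines.extend([f"## {name}", ""])
        for page in members:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    return "\n".join(lines)
```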
llms-full.txt¶
Concatenated full-content Markdown of every documentation page,
following the community convention adopted by Anthropic, Cloudflare,
and GitBook. Each page appears under a title header with
a source URL, separated by --- dividers. Source files are included
as-is: MyST pages are already Markdown, and reStructuredText pages are
included verbatim without conversion to Markdown.
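Concatenating the pages this way is plain string joining. This is a minimal sketch with a hypothetical render_llms_full function. The exact header layout ("# title" followed by a "Source: url" line) is an assumption. The description above specifies only that each page has a title header with a source URL and that pages are separated by --- dividers.

```python
def render_llms_full(pages: list[tuple[str, str, str]]) -> str:
    """Concatenate (title, url, source_text) triples into one document.

    The "# title" / "Source: url" header layout is an assumption.
    """
    chunks = []
    for title, url, source in pages:
        # Source text is included as-is, whether it is MyST or RST.
        chunks.append(f"# {title}\n\nSource: {url}\n\n{source.strip()}\n")
    # A blank line before --- keeps Markdown from reading it as a
    # setext heading underline; it renders as a thematic break instead.
    return "\n---\n\n".join(chunks)
```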
docs.json¶
Agent-oriented manifest following the convention established by Lakebed (Ping). The JSON file provides structured metadata for machine consumption:
agentEntrypoints: pointers to /docs.json, /llms.txt, /llms-full.txt
pages[]: flat array with title, description, section, url, markdownUrl, and headings[] per page
sourceRepository: read from html_theme_options["source_repository"]
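The manifest is a small JSON document. The following minimal sketch assembles it from per-page dicts using a hypothetical build_docs_json function. It assumes agentEntrypoints is a plain list of paths. It also assumes markdownUrl is derived by swapping the .html suffix for .md. The extension's actual field shapes may differ.

```python
import json
from typing import Optional


def _md_url(url: str) -> str:
    # The .md twin sits next to the HTML page: /path/page.html -> /path/page.md.
    # URLs without an .html suffix are left untouched in this sketch.
    return url[: -len(".html")] + ".md" if url.endswith(".html") else url


def build_docs_json(pages: list[dict], source_repository: Optional[str]) -> str:
    manifest = {
        # Assumed to be a list of paths; the real shape may differ.
        "agentEntrypoints": ["/docs.json", "/llms.txt", "/llms-full.txt"],
        "pages": [
            {
                "title": p["title"],
                "description": p["description"],
                "section": p["section"],
                "url": p["url"],
                "markdownUrl": _md_url(p["url"]),
                "headings": list(p.get("headings", [])),
            }
            for p in pages
        ],
        # In the extension this comes from html_theme_options["source_repository"].
        "sourceRepository": source_repository,
    }
    return json.dumps(manifest, indent=2)
```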
Per-page .md twins¶
Source file copies alongside each HTML page, following the
“Markdown for Agents” convention (Cloudflare, Stripe, Anthropic,
Vercel). Every HTML page at /path/page.html gets a sibling at
/path/page.md containing the original source content.
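The twin path is the HTML path with its suffix swapped, and the twin itself is a straight file copy. The following minimal sketch uses two hypothetical helpers, twin_url and write_md_twin. It assumes the standard html builder layout, where a page lives at docname + ".html" rather than in a dirhtml-style docname/index.html.

```python
import shutil
from pathlib import Path, PurePosixPath


def twin_url(html_url_path: str) -> str:
    """Map /path/page.html to /path/page.md (hypothetical helper)."""
    return str(PurePosixPath(html_url_path).with_suffix(".md"))


def write_md_twin(source_file: Path, outdir: Path, docname: str) -> Path:
    """Copy a page's source next to its HTML output (hypothetical helper)."""
    # Sphinx docnames use "/" separators, e.g. "path/page".
    target = outdir / f"{docname}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy the original source unchanged.
    shutil.copyfile(source_file, target)
    return target
```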
How the outputs are built¶
All output files are generated at build-finished in the main
process, iterating app.env.found_docs (the environment's merged set
of all document names). This means:
Incremental builds produce complete output: no pages are missed when Sphinx rewrites only a subset of them.
Parallel builds (sphinx-build -j N) work correctly: found_docs is merged across workers before build-finished fires.
Non-HTML builders (text, json, manpage) are skipped automatically: the handler checks for get_target_uri.
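The three properties above follow from the shape of the handler. The following is a minimal sketch with a hypothetical write_llm_outputs_sketch function, which is not the extension's _write_llm_outputs. It returns the URLs it would cover so the logic is easy to inspect. A real Sphinx handler returns None and writes files instead. The hasattr check mirrors the description above. SimpleNamespace objects stand in for the Sphinx app, builder, and environment.

```python
from types import SimpleNamespace


def write_llm_outputs_sketch(app, exception):
    """Hypothetical build-finished handler (app, exception)."""
    # Do nothing if the build itself failed.
    if exception is not None:
        return []
    builder = app.builder
    # Skip builders without get_target_uri, per the description above.
    if not hasattr(builder, "get_target_uri"):
        return []
    # found_docs holds every document, not only those rewritten in this run,
    # so incremental and parallel builds still cover all pages.
    return [builder.get_target_uri(docname) for docname in sorted(app.env.found_docs)]


# Stand-in objects for illustration only.
fake_app = SimpleNamespace(
    builder=SimpleNamespace(get_target_uri=lambda docname: docname + ".html"),
    env=SimpleNamespace(found_docs={"usage", "index"}),
)
print(write_llm_outputs_sketch(fake_app, None))  # ['index.html', 'usage.html']
```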
Footer link injection runs via html-page-context, adding template
variables (llms_md_url, llms_txt_url, etc.) only when the
corresponding llms_generate_* flag is True.
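Conditional injection comes down to a few guarded dictionary writes on the template context. The following is a minimal sketch with a hypothetical inject_llms_context_sketch function that uses Sphinx's html-page-context signature (app, pagename, templatename, context, doctree). The names llms_generate_md and llms_generate_txt are illustrative flags that follow the llms_generate_* pattern; they are not confirmed config values. Returning early when site_url is empty is also an assumption.

```python
from types import SimpleNamespace


def inject_llms_context_sketch(app, pagename, templatename, context, doctree):
    """Hypothetical html-page-context handler adding footer link variables."""
    config = app.config
    base = getattr(config, "site_url", "") or ""
    if not base:
        return  # assumption: no absolute links without a base URL
    base = base.rstrip("/") + "/"
    # Each variable is added only when its (hypothetical) flag is True.
    if getattr(config, "llms_generate_md", False):
        context["llms_md_url"] = f"{base}{pagename}.md"
    if getattr(config, "llms_generate_txt", False):
        context["llms_txt_url"] = f"{base}llms.txt"


app = SimpleNamespace(config=SimpleNamespace(
    site_url="https://example.org/docs", llms_generate_md=True, llms_generate_txt=False,
))
ctx = {}
inject_llms_context_sketch(app, "guide/install", "page.html", ctx, None)
print(ctx)  # {'llms_md_url': 'https://example.org/docs/guide/install.md'}
```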
Event hooks¶
build-finished → _write_llm_outputs (llms.txt, llms-full.txt,
docs.json, .md twins)
html-page-context → _inject_llms_context (footer link variables)
Both live in
sphinx_gp_llms/__init__.py.