How to
Integration with gp-sphinx
sphinx_gp_sitemap ships in DEFAULT_EXTENSIONS,
so projects that build through merge_sphinx_config()
load it automatically. Passing docs_url= to that function auto-derives
both URL inputs the extension needs:
| Auto-derived | Source |
|---|---|
| | |
| | |
The flat scheme overrides the upstream default of
"{lang}{version}{link}" because git-pull.com sites deploy at the
project root, with no language or version directory in the URL space.
Multilingual or version-pinned hosts can still pass an explicit
sitemap_url_scheme through **overrides — merge_sphinx_config()
runs auto-derivation first and overrides last. The canonical mapping
lives in From docs_url.
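The derive-then-override ordering can be modelled in a few lines. This is a minimal sketch, not the gp-sphinx implementation: `merge_config_sketch` is a hypothetical stand-in for merge_sphinx_config(), and both derivation rules (site_url as docs_url with a trailing slash, and "{link}" as the flat scheme) are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def merge_config_sketch(docs_url: str | None = None, **overrides: Any) -> dict[str, Any]:
    """Hypothetical stand-in for merge_sphinx_config(): derive first, override last."""
    config: dict[str, Any] = {}
    if docs_url is not None:
        # Assumption: site_url is docs_url normalized to end in one slash.
        config["site_url"] = docs_url.rstrip("/") + "/"
        # Assumption: the flat scheme is the page link alone, with no
        # language or version segment.
        config["sitemap_url_scheme"] = "{link}"
    # Explicit overrides are applied after auto-derivation, so they win.
    config.update(overrides)
    return config


flat = merge_config_sketch(docs_url="https://example.invalid/docs")
versioned = merge_config_sketch(
    docs_url="https://example.invalid/docs",
    sitemap_url_scheme="{lang}{version}{link}",
)
```

In this model `flat` keeps the derived scheme, while `versioned` ends up with the explicitly passed one because overrides are merged last.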
How sitemap.xml is built
After every HTML-family build, the extension serializes one <url>
element per built page to sitemap.xml in the output directory.
Init: builder-inited initializes env.temp_data["sphinx_gp_sitemap_links"] to an empty list.
Collect: html-page-context fires once per page. The handler computes the relative URL using the builder’s suffix (html_file_suffix or ".html" for the html builder; …/ for dirhtml, with the index emitted as the empty string), drops it when any pattern in sitemap_excludes matches, and appends a (relative_link, last_updated) tuple to the list.
Compose: build-finished resolves site_url (or html_baseurl as fallback; if both are unset, the skip is logged at INFO and no sitemap is written). For each collected link the handler formats site_url + sitemap_url_scheme.format(lang=…, version=…, link=…). The lang segment comes from app.builder.config.language followed by / (empty when no language is set); version likewise from app.builder.config.version.
Hreflang: when sitemap_locales resolves to a non-empty list (explicit value, or auto-detected sub-directories of every entry in locale_dirs), each <url> gains <xhtml:link rel="alternate" hreflang="…"> elements as siblings of its <loc>. The formatter rewrites underscores to hyphens for IANA compatibility (pt_BR → pt-BR). The sentinel sitemap_locales = [None] suppresses alternates explicitly.
Lastmod (optional): when sitemap_show_lastmod = True, the config-inited handler runs app.setup_extension("sphinx_last_updated_by_git") once at the start of the build to lazy-load the supporting extension. If the import fails, sphinx-gp-sitemap logs a WARNING and disables the flag for the rest of the build: <lastmod> is omitted but everything else is still emitted.
Serialize: xml.etree.ElementTree.write() produces the file. When sitemap_indent > 0, ElementTree.indent() pretty-prints the tree with the configured width. ElementTree handles XML entity escaping for the URL text and attribute values automatically.
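The Compose, Hreflang, and Serialize steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the extension's code: `compose_url` and `build_sitemap` are hypothetical helper names, and building each alternate href by substituting the locale into the `lang` segment is an assumption about how the alternates are formed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def compose_url(site_url, scheme, link, language=None, version=None):
    """Format site_url + scheme; lang and version get a trailing slash when set."""
    lang = f"{language}/" if language else ""
    ver = f"{version}/" if version else ""
    return site_url + scheme.format(lang=lang, version=ver, link=link)


def build_sitemap(site_url, scheme, links, locales=(), indent=0):
    """Return an ElementTree with one <url> per link (hypothetical helper)."""
    # Register prefixes so the output uses a default namespace and "xhtml:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for link in links:
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = compose_url(site_url, scheme, link)
        for locale in locales:
            # Assumption: the alternate href swaps the locale into {lang}.
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                rel="alternate",
                hreflang=locale.replace("_", "-"),  # pt_BR -> pt-BR
                href=compose_url(site_url, scheme, link, language=locale),
            )
    tree = ET.ElementTree(root)
    if indent > 0:
        ET.indent(tree, space=" " * indent)  # requires Python 3.9+
    return tree


tree = build_sitemap(
    "https://example.invalid/",
    "{lang}{version}{link}",
    ["", "usage.html?a=1&b=2"],
    locales=["pt_BR"],
    indent=2,
)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
```

The ampersand in the second link is written as a plain string; ElementTree escapes it during serialization, which is why no manual escaping appears in the sketch.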
Event hooks
config-inited → _maybe_enable_git_lastmod (lazy-load lastmod ext)
build-finished → _write_sitemap (enumerate found_docs + XML serialization)
Both live in
sphinx_gp_sitemap/__init__.py.
Page enumeration runs once at build-finished over app.env.found_docs
using app.builder.get_target_uri(pagename) for each URL — no
html-page-context handler, so incremental builds (where Sphinx
fires the hook only for re-written pages) still emit a complete
sitemap. app.env.found_docs is part of the env Sphinx merges across
parallel-read workers, so the extension is parallel_write_safe
without per-handler aggregation logic.
Trade-offs
Drop-in for sphinx-sitemap with stricter URL handling. Upstream
reconstructed page URLs as pagename + html_file_suffix, which
diverges from the HTML builder’s actual <a href> output when
html_link_suffix is set (e.g. "/" for clean URLs) or when a
pagename contains characters Sphinx URL-quotes. sphinx-gp-sitemap
calls app.builder.get_target_uri(pagename) directly, matching the
links Sphinx emits on the page itself.
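The divergence is easy to reproduce with a toy model. Both functions below are hypothetical: `naive_url` mirrors the pagename + html_file_suffix reconstruction described above, and `target_uri_model` is a simplified assumption about what the HTML builder's get_target_uri returns (URL-quoted pagename plus the link suffix), not a copy of Sphinx's code.

```python
from urllib.parse import quote


def naive_url(pagename, html_file_suffix=".html"):
    """Reconstruct the URL by plain concatenation."""
    return pagename + html_file_suffix


def target_uri_model(pagename, link_suffix=".html"):
    """Simplified model (assumption): quote the pagename, append the link suffix."""
    return quote(pagename) + link_suffix


pagename = "release notes/v1"
reconstructed = naive_url(pagename)
modelled = target_uri_model(pagename, link_suffix="/")
```

Here `reconstructed` is "release notes/v1.html" while `modelled` is "release%20notes/v1/", so a sitemap built from the first form would list a URL that no page links to.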
html_baseurl is re-registered defensively. Sphinx core
registers html_baseurl on most modern versions, but older trees and
some custom builders skip it. The setup() body wraps the
add_config_value("html_baseurl", …) call in
contextlib.suppress(ExtensionError) so the extension is robust
against either layout. This narrow ExtensionError catch replaces the
bare except BaseException that upstream uses.
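The defensive registration pattern can be sketched as follows. To keep the example runnable without Sphinx, `ExtensionError` here is a local stand-in for sphinx.errors.ExtensionError and `FakeApp` is a hypothetical minimal app that rejects duplicate names; the default value and rebuild argument passed to add_config_value are placeholders, since the original call elides them.

```python
import contextlib


class ExtensionError(Exception):
    """Local stand-in for sphinx.errors.ExtensionError."""


class FakeApp:
    """Hypothetical minimal app that rejects duplicate config names."""

    def __init__(self):
        self.values = {}

    def add_config_value(self, name, default, rebuild):
        if name in self.values:
            raise ExtensionError(f"Config value {name!r} already present")
        self.values[name] = default


def register_html_baseurl(app):
    # Only the "already registered" error is swallowed; any other
    # exception still propagates, unlike a bare except BaseException.
    with contextlib.suppress(ExtensionError):
        app.add_config_value("html_baseurl", "", "html")  # placeholder arguments


fresh = FakeApp()
register_html_baseurl(fresh)

preloaded = FakeApp()
preloaded.values["html_baseurl"] = "https://example.invalid/"
register_html_baseurl(preloaded)
```

On `fresh` the value is registered with the placeholder default; on `preloaded` the duplicate registration is suppressed and the existing value is left untouched.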