
docs: convert reStructuredText sources to MyST markdown#1579

Open
timsaucer wants to merge 12 commits into
apache:mainfrom
timsaucer:doc/phase2-rst-to-md


@timsaucer
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Phase 2 of the documentation-site refresh started in #1578. With the
modern pydata-sphinx-theme + navigation in place, this PR moves the
content format off .rst and onto MyST .md. The motivation:

  • Markdown is the lingua franca of agent-tuned tooling. LLMs trained
    on GitHub and modern docs parse Markdown reliably; reStructuredText
    is a minority dialect that frequently confuses both humans editing
    via PR review and agents reading the source. The Apache
    datafusion-comet sibling project completed the same migration
    recently and reported smoother contributor onboarding.
  • MyST is a strict superset of CommonMark with directives for the
    Sphinx features we actually use (toctrees, cross-references,
    code-blocks, admonitions, eval-rst escape hatch).
  • The myst-parser extension is already in the docs dependency
    group and was loaded by conf.py even before this PR — switching
    the on-disk format is a low-risk, mechanical change.

This PR stacks on #1578 (theme + navbar refresh). It should land
after #1578.

What changes are included in this PR?

Format conversion (mechanical, via rst-to-myst):

  • 33 human-authored .rst files under docs/source/ become 33
    .md files — the user guide, contributor guide, IO subsection,
    common-operations subsection, dataframe subsection, top-level
    index, and links.
  • Toctrees, cross-references, code blocks, hyperlinks, admonitions,
    and license headers all round-trip cleanly.

Manual fixes layered on top of the converter output:

  • Cross-reference anchors. The converter kebab-cased every
    (label)= anchor (e.g. (io-csv)=), but every {ref} in the
    corpus, including the Python docstrings that sphinx-autoapi
    pulls into the API reference, still uses the underscore form
    ({ref}`CSV <io_csv>`). Rewrite the anchors back to underscore
    form ((io_csv)=, (window_functions)=, (user_guide_concepts)=,
    (execution_metrics)=, etc.) so existing references resolve
    without churning every callsite.
  • MyST extensions. Enable colon_fence and deflist in
    myst_enable_extensions (the converter emits these on a few
    files, notably dataframe/execution-metrics.md).
  • source_suffix. Keep .rst registered even though no
    human-authored RST remains: sphinx-autoapi generates .rst
    under autoapi/ at build time and Sphinx needs the suffix to
    parse it. The comment in conf.py flags this so a future cleanup
    pass doesn't strip it again.
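
For reviewers who want to see the shape of the anchor fix, here is a minimal sketch of the rewrite, not the exact script used in this PR. It assumes every (label)= target sits alone on its own line and that no label was hyphenated in the original RST; the function name is hypothetical.

```python
import re

# A MyST target is a line of the form "(label)=" with nothing else on it.
ANCHOR_RE = re.compile(r"^\(([A-Za-z0-9_-]+)\)=[ \t]*$", re.MULTILINE)


def restore_underscore_anchors(text: str) -> str:
    """Rewrite kebab-cased anchors such as (io-csv)= back to (io_csv)=.

    Assumption: every hyphen in a label was introduced by the converter,
    so replacing all of them is safe. Body text is left untouched.
    """
    return ANCHOR_RE.sub(
        lambda m: "(" + m.group(1).replace("-", "_") + ")=",
        text,
    )


sample = "(io-csv)=\n\n# CSV\n\nA pre-existing paragraph with a {ref} to io_csv.\n"
print(restore_underscore_anchors(sample))
```

Only lines that consist solely of a target are rewritten, so hyphens in prose, headings, and file names are not affected.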

86 {eval-rst} blocks remain in the converted output. Every one of
them wraps a .. ipython:: directive, which has no first-class MyST
equivalent in our extensions setup. The blocks render identically
and don't block the build. Migrating these to a native MyST exec
syntax is a follow-up that requires either myst-nb or a custom
parser registration — out of scope here.
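
The claim that every remaining block wraps an ipython directive can be spot-checked with a small script. The following is a sketch under stated assumptions: blocks use backtick fences (not colon fences), the directive is the first non-blank content in the block, and the helper name is hypothetical.

```python
import re

# Built programmatically so this snippet can itself live inside a fenced block.
FENCE = "`" * 3

EVAL_RST_RE = re.compile(
    rf"^{FENCE}\{{eval-rst\}}[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def count_eval_rst_blocks(text: str) -> tuple[int, int]:
    """Return (all eval-rst blocks, blocks that start with '.. ipython::')."""
    bodies = EVAL_RST_RE.findall(text)
    ipython = sum(1 for body in bodies if body.lstrip().startswith(".. ipython::"))
    return len(bodies), ipython


sample = (
    f"{FENCE}{{eval-rst}}\n.. ipython:: python\n\n    df.show()\n{FENCE}\n\n"
    f"{FENCE}{{eval-rst}}\n.. note:: hand-written RST\n{FENCE}\n"
)
total, ipython = count_eval_rst_blocks(sample)
print(total, ipython)
```

Summing the two counts over every .md file under docs/source/ and comparing them is one way to confirm that no other RST directive is hiding in an escape hatch.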

AGENTS.md is updated so the two .rst paths called out under
"Aggregate and Window Function Documentation" point at the new .md
equivalents.

Are there any user-facing changes?

No behavioral change to the datafusion package — only the source
format of the published documentation. Readers of the rendered site
will not notice the migration; the HTML output is unchanged. Internal
cross-references resolve, and the pokemon.csv ipython example on the
landing page and the yellow_tripdata_2021-01.parquet example on
the basics page both still execute.

No api change label — public APIs untouched.

Follow-ups (out of scope for this PR)

  • Migrate the 86 {eval-rst} .. ipython:: blocks to a
    MyST-native exec syntax. Requires either pulling in myst-nb or
    configuring a per-language parser.
  • Phase 3: multi-version doc publishing (the comet pattern).
  • Phase 4: asf-site publishing workflow.

timsaucer and others added 12 commits June 5, 2026 08:46
Bump pydata-sphinx-theme 0.8.0 -> 0.16 to enable the modern navbar slot
API and dark/light theme switcher. Configure top navbar with logo,
nav links, GitHub icon, and theme switcher in conf.py. Drop the custom
docs-sidebar.html override and the layout.html block that silenced the
navbar — both predate the slot API and conflict with the new theme.
Strip CSS overrides that fought the old theme (--pst-header-height: 0,
navbar-brand sizing) and add a dark-mode variant for the inline code
color and table-stripe shading. Fix the stale github_repo
("arrow-datafusion-python" -> "datafusion-python") so future Edit-on-
GitHub links resolve. Bump copyright year and project name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous structure dumped every top-level toctree entry from index.rst
into the navbar, producing eight items including external URLs ("Github
and Issue Tracker", "Rust's API Docs", ...) that wrapped to two lines
each. Introduce user-guide/index.rst and contributor-guide/index.rst as
section landing pages with nested toctrees, then point index.rst at just
those two plus autoapi/index. The navbar now reads "User Guide",
"Contributor Guide", "API Reference" — three single-line entries. Move
the external links into the index.rst body where they're discoverable
without crowding navigation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add Examples and Rust API as text links in the top navbar via the
pydata-sphinx-theme external_links option. Nest the code-of-conduct
link inside the Contributor Guide toctree so it appears alongside the
other contributor pages. Drop the duplicate "Further reading" bullet
list from the landing page now that every link has a permanent home.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the Rust API docs entry from external_links to icon_links and use
the fa-brands fa-rust gear mark. Now sits next to the GitHub icon in
navbar_end with matching visual weight instead of a wider text link.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The default pydata-sphinx-theme sidebar-nav-bs starts at the current
top-level section, so the root index — which has no parent section —
ends up with an empty sidebar. The theme's layout also explicitly
filters sidebar-nav-bs out of the sidebar list when suppress_sidebar_
toctree() returns true (which it does for root pages), so simply
overriding sidebar-nav-bs.html in templates doesn't help.

Add a sidebar-globaltoc.html template that calls Sphinx's toctree()
global directly to render the full document tree, and wire it through
html_sidebars under a name the theme's suppress filter doesn't strip.
Landing page now shows User Guide / Contributor Guide / API Reference
in the sidebar with the current section expanded on inner pages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the sidebar toctree call from toctree() to generate_toctree_html
with collapse=False, so nested <ul>s render into the DOM for every
branch. The pydata-sphinx-theme JS then wraps them in <details> with
fa-chevron-down toggles, matching the datafusion-comet sidebar where
each section with children can be expanded inline. show_nav_level=1
keeps deeper levels collapsed on first load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump show_nav_level 1 -> 2 so the landing-page sidebar opens with
User Guide / Contributor Guide / API Reference already expanded to
their immediate children. Deeper levels remain collapsed behind
chevrons so the sidebar stays scannable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore the "Links" sidebar heading that the previous site had —
GitHub and Issue Tracker, Rust API Docs, Code of Conduct, Examples.
Implemented as a second hidden toctree with :caption: Links so the
pydata-sphinx-theme sidebar renders the heading above the four
external URLs. Drop Code of Conduct from the Contributor Guide
toctree since it now lives under Links instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the second hidden toctree (which expanded each external URL
into its own navbar entry) with a dedicated links.rst landing page,
and add a single "links" entry to the main toctree. Top navbar now
shows User Guide / Contributor Guide / API Reference / Links — four
items, no wrapping. Clicking Links opens the page that lists GitHub,
Rust API Docs, Code of Conduct, and Examples.

Drop the external_links Examples entry from conf.py since the same
URL now lives on the Links page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop in the same favicon.svg the main datafusion.apache.org site
uses (just the Apache DataFusion mark, no wordmark) and wire it
through html_favicon. Browsers and bookmarks now show the project
icon instead of the generic Sphinx page glyph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small follow-ups from the Copilot reviewer on apache#1578:

- Append .html to the html_sidebars entry. Sphinx's Jinja loader
  resolves both "sidebar-globaltoc" and "sidebar-globaltoc.html" to
  the same template, but the explicit form is closer to the spelling
  in the Sphinx docs and is harder to misread.
- Update the inline comment in sidebar-globaltoc.html that still
  claimed show_nav_level=1 after we bumped it to 2 in conf.py. Now
  describes the variable wiring instead of hard-coding a number that
  has to be kept in sync with conf.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the documentation-site refresh. Run `rst2myst convert` over
every human-authored .rst file under docs/source/ and remove the
originals. The result:

- 33 .rst files become 33 .md files (user guide, contributor guide,
  index, links).
- Headings, paragraphs, hyperlinks, code blocks, admonitions, and
  toctree directives all map cleanly to MyST syntax.
- Cross-reference anchors round-trip through MyST as `(label)=`
  blocks. The converter kebab-cased the labels (e.g. `(io-csv)=`),
  but every `{ref}` target in the corpus still uses the underscore
  form from the original RST (`{ref}\`CSV <io_csv>\``) and so do the
  Python docstrings that AutoAPI pulls in. Rewrite the anchors back
  to the underscore form so the existing references resolve.
- 86 `{eval-rst}` blocks remain — they all wrap `.. ipython::`
  directives, which have no first-class MyST equivalent. They render
  identically and don't block the build.

conf.py changes:

- Enable `colon_fence` and `deflist` MyST extensions (rst-to-myst
  emits these on a few files, particularly execution-metrics.md).
- Keep `.rst` in `source_suffix` even though no human-authored RST
  remains: sphinx-autoapi generates RST under autoapi/ at build time
  and Sphinx needs the suffix registered to parse it.
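
For orientation, the two settings described above might look roughly like this in conf.py. This is an illustrative excerpt, not the file's actual contents: the extension list is an assumption and omits whatever else the project loads.

```python
# Hypothetical excerpt of docs/source/conf.py (extension list is assumed).
extensions = [
    "myst_parser",        # parses the converted .md sources
    "autoapi.extension",  # sphinx-autoapi, generates the API reference
]

# Syntax extensions the converter output relies on.
myst_enable_extensions = [
    "colon_fence",  # ::: fenced directives
    "deflist",      # definition lists
]

# Keep ".rst" registered: sphinx-autoapi writes generated .rst under
# autoapi/ at build time, and Sphinx needs the suffix to parse it.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```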

AGENTS.md: update the two .rst paths called out under "Aggregate and
Window Function Documentation" to point at the .md equivalents.

Verified by building locally — `build succeeded`, no warnings, all
internal cross-references resolve, the ipython examples on the
landing page and basics page still execute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
