refactor(open): extract OpenRequest + decompose Dataset.open (PR4) #187

Merged
d-laub merged 2 commits into main from refactor/pr4-open-request
May 24, 2026

@d-laub d-laub commented May 24, 2026

Summary

  • Dataset.open was a 155-line method mixing arg unpacking, path validation, metadata loading, indexer build, source assembly (Haps/Ref + Tracks), reference-bounds check, dataset assembly, and post-construction settings application.
  • New module python/genvarloader/_dataset/_open.py defines an OpenRequest frozen dataclass holding the parsed args plus a .resolve() method that orchestrates the stages, each as its own small method:
    • _validate_path
    • _load_metadata
    • _build_indexer
    • _resolve_reference
    • _build_seqs
    • _build_tracks
    • _initial_seqs_kind
    • _check_reference_bounds
    • _assemble_dataset
    • _apply_post_settings
  • Dataset.open is now a thin facade (~20 lines) that constructs an OpenRequest and calls .resolve(). Public API unchanged; both @overload declarations on Dataset.open are preserved.
  • _impl.py: 2156 → 2015 (−141 lines). New _open.py: 262 lines. Net: stages are now small enough to unit-test individually if/when desired.
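
The staged layout described above can be sketched roughly as follows. This is a minimal illustration, not the code in `_open.py`: the class name `OpenRequestSketch`, its fields, the `metadata.json` filename, and the dictionary returned by `resolve()` are all assumptions made for the example, and only three of the ten stages are shown.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class OpenRequestSketch:
    """Hypothetical stand-in for OpenRequest; fields are illustrative only."""

    path: Path
    reference: Path | None = None

    def resolve(self) -> dict[str, Any]:
        # Orchestrate the stages in order; each stage is a small method
        # that can also be called on its own in a unit test.
        self._validate_path()
        metadata = self._load_metadata()
        return self._assemble_dataset(metadata)

    def _validate_path(self) -> None:
        if not self.path.is_dir():
            raise FileNotFoundError(f"Dataset directory not found: {self.path}")

    def _load_metadata(self) -> dict[str, Any]:
        # Assumption: metadata is a JSON file named metadata.json.
        with open(self.path / "metadata.json") as f:
            return json.load(f)

    def _assemble_dataset(self, metadata: dict[str, Any]) -> dict[str, Any]:
        # Stand-in for building the real Dataset object.
        return {
            "path": self.path,
            "metadata": metadata,
            "has_reference": self.reference is not None,
        }
```

Because the dataclass is frozen, each stage reads the parsed arguments without mutating them, which is what makes calling a single stage such as `_load_metadata()` in isolation straightforward.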

Test plan

  • pixi run -e dev test (488 pytest + cargo) — all pass
  • pixi run -e dev ruff check python/ — clean
  • pixi run -e dev pyrefly check — 0 errors (baseline unchanged)

Part of the refactor campaign tracked in docs/superpowers/specs/2026-05-23-refactor-campaign-design.md.

🤖 Generated with Claude Code

d-laub and others added 2 commits May 23, 2026 20:15
Dataset.open was a 155-line method mixing argument unpacking, path
validation, metadata loading, indexer construction, source assembly
(seqs/tracks), reference-bounds sanity-check, dataset assembly, and
post-construction settings.

Pull all of that into a new `_dataset/_open.py` module housing an
`OpenRequest` dataclass. `OpenRequest.resolve()` orchestrates the
stages, each a small named method (`_validate_path`, `_load_metadata`,
`_build_indexer`, `_resolve_reference`, `_build_seqs`, `_build_tracks`,
`_initial_seqs_kind`, `_check_reference_bounds`, `_assemble_dataset`,
`_apply_post_settings`). Each stage reads in one sitting and can be
exercised independently when we want stage-level unit tests later.

`Dataset.open` is now a thin facade (~20 lines including docstring
delimiters) that packs its arguments into an OpenRequest and calls
`.resolve()`. Public API surface unchanged.

`_impl.py`: 2156 -> 2015 (-141 lines).

Behavior preserving; full test suite (488 pytest + cargo) green;
pyrefly + ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@d-laub d-laub merged commit 4849a1e into main May 24, 2026
6 checks passed