refactor: PR7 — naming pass + type:ignore audit by d-laub · Pull Request #192 · mcvickerlab/GenVarLoader

d-laub · 2026-05-24T06:45:31Z

Summary

Final refactor-campaign PR. Two passes folded into one branch.

Naming pass:

rsp_idx → region_sample_ploid_idx (single local in Haps._get_geno_offset_idx; was an unexplained 3-tuple of region/sample/ploid)
SplicePlan.perm → SplicePlan.permutation (clearer at call sites; loop vars never used perm)
geno_offset_idxs → geno_offset_idx (kernel param names now match the ReconstructionRequest field; the singular/plural mismatch between the two was the actual problem, and the rename does not change behavior)
reconstruct_haplotype_from_sparse (singular kernel) and reconstruct_haplotypes_from_sparse (batched driver) kept distinct, with cross-referencing docstrings disambiguating the pair

`# type: ignore` audit:

68 → 29 ignores (39 stale, removed)
All 29 remaining ignores now carry a pyrefly rule code plus a one-line specific reason identifying the upstream stub issue (hirola `HashTable.max`, `Ragged.shape` containing `None`, `ak.str` missing from stubs, `dataclasses.replace` not preserving `Self`, polars overload mismatches, etc.)
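The "rule code plus one-line reason" convention can be checked mechanically. The following is a minimal sketch, not part of this PR: the helper name `undocumented_ignores`, the regex, and the exact comment layout (`# type: ignore[code]  # reason`) are assumptions made for illustration, and `bad-argument-type` stands in for whatever rule code pyrefly actually reports.

```python
import re

# Assumed convention (not confirmed by the PR): an ignore is "documented" when
# it has a bracketed rule code AND trailing text giving the reason, e.g.
#   x = f()  # type: ignore[bad-argument-type]  # stub wants np.number; int works
IGNORE = re.compile(
    r"#\s*(?:type|pyrefly):\s*ignore(?P<code>\[[\w\-, ]+\])?(?P<rest>.*)"
)


def undocumented_ignores(source: str) -> list[int]:
    """Return 1-based line numbers of ignores lacking a rule code or a reason."""
    bad = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = IGNORE.search(line)
        if m is None:
            continue  # no ignore comment on this line
        has_code = m.group("code") is not None
        has_reason = bool(m.group("rest").strip(" #\t"))
        if not (has_code and has_reason):
            bad.append(lineno)
    return bad


sample = "\n".join(
    [
        "a = f()  # type: ignore",
        "b = g()  # type: ignore[bad-argument-type]",
        "c = h()  # type: ignore[bad-argument-type]  # stub wants np.number; int works",
        "d = 1",
    ]
)
print(undocumented_ignores(sample))  # → [1, 2]
```

Lines 1 and 2 of the sample are flagged because the first has no rule code and the second has a code but no reason. A check like this only enforces the comment shape; whether an ignore is stale still requires stripping it and re-running the type checker, as the audit did.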

No public API changes. No on-disk format changes.

Test plan

`pixi run -e dev test` (488 pytest + 6 skipped + 2 xfailed; 4/4 cargo)
`pixi run -e dev ruff check python/` clean
`pixi run -e dev typecheck` (pyrefly) — 0 errors, baseline preserved

🤖 Generated with Claude Code

Local variable in Haps._get_geno_offset_idx is the (region, sample, ploid) index tuple passed to np.ravel_multi_index. The 'rsp' acronym was opaque to new readers.
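For readers unfamiliar with the call, here is a small sketch of what a (region, sample, ploid) index tuple looks like when passed to `np.ravel_multi_index`. The shape and index values are made up for illustration and are not taken from `Haps._get_geno_offset_idx`.

```python
import numpy as np

# Hypothetical dimensions: 4 regions, 3 samples, ploidy 2.
shape = (4, 3, 2)

# One array per axis; position i across the three arrays names one
# (region, sample, ploid) triple.
region_idx = np.array([0, 1, 3])
sample_idx = np.array([2, 0, 1])
ploid_idx = np.array([1, 0, 1])
region_sample_ploid_idx = (region_idx, sample_idx, ploid_idx)

# Flatten each triple into a single row-major (C-order) offset:
# offset = region * (3 * 2) + sample * 2 + ploid
flat_idx = np.ravel_multi_index(region_sample_ploid_idx, shape)
print(flat_idx.tolist())  # → [5, 6, 21]

# np.unravel_index is the inverse and recovers the three axis arrays.
recovered = np.unravel_index(flat_idx, shape)
```

Spelling out the axes in the variable name documents the tuple order, which matters because swapping any two arrays silently yields different offsets.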

Kernel parameters used the plural form while field/local names used the singular. Field name 'geno_offset_idx' (the single conceptual index array) wins — kernels in _genotypes.py and _tracks.py now use the same name as ReconstructionRequest.geno_offset_idx and the local at the call sites in _haps.py. Pure rename; semantics unchanged.

Field name now spells out what it is (a permutation array). Local variables in callers also renamed (perm -> permutation) where they refer to the field. Loop variables and 'permuted_*' result names were already distinct and remain as-is. Pure rename; SplicePlan is internal (not in __all__).

…nels

Both names stay (reconstruct_haplotype_from_sparse is the per-(query, hap) inner kernel; reconstruct_haplotypes_from_sparse is the batched parallel driver that dispatches to it). Add a one-line docstring note on each so the relationship is explicit without forcing readers to read both bodies.
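The singular-kernel / plural-driver pattern with cross-referencing docstrings might look like the toy sketch below. The functions `apply_substitution` and `apply_substitutions` are hypothetical stand-ins, not the GenVarLoader kernels, and the real driver is parallel whereas this one is a plain loop.

```python
import numpy as np


def apply_substitution(out, ref, pos, alt):
    """Write ``ref`` into ``out`` with one substituted base (single-row kernel).

    See also: ``apply_substitutions``, the batched driver that calls this
    function once per output row.
    """
    out[:] = ref
    out[pos] = alt


def apply_substitutions(out, ref, positions, alts):
    """Fill every row of ``out`` (batched driver).

    See also: ``apply_substitution``, the per-row kernel this dispatches to.
    """
    # Iterating a 2-D array yields row views, so the kernel writes in place.
    for row, pos, alt in zip(out, positions, alts):
        apply_substitution(row, ref, pos, alt)


ref = np.frombuffer(b"ACGT", dtype=np.uint8)
out = np.empty((2, 4), dtype=np.uint8)
apply_substitutions(out, ref, positions=[0, 3], alts=[ord("T"), ord("A")])
print(out.tobytes())  # → b'TCGTACGA'
```

With names that differ by one letter, the "See also" line in each docstring tells a reader at a glance which function is the inner kernel and which is the driver.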

Audit pass for PR7 task 2. Stripped all 68 ignores, ran pyrefly, then restored only the ~29 that suppress real warnings/errors. Each remaining ignore now has a narrow rule code and a one-line reason.

Removed (stale, no longer needed):
- _ragged.py: ufunc_comp_dna numba ufunc call
- _dataset/_haps.py: ak.to_packed / ak.to_regular sites; pylance-only note
- _dataset/_impl.py: self._seqs.genotypes.shape index
- _dataset/_indexing.py: ak.flatten().to_numpy() narrowing
- _dataset/_query.py: ak.where + reverse_complement; recon return widen
- _dataset/_rag_variants.py: alleles.content layout walk, reverse-complement field assignment, NDArray casts that pyrefly already narrows
- _dataset/_reference.py: ref.reshape / to_padded / squeeze on Ragged; torch import guards
- _dataset/_tracks.py / _dataset/_write.py: misc ndarray construction sites
- _torch.py / data_registry.py: dead/unreachable branches
- _variants/_sitesonly.py: raise ValueError unreachable annotation

Annotated (kept with rule code + reason):
- HashTable max=int across _indexing/_reference/_splice -> hirola stubs require numpy.Number but int works at runtime
- np.unravel_index / np.ravel_multi_index on Ragged.shape (_haps) -> Ragged.shape is tuple[int|None,...]; numpy overload expects all-int
- np.ones with ak.Array shape (_rag_variants) -> same shape-with-None issue
- ak.str.length attribute lookup -> ak.str submodule absent from top-level awkward stubs
- RaggedIntervals / RaggedAlleles constructor calls (_ragged, _dummy) -> seqpro Ragged stubs widen __getitem__/squeeze/from_offsets returns
- replace(self, ...) and super().__getitem__(idx) returns (_impl) -> typevar narrowing not preserved across base-class return
- to_kind(_kind) (_impl) -> _kind union widened by control-flow merge
- DataFrame[regions] / DataFrame.filter(regions) (_reference) -> polars stubs reject some union members our runtime accepts
- recon = tuple(o.reshape/squeeze ...) (_query) -> heterogeneous dispatch across array kinds
- cast() on offsets after layout walk (_rag_variants) -> documents narrowing pyrefly already infers

d-laub added 6 commits May 23, 2026 23:14

refactor(naming): rename rsp_idx -> region_sample_ploid_idx

afc8dc1

style: apply ruff-format

03c92e6

d-laub merged commit 70f27c6 into main May 24, 2026
6 checks passed

d-laub merged 6 commits into main from refactor/pr7-naming-typeignore
