refactor: PR7 — naming pass + type:ignore audit #192

Merged
d-laub merged 6 commits into main from refactor/pr7-naming-typeignore
May 24, 2026


d-laub commented May 24, 2026

Summary

Final refactor-campaign PR. Two passes folded into one branch.

Naming pass:

  • `rsp_idx` → `region_sample_ploid_idx` (single local in `Haps._get_geno_offset_idx`; was an unexplained 3-tuple of region/sample/ploid)
  • `SplicePlan.perm` → `SplicePlan.permutation` (clearer at call sites; loop vars never used `perm`)
  • `geno_offset_idxs` → `geno_offset_idx` (kernel param names now match the `ReconstructionRequest` field; the prior mismatch was the real bug)
  • `reconstruct_haplotype_from_sparse` (singular kernel) and `reconstruct_haplotypes_from_sparse` (batched driver) kept distinct, with cross-referencing docstrings disambiguating the pair

`# type: ignore` audit:

  • 68 → 29 ignores (39 stale, removed)
  • All 29 remaining ignores now carry a pyrefly rule code plus a one-line specific reason identifying the upstream stub issue (hirola `HashTable.max`, `Ragged.shape` containing `None`, `ak.str` missing from stubs, `dataclasses.replace` not preserving `Self`, polars overload mismatches, etc.)

No public API changes. No on-disk format changes.

Test plan

  • `pixi run -e dev test` (488 pytest + 6 skipped + 2 xfailed; 4/4 cargo)
  • `pixi run -e dev ruff check python/` clean
  • `pixi run -e dev typecheck` (pyrefly) — 0 errors, baseline preserved

🤖 Generated with Claude Code

d-laub added 6 commits May 23, 2026 23:14
Local variable in Haps._get_geno_offset_idx is the (region, sample, ploid)
index tuple passed to np.ravel_multi_index. The 'rsp' acronym was opaque to
new readers.
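
To make the renamed tuple concrete, here is a small sketch of the row-major flattening that `np.ravel_multi_index` performs on a single (region, sample, ploid) index in its default C order. It is written in plain Python so it runs without NumPy. The helper name `flat_index` and the example shape are illustrative assumptions, not code from this repository.

```python
def flat_index(region_sample_ploid_idx, shape):
    """Row-major (C-order) flat offset for one (region, sample, ploid) index.

    Mirrors what np.ravel_multi_index(idx, shape) returns for scalar indices
    with the default order="C". Hypothetical helper, for illustration only.
    """
    region, sample, ploid = region_sample_ploid_idx
    n_regions, n_samples, ploidy = shape
    in_bounds = (
        0 <= region < n_regions
        and 0 <= sample < n_samples
        and 0 <= ploid < ploidy
    )
    if not in_bounds:
        raise ValueError("index out of bounds for shape")
    # The last axis varies fastest, so ploid is the innermost stride.
    return (region * n_samples + sample) * ploidy + ploid


# Example (assumed sizes): 4 regions, 3 samples, diploid.
shape = (4, 3, 2)
region_sample_ploid_idx = (2, 1, 1)
print(flat_index(region_sample_ploid_idx, shape))  # (2*3 + 1)*2 + 1 = 15
```

Spelling out the three components in the variable name matches the unpacking order, which is the information the old `rsp` acronym left readers to guess.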

Kernel parameters used the plural form while field/local names used the
singular. Field name 'geno_offset_idx' (the single conceptual index array)
wins — kernels in _genotypes.py and _tracks.py now use the same name as
ReconstructionRequest.geno_offset_idx and the local at the call sites in
_haps.py. Pure rename; semantics unchanged.

Field name now spells out what it is (a permutation array). Local
variables in callers also renamed (perm -> permutation) where they
refer to the field. Loop variables and 'permuted_*' result names
were already distinct and remain as-is. Pure rename; SplicePlan is
internal (not in __all__).

…nels

Both names stay (reconstruct_haplotype_from_sparse is the per-(query, hap)
inner kernel; reconstruct_haplotypes_from_sparse is the batched parallel
driver that dispatches to it). Add a one-line docstring note on each so
the relationship is explicit without forcing readers to read both bodies.
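
A minimal sketch of what such cross-referencing docstrings might look like. Only the two function names come from this PR. The signatures, the SNP-only toy logic, and the serial loop standing in for the parallel dispatch are all assumptions made for illustration.

```python
def reconstruct_haplotype_from_sparse(reference, variants):
    """Reconstruct ONE haplotype (toy stand-in for the per-(query, hap) kernel).

    See also: reconstruct_haplotypes_from_sparse, the batched driver that
    calls this function once per haplotype.
    """
    out = list(reference)
    # Hypothetical variant encoding: (position, alt_base) SNPs only.
    for pos, alt in variants:
        out[pos] = alt
    return "".join(out)


def reconstruct_haplotypes_from_sparse(reference, variants_per_hap):
    """Reconstruct MANY haplotypes (toy stand-in for the batched driver).

    See also: reconstruct_haplotype_from_sparse, the singular kernel that
    handles each haplotype.
    """
    # A plain loop here; the real driver is described as parallel.
    return [
        reconstruct_haplotype_from_sparse(reference, variants)
        for variants in variants_per_hap
    ]


print(reconstruct_haplotypes_from_sparse("ACGT", [[(0, "T")], [(1, "G"), (3, "A")]]))
# ['TCGT', 'AGGA']
```

With the note on both sides, a reader landing on either name learns that the one-letter difference is intentional and which function to call.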

Audit pass for PR7 task 2. Stripped all 68 ignores, ran pyrefly, then
restored only the ~29 that suppress real warnings/errors. Each remaining
ignore now has a narrow rule code and a one-line reason.
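
The "rule code plus reason" convention could be enforced mechanically. The sketch below assumes comments of the form `# type: ignore[rule-code]  # reason`. The PR does not show the exact comment shape used in the repository, and the rule codes in the sample text are illustrative.

```python
import re

# Assumed comment shape: "# type: ignore[rule-code]  # reason text".
IGNORE = re.compile(r"#\s*type:\s*ignore(?P<code>\[[^\]]+\])?(?P<rest>.*)$")


def find_unjustified_ignores(source):
    """Return (line_number, problem) pairs for ignores lacking a code or reason."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = IGNORE.search(line)
        if match is None:
            continue
        if match.group("code") is None:
            problems.append((lineno, "missing rule code"))
        elif not match.group("rest").strip(" #\t"):
            problems.append((lineno, "missing reason"))
    return problems


# Sample text only; these lines are scanned as strings, never executed.
sample = """\
table = HashTable(max=n)  # type: ignore[bad-argument-type]  # stubs want numpy.Number
x = f(y)  # type: ignore
z = g(y)  # type: ignore[bad-return]
"""
print(find_unjustified_ignores(sample))
# [(2, 'missing rule code'), (3, 'missing reason')]
```

This scans raw text, so an ignore-like string inside a literal would also be flagged. A tokenizer-based check would be stricter.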

Removed (stale, no longer needed):
- _ragged.py: ufunc_comp_dna numba ufunc call
- _dataset/_haps.py: ak.to_packed / ak.to_regular sites; pylance-only note
- _dataset/_impl.py: self._seqs.genotypes.shape index
- _dataset/_indexing.py: ak.flatten().to_numpy() narrowing
- _dataset/_query.py: ak.where + reverse_complement; recon return widen
- _dataset/_rag_variants.py: alleles.content layout walk, reverse-complement
  field assignment, NDArray casts that pyrefly already narrows
- _dataset/_reference.py: ref.reshape / to_padded / squeeze on Ragged; torch
  import guards
- _dataset/_tracks.py / _dataset/_write.py: misc ndarray construction sites
- _torch.py / data_registry.py: dead/unreachable branches
- _variants/_sitesonly.py: raise ValueError unreachable annotation

Annotated (kept with rule code + reason):
- HashTable max=int across _indexing/_reference/_splice
  -> hirola stubs require numpy.Number but int works at runtime
- np.unravel_index / np.ravel_multi_index on Ragged.shape (_haps)
  -> Ragged.shape is tuple[int|None,...]; numpy overload expects all-int
- np.ones with ak.Array shape (_rag_variants)
  -> same shape-with-None issue
- ak.str.length attribute lookup
  -> ak.str submodule absent from top-level awkward stubs
- RaggedIntervals / RaggedAlleles constructor calls (_ragged, _dummy)
  -> seqpro Ragged stubs widen __getitem__/squeeze/from_offsets returns
- replace(self, ...) and super().__getitem__(idx) returns (_impl)
  -> typevar narrowing not preserved across base-class return
- to_kind(_kind) (_impl)
  -> _kind union widened by control-flow merge
- DataFrame[regions] / DataFrame.filter(regions) (_reference)
  -> polars stubs reject some union members our runtime accepts
- recon = tuple(o.reshape/squeeze ...) (_query)
  -> heterogeneous dispatch across array kinds
- cast() on offsets after layout walk (_rag_variants)
  -> documents narrowing pyrefly already infers
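
Several of the kept ignores trace back to a shape typed as `tuple[int | None, ...]` being passed where stubs expect all-int shapes. The sketch below illustrates that mismatch and two ways to handle it. The functions `n_cells` and `concrete_shape` are hypothetical stand-ins, the rule code is illustrative, and nothing here is taken from the repository.

```python
from typing import Optional, Tuple


def n_cells(shape: Tuple[int, ...]) -> int:
    """Stand-in for an API whose stubs accept only all-int shapes."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def concrete_shape(shape: Tuple[Optional[int], ...]) -> Tuple[int, ...]:
    """Narrow a shape that may contain None (a variable-length axis)."""
    if any(dim is None for dim in shape):
        raise ValueError("shape has a variable-length axis")
    return tuple(dim for dim in shape if dim is not None)


# Hypothetical shape annotated like a ragged shape, but all-int at runtime.
shape: Tuple[Optional[int], ...] = (4, 3, 2)

# Option 1: suppress, with a rule code and a one-line reason.
a = n_cells(shape)  # type: ignore[bad-argument-type]  # shape typed with None; all-int here

# Option 2: narrow explicitly, at the cost of a runtime check.
b = n_cells(concrete_shape(shape))
print(a, b)  # 24 24
```

The suppression keeps hot paths free of extra checks, while the explicit narrowing fails loudly if a `None` axis ever reaches the call.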

d-laub merged commit 70f27c6 into main on May 24, 2026
6 checks passed