Refactor/cleanup python internals #263

Open
RalfG wants to merge 3 commits into release/4.2 from refactor/cleanup-python-internals

RalfG commented Apr 15, 2026

Summary

This PR simplifies the Python/Rust boundary for annotation and target extraction, and keeps the correlate flow working with both raw and pre-annotated preloaded spectra. After the Rust-side follow-up, correlate runtime dropped from ~110 sec to ~75 sec for ~78k spectra.

Changelog

Added

  • Support for using Rust ms2pip_extract_targets(...) in the correlate pipeline.
  • Support for carrying AnnotatedMS2Spectrum objects directly through the preloaded-spectrum path.

Changed

  • Updated MS²PIP to consume ms2pip_compute_theoretical_mz(...) results directly from Rust without wrapping them again in np.array(...).
  • Updated annotation calls so annotate_ms2_spectra(...) no longer receives seq_lens.
  • Updated annotate_spectrum(...) to return AnnotatedMS2Spectrum instead of Python tuple-converted annotations.
  • Updated MatchedSpectrum to store annotated_spectrum instead of peak_annotations.
  • Updated _validate_and_extract_targets(...) to batch target extraction through a single Rust call.
  • Updated correlate_single(...) to use Rust target extraction as well.
  • Updated tests/test_spectrum_processing.py to validate Rust target extraction directly using AnnotatedMS2Spectrum / FragmentAnnotation.
  • Bumped ms2rescore-rs dependency in pyproject.toml to >=0.5.0a3,<2.

Removed

  • Python-side _annotations_to_tuples(...) from the active annotation/target-extraction flow.
  • Python-side targets_from_annotations(...) from the active correlate flow.
  • Redundant Python-side conversion of annotation results back into tuple structures when pre-annotated spectra are already available.

Fixed

  • Restored and improved correlate performance after the initial API migration by addressing Rust-side bottlenecks (down from ~110 sec to ~75 sec for ~78k spectra).
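As a rough illustration of the changelog above, the sketch below shows one way a correlate-style flow can accept both raw and pre-annotated spectra and then extract targets in one batched call instead of one call per spectrum. Every name here (`RawSpectrumSketch`, `AnnotatedSpectrumSketch`, `annotate_sketch`, `extract_targets_batch_sketch`) is a hypothetical stand-in written for this example; none of it is the actual MS²PIP or ms2rescore-rs API, and the placeholder annotation logic is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class RawSpectrumSketch:
    """Hypothetical stand-in for a raw preloaded spectrum."""
    mz: List[float]
    intensity: List[float]


@dataclass
class AnnotatedSpectrumSketch:
    """Hypothetical stand-in for an already annotated spectrum."""
    mz: List[float]
    intensity: List[float]
    labels: List[Optional[str]]  # None marks an unannotated peak


def annotate_sketch(spectrum: RawSpectrumSketch) -> AnnotatedSpectrumSketch:
    # Placeholder annotation (assumption): label every peak by its index.
    labels = [f"peak{i}" for i in range(len(spectrum.mz))]
    return AnnotatedSpectrumSketch(spectrum.mz, spectrum.intensity, labels)


def extract_targets_batch_sketch(
    spectra: List[AnnotatedSpectrumSketch],
) -> List[List[float]]:
    # Stand-in for a single batched call across the language boundary:
    # keep the intensities of annotated peaks for every spectrum at once.
    return [
        [i for i, label in zip(s.intensity, s.labels) if label is not None]
        for s in spectra
    ]


def extract_targets(
    spectra: List[Union[RawSpectrumSketch, AnnotatedSpectrumSketch]],
) -> List[List[float]]:
    # Annotate only the spectra that are still raw; pass the rest through.
    annotated = [
        s if isinstance(s, AnnotatedSpectrumSketch) else annotate_sketch(s)
        for s in spectra
    ]
    # One batched call rather than a per-spectrum loop.
    return extract_targets_batch_sketch(annotated)
```

Keeping pre-annotated objects intact avoids converting annotations back into tuple structures, which is the redundancy the "Removed" entries refer to.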

RalfG added 3 commits April 15, 2026 15:13
Split the flag-driven _correlate_internal into three focused functions
(_predict_with_observed, _extract_observations, _extract_training_data)
with shared validation logic, improving readability and traceability of
each code path.
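A minimal sketch of this kind of split, assuming a toy dictionary-based spectrum and a placeholder predictor. The three public function names mirror the ones in the commit message, but their signatures, bodies, and the `_validated` and `_predict` helpers are invented for illustration and do not reflect the real implementation.

```python
from typing import Dict, Iterable, List, Tuple

ToySpectrum = Dict[str, object]  # hypothetical: {"peptide": str, "observed": [float, ...]}


def _validated(spectra: Iterable[ToySpectrum]) -> List[ToySpectrum]:
    """Shared validation: reject empty input and drop spectra without peaks."""
    spectra = list(spectra)
    if not spectra:
        raise ValueError("no spectra provided")
    return [s for s in spectra if s["observed"]]


def _predict(spectrum: ToySpectrum) -> List[float]:
    # Placeholder model (assumption): uniform intensity over all peaks.
    n = len(spectrum["observed"])
    return [1.0 / n] * n


def predict_with_observed(
    spectra: Iterable[ToySpectrum],
) -> List[Tuple[List[float], List[float]]]:
    # Code path 1: pair predictions with observations.
    return [(_predict(s), s["observed"]) for s in _validated(spectra)]


def extract_observations(spectra: Iterable[ToySpectrum]) -> List[List[float]]:
    # Code path 2: observations only, no prediction step.
    return [s["observed"] for s in _validated(spectra)]


def extract_training_data(
    spectra: Iterable[ToySpectrum],
) -> List[Tuple[str, List[float]]]:
    # Code path 3: (peptide, observed intensities) pairs for training.
    return [(s["peptide"], s["observed"]) for s in _validated(spectra)]
```

Each function has a single return type and no boolean flags, so a caller's intent is visible at the call site while validation stays in one place.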

- Remove pass-through __init__ methods from Pydantic models, moving
  param docs to class docstrings (Spectrum, ProcessingResult,
  ProteomeSearchSpace, ModificationConfig, _PeptidoformSearchSpace)
- Extract _annotations_to_tuples helper for repeated FragmentAnnotation
  destructuring in _spectrum_processing.py
- Add Spectrum.inverse_log2_transform() as the canonical inverse of
  log2_transform(), replacing duplicated inline expressions
- Remove stray debugpy import in xgb_models.py
- CLI: Make logging level case-insensitive
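
To illustrate the `inverse_log2_transform()` item above, here is a minimal sketch of a transform and its canonical inverse living side by side on one class. `SpectrumSketch` and the `OFFSET` pseudocount of 0.001 are assumptions made for this example; the actual `Spectrum` model and its constant may differ.

```python
import math
from dataclasses import dataclass
from typing import List

OFFSET = 0.001  # assumed pseudocount so that zero intensities can be logged


@dataclass
class SpectrumSketch:
    """Hypothetical stand-in for the Spectrum model."""
    intensity: List[float]

    def log2_transform(self) -> "SpectrumSketch":
        # Forward transform: log2(intensity + OFFSET).
        return SpectrumSketch([math.log2(i + OFFSET) for i in self.intensity])

    def inverse_log2_transform(self) -> "SpectrumSketch":
        # Inverse transform: 2 ** value - OFFSET, undoing log2_transform().
        return SpectrumSketch([2.0 ** i - OFFSET for i in self.intensity])
```

With both directions defined next to each other, callers no longer need to repeat the inverse expression inline, and the offset is declared once.
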
Update the Python spectrum-processing flow to consume AnnotatedMS2Spectrum and Rust target extraction directly, and refresh the related tests.
RalfG added the enhancement and dependencies labels Apr 15, 2026
RalfG added this to the v4.2.0 milestone Apr 15, 2026

Labels

dependencies (Pull requests that update a dependency file), enhancement