Skip to content

[MRG] Fixes #441: expose PAA segment_indices()#671

Open
jbbqqf wants to merge 1 commit into
tslearn-team:mainfrom
jbbqqf:feat/441-paa-segment-indices
Open

[MRG] Fixes #441: expose PAA segment_indices()#671
jbbqqf wants to merge 1 commit into
tslearn-team:mainfrom
jbbqqf:feat/441-paa-segment-indices

Conversation

@jbbqqf
Copy link
Copy Markdown

@jbbqqf jbbqqf commented May 9, 2026

Summary

Adds PiecewiseAggregateApproximation.segment_indices(), a small accessor that returns the (start, end) indices each PAA segment summarises in the original-length time series. This lets users locate "where the PAA value changes" — the question raised in #441 — without re-deriving the segment-width logic from _transform.

Fixes #441Index of changes in PAA and SAX.

Context

Issue #441 asks for a way to map a paa_data[i] value back to the original-series index range it averages. The information already exists implicitly inside _transform (sz_segment = sz_fit // n_segments; segment i covers ts[i*sz_segment : (i+1)*sz_segment]), but callers had to re-derive it. Exposing it as a method:

  • keeps the source-of-truth in one place (we reuse the same sz_fit // n_segments formula),
  • makes the paa_data[i] == ts[start_i:end_i].mean() invariant testable,
  • requires no change to transform / inverse_transform.

Changes

  • tslearn/piecewise/piecewise.py — new segment_indices() method on PiecewiseAggregateApproximation. Returns np.ndarray of shape (n_segments, 2) with half-open intervals [start, end) matching _transform. Raises NotFittedError before fit. A short inline comment notes that the segment-width formula must stay in sync with _transform so that paa_data[i_seg] == ts[start_i:end_i].mean() holds.
  • tests/test_piecewise.py — regression test test_paa_segment_indices that:
    • asserts NotFittedError before fitting,
    • checks shape / dtype,
    • verifies the means in paa_data match ts[start:end].mean() cell-by-cell,
    • exercises the non-divisible-length branch (sz=7, n_segments=3 → trailing sample dropped).
  • CHANGELOG.md — entry under [Towards v0.9.0] / Added.

The new method is purely additive — no existing PAA / SAX behaviour changes.

Reproduce BEFORE/AFTER yourself (copy-paste)

# --- one-time setup ---
git clone https://github.com/tslearn-team/tslearn.git /tmp/repro && cd /tmp/repro
python -m venv .venv && source .venv/bin/activate
pip install -e . pytest >/dev/null

# Save the regression test that demonstrates the fix
git fetch https://github.com/jbbqqf/tslearn.git feat/441-paa-segment-indices
git checkout FETCH_HEAD -- tests/test_piecewise.py

# --- BEFORE (origin/main) ---
git checkout origin/main -- tslearn/piecewise/piecewise.py
pytest tests/test_piecewise.py::test_paa_segment_indices -q
# Expected: FAILED — AttributeError: 'PiecewiseAggregateApproximation' object has no attribute 'segment_indices'

# --- AFTER (this PR) ---
git checkout FETCH_HEAD -- tslearn/piecewise/piecewise.py
pytest tests/test_piecewise.py::test_paa_segment_indices -q
# Expected: 1 passed

What I ran locally

  • pytest tests/test_piecewise.py -v → 5 passed (the existing 4 PAA/SAX tests + the new one).
  • pytest --doctest-modules tslearn/piecewise/piecewise.py → 7 passed (covers the new docstring example).

Edge cases tested

# Scenario Input Expected Verified by
1 Unfitted estimator PiecewiseAggregateApproximation(n_segments=3).segment_indices() NotFittedError test_paa_segment_indices
2 Divisible length sz=6, n_segments=3 on [-1, 2, 0.1, -1, 1, -1] [[0,2],[2,4],[4,6]] and paa_data[i] == ts[start:end].mean() test_paa_segment_indices (loop assertion)
3 Non-divisible length sz=7, n_segments=3 [[0,2],[2,4],[4,6]] (trailing sample dropped, matching _transform) test_paa_segment_indices (non-divisible block)

Risk / blast radius

Additive only: a new method on an existing class. No changes to fit, transform, inverse_transform, distance, or distance_paa. Cannot break existing callers; cannot affect serialisation (no new attribute is stored on the estimator).

Release note

piecewise: PiecewiseAggregateApproximation now exposes segment_indices() to map PAA segments back to original-series index ranges (#441).

PR drafted with assistance from Claude Code. The change was reviewed manually against tslearn-team/tslearn's source (tslearn/piecewise/piecewise.py:147 _transform) and the reproducer block above was used during development; it is the same one a reviewer can paste verbatim.

Expose the start/end indices of each PAA segment so callers can map a
PAA value back to the original-time-series range it summarises. The
indices match the slicing convention already used in
``PiecewiseAggregateApproximation._transform`` (segment width
``sz_fit // n_segments`` with trailing samples dropped), so
``paa_data[i_seg]`` equals ``ts[start_i:end_i].mean()`` by construction.

A regression test in ``tests/test_piecewise.py`` checks that contract on
both a divisible and a non-divisible series length.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.72%. Comparing base (2f2029a) to head (72c554b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #671      +/-   ##
==========================================
+ Coverage   93.70%   93.72%   +0.01%     
==========================================
  Files          73       73              
  Lines        6986     7008      +22     
==========================================
+ Hits         6546     6568      +22     
  Misses        440      440              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@charavelg
Copy link
Copy Markdown
Contributor

I believe #441 was about retrieving the indices induced by a particular dataset rather than the fitted one. In the case I'm wrong, and indices for the fitted dataset are required, an attribute with sklearn syntax (something like indices_) is probably a better fit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Index of changes in PAA and SAX

2 participants