Add dependency-free diarization error rate (DER) scoring #187

Merged
alexkroman merged 2 commits into main from claude/sweet-knuth-ozbgwo on Jun 16, 2026

@alexkroman

Adds a new aai_cli.core.der module that implements NIST diarization error rate (DER) scoring in pure Python, with no external dependencies beyond the standard library.

Summary

DER is the standard metric for evaluating speaker diarization: the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. This implementation complements the existing WER (word error rate) scoring in aai_cli.core.wer by measuring speaker attribution accuracy.
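
Written out with the three error components that Score tracks, the metric is as follows. The symbol names are illustrative and do not come from the module:

$$
\mathrm{DER} = \frac{T_{\text{missed}} + T_{\text{false alarm}} + T_{\text{confusion}}}{T_{\text{reference}}}
$$

As a made-up example, a file with 60 s of reference speech, 2 s missed, 1 s of false alarm, and 3 s of confusion scores

$$
\mathrm{DER} = \frac{2 + 1 + 3}{60} = 0.10
$$

Because false-alarm time is not bounded by the reference speech time, DER can exceed 1.0.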

Key Changes

  • New module aai_cli/core/der.py: Implements DER scoring with:

    • Segment dataclass: represents a span of speech attributed to a speaker
    • Score dataclass: holds the three NIST error components (missed, false_alarm, confusion) plus reference speech time, with computed properties for total errors and DER rate
    • score() function: computes DER by partitioning the timeline at segment boundaries, tallying errors per atomic interval, and finding the optimal one-to-one speaker mapping via exhaustive search (feasible because diarization files have few speakers)
    • pooled() function: aggregates scores across multiple files for corpus-level DER
    • Helper functions for timeline analysis, speaker mapping, and cooccurrence tracking
  • Comprehensive test suite tests/test_der.py: 14 tests covering:

    • Immutability of value types
    • Perfect diarization (zero error)
    • Speaker label remapping (labels are arbitrary, optimal mapping recovers correct attribution)
    • Missed speech, false alarms, and speaker confusion scenarios
    • Overlapping speakers (counted per speaker, not wall-clock time)
    • Optimal vs. greedy speaker assignment
    • Corpus-level pooling
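
To make the shape of the value types and of pooling concrete, here is a minimal sketch. It is not the code in aai_cli/core/der.py: the Segment field names, the `reference`, `errors`, and `rate` names, and the choice to return 0.0 for an empty reference are all assumptions made for illustration. Only the class and function names and the missed, false_alarm, and confusion components come from the description above.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A span of speech attributed to one speaker (field names are hypothetical)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


@dataclass(frozen=True)
class Score:
    """The three error components plus reference speech time, all in seconds."""
    missed: float
    false_alarm: float
    confusion: float
    reference: float  # hypothetical name for total reference speech time

    @property
    def errors(self) -> float:
        # Total error time across the three components.
        return self.missed + self.false_alarm + self.confusion

    @property
    def rate(self) -> float:
        # Assumption: report 0.0 when there is no reference speech at all.
        return self.errors / self.reference if self.reference else 0.0


def pooled(scores: Iterable[Score]) -> Score:
    """Sum components across files so the rate is weighted by speech time."""
    scores = list(scores)
    return Score(
        missed=sum(s.missed for s in scores),
        false_alarm=sum(s.false_alarm for s in scores),
        confusion=sum(s.confusion for s in scores),
        reference=sum(s.reference for s in scores),
    )


short_file = Score(missed=1.0, false_alarm=0.0, confusion=1.0, reference=10.0)
long_file = Score(missed=0.0, false_alarm=2.0, confusion=0.0, reference=30.0)
corpus = pooled([short_file, long_file])
print(corpus.errors, corpus.reference, corpus.rate)
```

In this sketch the pooled result is 4 s of error over 40 s of reference speech, a rate of 10%. Averaging the two per-file rates (20% and about 6.7%) would instead give about 13.3%, which is why pooling sums the components before dividing.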

Implementation Details

  • No external dependencies: Uses only Python stdlib (itertools, dataclasses, collections.abc)
  • Optimal speaker mapping: Exhaustive permutation search over one-to-one mappings between reference and hypothesis speakers (factorial complexity is acceptable for typical diarization speaker counts of 2–10)
  • Timeline partitioning: Segments are split at every boundary so each atomic interval has a fixed set of active speakers, enabling accurate per-interval error tallying
  • Weighted by speaker-time: Reference speech time is counted once per concurrent speaker, so an interval in which two reference speakers overlap contributes twice its duration
  • Immutable value types: Both Segment and Score are frozen dataclasses for safety
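
The partitioning and mapping steps above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the module's code: the function names `active` and `der_components` are hypothetical, segments are plain (speaker, start, end) tuples rather than Segment objects, and the per-interval tallies follow the usual NIST-style rule (extra reference speakers count as missed, extra hypothesis speakers as false alarm, and the remainder is confusion unless a mapped pair is active together).

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical input shape: (speaker, start, end), times in seconds.
Seg = tuple[str, float, float]


def active(segments: list[Seg], start: float, end: float) -> set[str]:
    """Speakers whose segment covers the whole atomic interval [start, end]."""
    return {spk for spk, s, e in segments if s <= start and end <= e}


def der_components(reference: list[Seg], hypothesis: list[Seg]):
    """Return (missed, false_alarm, confusion, reference_time)."""
    # Cut the timeline at every segment boundary from either side.
    bounds = sorted({t for _, s, e in [*reference, *hypothesis] for t in (s, e)})
    missed = false_alarm = matchable = ref_time = 0.0
    cooccur = defaultdict(float)  # (ref speaker, hyp speaker) -> shared seconds
    for start, end in zip(bounds, bounds[1:]):
        dur = end - start
        ref = active(reference, start, end)
        hyp = active(hypothesis, start, end)
        ref_time += dur * len(ref)  # counted per concurrent speaker
        missed += dur * max(0, len(ref) - len(hyp))
        false_alarm += dur * max(0, len(hyp) - len(ref))
        matchable += dur * min(len(ref), len(hyp))
        for r in ref:
            for h in hyp:
                cooccur[r, h] += dur

    # Missed and false-alarm time do not depend on the mapping, so the best
    # mapping is the one that maximises correctly matched time.
    ref_spk = sorted({spk for spk, _, _ in reference})
    hyp_spk = sorted({spk for spk, _, _ in hypothesis})
    if len(ref_spk) <= len(hyp_spk):
        candidates = (zip(ref_spk, p) for p in permutations(hyp_spk, len(ref_spk)))
    else:
        candidates = (zip(p, hyp_spk) for p in permutations(ref_spk, len(hyp_spk)))
    best = max(sum(cooccur.get(pair, 0.0) for pair in m) for m in candidates)
    return missed, false_alarm, matchable - best, ref_time


# One reference speaker split across two hypothesis labels: only one label
# can be mapped to "A", so the shorter 4 s stretch counts as confusion.
print(der_components([("A", 0, 10)], [("x", 0, 6), ("y", 6, 10)]))
# (0.0, 0.0, 4.0, 10.0)
```

Iterating over permutations of the larger speaker set, taken at the size of the smaller one, enumerates every maximal one-to-one mapping. Since co-occurrence times are never negative, leaving a mappable speaker unmapped cannot improve the result, so those partial mappings are skipped.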

https://claude.ai/code/session_011qBbDEWrtpPVQVaVKAHnMx

claude added 2 commits June 16, 2026 19:26
Add aai_cli/core/der.py, a pure-Python diarization error rate scorer
mirroring core/wer.py's shape (frozen Score, score, pooled). It computes
the standard NIST/pyannote DER — missed / false-alarm / speaker-confusion
time over reference speech time — by partitioning the shared timeline at
every segment boundary and optimally mapping speaker labels via exact
permutation search (diarization speaker counts are small).

No new dependency: pyannote.metrics pulls numpy/scipy/pandas, and the
lighter PyPI options still pull numpy or compile a C++ extension, whereas
the current eval stack (jiwer) has neither numpy nor scipy.

Not yet wired into `assembly eval` — DER needs reference speaker timing
(RTTM-style segments) that the current text-only dataset path doesn't
carry; that integration is a separate change.

Kill two mutation-gate survivors on core/der.py: assert Segment is frozen,
and cover the case where reference and hypothesis each carry an unmatched
speaker (so the optimal mapping must weigh a non-co-occurring pair). These
tests were validated by the full gate but missed the prior commit's stale
staging snapshot.
@alexkroman alexkroman enabled auto-merge June 16, 2026 19:40
@alexkroman alexkroman added this pull request to the merge queue Jun 16, 2026
Merged via the queue into main with commit f14616d Jun 16, 2026
19 checks passed
@alexkroman alexkroman deleted the claude/sweet-knuth-ozbgwo branch June 16, 2026 19:48