STAR-compatible optional tags on unmapped records #80

Merged
Psy-Fer merged 2 commits into main from fix/unmapped-record-tags on Jun 5, 2026

Psy-Fer (Collaborator) commented Jun 1, 2026

fix: STAR-compatible optional tags on unmapped records + mismatch reason classification

Summary

Two related correctness fixes to unmapped-read handling that have no effect on alignment
decisions:

  • Missing optional tags on unmapped records. STAR emits NH:i:0, HI:i:0, AS:i:0,
    nM:i:0, and a uT:A: reason tag on every unmapped record; rustar-aligner emitted
    none of them. The uT:A: tag in particular is parsed by MultiQC and similar QC tools
    to break down unmapped reads by category; without it those tools fall back to
    flag-only counting and lose the fine-grained breakdown.

  • Mismatch-filtered reads misclassified as TooShort. When all transcript candidates
    were removed solely by the mismatch count/rate filter
    (--outFilterMismatchNmax / --outFilterMismatchNoverLmax), Log.final.out recorded
    them under "too short" instead of "too many mismatches", inflating the former and keeping
    the latter at zero. STAR distinguishes these two cases in
    ReadAlign_mappedFilter.cpp:20–30.
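
To illustrate why the uT:A: tag matters downstream, here is a minimal sketch of how a QC script might tally unmapped reads by category from SAM text. The helper names `ut_value` and `tally_unmapped` are hypothetical and are not part of rustar-aligner or MultiQC. The sketch assumes tab-separated SAM lines with optional fields starting at column 12.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: return the single-character value of the uT:A: tag,
/// if the SAM line carries one. Optional fields start after the 11 mandatory
/// SAM columns.
fn ut_value(sam_line: &str) -> Option<char> {
    sam_line
        .split('\t')
        .skip(11)
        .find_map(|field| field.strip_prefix("uT:A:"))
        .and_then(|value| value.chars().next())
}

/// Hypothetical helper: count records per uT:A: value, skipping header lines
/// and records without the tag.
fn tally_unmapped(lines: &[&str]) -> BTreeMap<char, usize> {
    let mut counts = BTreeMap::new();
    for line in lines {
        if line.starts_with('@') {
            continue; // SAM header line
        }
        if let Some(code) = ut_value(line) {
            *counts.entry(code).or_insert(0) += 1;
        }
    }
    counts
}
```

Records without the tag are simply not counted here, which mirrors the situation described above: a tool that relies on the tag has nothing to break down when it is absent.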

Closes the residual part of #48.

Changes

src/align/read_align.rs

Replaced the catch-all Some(UnmappedReason::TooShort) at the end of the quality-filter
block with logic that inspects the existing filter_reasons map: if only the mismatch
filters fired (mismatch_max / mismatch_rate), the reason is TooManyMismatches; otherwise
it is TooShort.
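
The classification rule can be sketched as follows. This is an illustration under stated assumptions, not the code in read_align.rs: the shape of `filter_reasons` (filter name mapped to a rejection count), the function name `classify_filtered`, and the treatment of an empty map as TooShort are all assumptions. Only the key names mismatch_max and mismatch_rate and the enum variants come from the description above.

```rust
use std::collections::HashMap;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnmappedReason {
    Other,
    TooShort,
    TooManyMismatches,
    TooManyLoci,
}

/// Hypothetical sketch: `filter_reasons` is assumed to map a filter name to
/// the number of transcript candidates that filter rejected.
fn classify_filtered(filter_reasons: &HashMap<&str, usize>) -> UnmappedReason {
    let mut mismatch_fired = false;
    let mut other_fired = false;
    for (name, count) in filter_reasons {
        if *count == 0 {
            continue; // this filter rejected nothing
        }
        match *name {
            "mismatch_max" | "mismatch_rate" => mismatch_fired = true,
            _ => other_fired = true,
        }
    }
    // Only a pure mismatch rejection counts as TooManyMismatches; mixed,
    // score-only, and (by assumption) empty cases stay TooShort.
    if mismatch_fired && !other_fired {
        UnmappedReason::TooManyMismatches
    } else {
        UnmappedReason::TooShort
    }
}
```

Keeping TooShort as the fallback preserves the previous behaviour for every case except the mismatch-only one.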

src/io/sam.rs

  • New free function insert_unmapped_tags(record, attrs, reason) inserts NH:i:0,
    HI:i:0, AS:i:0, nM:i:0 (gated on outSAMattributes as for mapped records) and
    uT:A: (always emitted, matching STAR's unconditional behaviour). uT:A: values:
    0=other, 1=too short, 2=too many mismatches, 3=too many loci.
  • build_unmapped_record: rg_id: Option<&str> replaced by params: &Parameters +
    unmapped_reason: UnmappedReason; RG tag derived internally as other builders do.
  • build_paired_unmapped_records: gains unmapped_reason: UnmappedReason parameter.
  • build_half_mapped_records: unmapped-mate section calls insert_unmapped_tags with
    UnmappedReason::Other.
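
As a rough sketch of the tag set described above, the following builds the optional fields as SAM text rather than touching a real record type. `TagSelection` is a hypothetical stand-in for the parsed outSAMattributes setting, and `unmapped_tag_fields` is an illustrative name; the actual insert_unmapped_tags operates on the project's record and attribute types. The uT:A: codes are the ones listed in this description.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnmappedReason {
    Other,
    TooShort,
    TooManyMismatches,
    TooManyLoci,
}

impl UnmappedReason {
    /// uT:A: code as listed in the PR description.
    fn ut_code(self) -> char {
        match self {
            UnmappedReason::Other => '0',
            UnmappedReason::TooShort => '1',
            UnmappedReason::TooManyMismatches => '2',
            UnmappedReason::TooManyLoci => '3',
        }
    }
}

/// Hypothetical stand-in for the attributes selected via outSAMattributes.
#[derive(Debug, Clone, Copy)]
struct TagSelection {
    nh: bool,
    hi: bool,
    alignment_score: bool,
    n_mismatch: bool,
}

/// Build the optional fields for an unmapped record as SAM text.
fn unmapped_tag_fields(attrs: TagSelection, reason: UnmappedReason) -> Vec<String> {
    let gated = [
        (attrs.nh, "NH:i:0"),
        (attrs.hi, "HI:i:0"),
        (attrs.alignment_score, "AS:i:0"),
        (attrs.n_mismatch, "nM:i:0"),
    ];
    let mut fields = Vec::new();
    for (enabled, tag) in gated {
        if enabled {
            fields.push(tag.to_string()); // emitted only when selected
        }
    }
    // uT is emitted unconditionally, regardless of the attribute selection.
    fields.push(format!("uT:A:{}", reason.ut_code()));
    fields
}
```

With every attribute disabled, the result still contains the single uT:A: field, which is the unconditional behaviour the description attributes to STAR.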

src/lib.rs

All four call sites updated. The now-unused rg_id_owned binding at the top of
run_single_pass was removed.

src/io/bam.rs

Three test call sites updated to the new build_unmapped_record signature.

Test plan

  • cargo test — 436 tests passing, 0 failures
  • cargo clippy --all-targets — 0 warnings
  • cargo fmt --check — clean
  • New test test_unmapped_record_tags_emitted — verifies NH/HI/AS present and all
    four uT:A: values (0–3) on build_unmapped_record output
  • New test test_unmapped_reason_mismatch_classification — verifies the
    filter_reasons → UnmappedReason mapping for all cases (mismatch-only, score-only,
    mixed, empty)
  • SE benchmark: 8604/8926 (96.4%) — unchanged from post-batch-1 baseline (9-read
    drift is from batch 1's Gsj/sjdb changes, not this branch)
  • PE benchmark: 8390/8390 both-mapped — exact match with STAR, unchanged

Psy-Fer merged commit b676210 into main on Jun 5, 2026
10 checks passed
Psy-Fer deleted the fix/unmapped-record-tags branch on June 5, 2026 11:25