FIX: surface unscoreable sub-scores that mask composite true/false verdicts by AUTHENSOR · Pull Request #2042 · microsoft/PyRIT

AUTHENSOR · 2026-06-18T07:29:39Z

Description

A TrueFalseScorer that cannot evaluate a response returns a fallback
Score(score_value="false") via TrueFalseScorer._build_fallback_score
(pyrit/score/true_false/true_false_scorer.py, the "No supported pieces to score
after filtering" branch). This happens whenever no message piece survives the
scorer's validator filtering, e.g. an image-only scorer handed a text-only
response. The base Scorer.score_async invokes this fallback at
pyrit/score/scorer.py:267-268.

The problem: this "could-not-score" false is indistinguishable from a genuine
"not harmful" false. In a TrueFalseCompositeScorer
(pyrit/score/true_false/true_false_composite_scorer.py:104-141) using
TrueFalseScoreAggregator.AND (pyrit/score/true_false/true_false_score_aggregator.py,
functools.reduce(operator.and_, bool_values)), a sub-scorer that merely could not
evaluate the response contributes a hard false that silently vetoes another
sub-scorer's confirmed harmful true. The aggregate reports false, i.e. "attack did
not succeed". For a red team this is a false-assurance hazard: a confirmed success
is under-reported, with no signal that a sub-scorer abstained rather than judging
the response not harmful.
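
To make the masking concrete, here is a minimal sketch that uses only the reduction quoted above, `functools.reduce(operator.and_, bool_values)`. The two variable names are hypothetical stand-ins for sub-scorer results and are not PyRIT identifiers.

```python
import functools
import operator

# Hypothetical sub-scorer results for a single text-only response.
text_scorer_verdict = True    # text scorer confirmed the response is harmful
image_scorer_verdict = False  # image-only scorer had nothing to score: fallback false

bool_values = [text_scorer_verdict, image_scorer_verdict]

# The same reduction the description attributes to the AND aggregator.
and_verdict = functools.reduce(operator.and_, bool_values)

# The abstention is just another False here, so it vetoes the confirmed True.
print(and_verdict)  # False
```

Nothing in `bool_values` records why the second entry is false, which is the gap this change addresses.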

This change makes the masking visible and the could-not-score state
distinguishable, without changing any aggregate verdict value (non-breaking):

- TrueFalseScorer._build_fallback_score now sets
  score_metadata={"unscoreable": 1} on the filtered fallback false only.
  Blocked and error fallbacks are intentionally left unflagged: those are real
  observations of the target's behavior, not an inability to score. The flag key
  (UNSCOREABLE_METADATA_KEY) lives in true_false_score_aggregator.py (the
  lower-level module) to avoid a circular import and give the base scorer and the
  aggregators a single source of truth.
- The true/false aggregators (AND/OR/MAJORITY, the single chokepoint for both
  the composite scorer and multi-piece TrueFalseScorer aggregation) now detect
  unscoreable sub-scores. When any are present they emit a logger.warning and
  append a note to the aggregated rationale. When an abstention dragged an
  otherwise-true signal down to a false aggregate (the classic AND-masking case),
  the warning and note call that out explicitly as a possible under-reported success.
- The unscoreable flag also propagates into the aggregate's score_metadata via the
  existing combine_metadata_and_categories path, so the distinction survives upward.
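
The behavior described in the list above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `SimpleScore`, `build_filtered_fallback`, and `aggregate_and` are hypothetical stand-ins, the masking test (every sub-score that was actually evaluated is true, yet the aggregate is false) is an assumption about how "otherwise-true" is decided, and the real change routes metadata through combine_metadata_and_categories, which is not modeled here.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)

UNSCOREABLE_METADATA_KEY = "unscoreable"  # key name taken from the description


@dataclass
class SimpleScore:
    """Hypothetical stand-in for PyRIT's Score; only the fields used here."""

    value: bool
    rationale: str = ""
    metadata: dict = field(default_factory=dict)


def build_filtered_fallback() -> SimpleScore:
    """Fallback for the 'no supported pieces' branch, flagged as unscoreable."""
    return SimpleScore(
        value=False,
        rationale="No supported pieces to score after filtering",
        metadata={UNSCOREABLE_METADATA_KEY: 1},
    )


def aggregate_and(scores: list[SimpleScore]) -> SimpleScore:
    """AND-aggregate sub-scores; the verdict itself is computed as before."""
    verdict = all(s.value for s in scores)
    rationale = " | ".join(s.rationale for s in scores)
    metadata: dict = {}

    abstained = [s for s in scores if s.metadata.get(UNSCOREABLE_METADATA_KEY)]
    scored = [s for s in scores if not s.metadata.get(UNSCOREABLE_METADATA_KEY)]
    if abstained:
        # Propagate the flag so callers of the aggregate can still see it.
        metadata[UNSCOREABLE_METADATA_KEY] = 1
        note = f"{len(abstained)} sub-score(s) could not be evaluated"
        # Assumed masking test: all evaluated sub-scores are true, aggregate is false.
        if not verdict and scored and all(s.value for s in scored):
            note += "; possible under-reported success"
        logger.warning(note)
        rationale += f" [note: {note}]"

    return SimpleScore(value=verdict, rationale=rationale, metadata=metadata)


confirmed = SimpleScore(value=True, rationale="harmful payload detected")
masked = aggregate_and([confirmed, build_filtered_fallback()])
genuine = aggregate_and([SimpleScore(False, "benign"), SimpleScore(False, "benign")])
```

In this sketch `masked.value` stays false, but its metadata carries the flag and its rationale carries the note, while `genuine` has empty metadata and an unannotated rationale.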

The aggregate verdict value is never changed. Whether the default verdict for an
unscoreable input should differ (e.g. abstain / skip rather than contribute false)
is a separate, behavior-changing discussion tracked in a companion issue.

Tests and Documentation

New deterministic regression tests (no LLM, no network):

tests/unit/score/test_true_false_score_aggregator.py
- test_and_unscoreable_masks_true_warns_and_notes — AND over a confirmed true
  plus an unscoreable false: asserts the verdict value is still False
  (non-breaking guard), a warning naming the masking hazard is logged, and the
  rationale records the abstention and the possible under-reporting.
- test_and_unscoreable_present_without_true_warns_but_no_masking_note — an
  unscoreable false with no competing true is noted but not flagged as masking.
- test_genuine_all_false_is_not_flagged — a genuine all-false aggregate emits no
  warning and no note.
tests/unit/score/test_true_false_composite_scorer.py
- test_unscoreable_fallback_is_marked_distinguishable — a real image-only scorer
  on a text response produces the fallback false carrying {"unscoreable": 1}.
- test_composite_and_unscoreable_masking_is_visible_but_verdict_unchanged — a real
  text harmful-true scorer composed under AND with the filtered image-only scorer:
  asserts (a) the unscoreable flag propagates into the aggregate metadata, (b) a
  warning is logged and the rationale notes the abstention, and (c) the AND verdict
  value is unchanged (False).
- test_composite_and_genuine_all_false_is_not_flagged — genuine all-false
  composite is not flagged.
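
For readers unfamiliar with asserting on log output, the shape of such a regression test might look like the sketch below. It is stdlib-only and runs against a toy function; `and_with_warning`, `ListHandler`, `capture_warnings`, and the logger name are all hypothetical, and the PR's actual tests exercise the real PyRIT scorers and aggregators instead (a pytest suite would more typically use the `caplog` fixture than a hand-written handler).

```python
import logging

logger = logging.getLogger("aggregator_sketch")


def and_with_warning(sub_scores):
    """Toy AND aggregator; each sub-score is a (value, unscoreable) pair."""
    if any(unscoreable for _, unscoreable in sub_scores):
        logger.warning("unscoreable sub-score present")
    return all(value for value, _ in sub_scores)


class ListHandler(logging.Handler):
    """Collects emitted records so a test can assert on them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def capture_warnings(sub_scores):
    """Run the aggregator and return (verdict, list of logged messages)."""
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        verdict = and_with_warning(sub_scores)
    finally:
        logger.removeHandler(handler)
    return verdict, [r.getMessage() for r in handler.records]


def test_masking_warns_and_verdict_unchanged():
    # Confirmed true plus an unscoreable false: verdict stays False, warning logged.
    verdict, warnings = capture_warnings([(True, False), (False, True)])
    assert verdict is False
    assert len(warnings) == 1 and "unscoreable" in warnings[0]


def test_genuine_all_false_is_not_flagged():
    # Two genuine falses: same verdict, but nothing is logged.
    verdict, warnings = capture_warnings([(False, False), (False, False)])
    assert verdict is False
    assert warnings == []
```

The two tests mirror the pairing used in the list above: one guards that the verdict value is unchanged while the hazard is surfaced, the other guards against false alarms on a genuine all-false result.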

How checks were run (from a fresh clone):

- ruff check and ruff format --check on the two changed source files and the two
  changed test files: all clean.
- python -m pytest tests/unit/score/test_true_false_composite_scorer.py tests/unit/score/test_true_false_score_aggregator.py -q: 31 passed.
- Full python -m pytest tests/unit/score/ -q: 1197 passed, 16 skipped (no
  regressions).
- ty check on the two changed source files: all checks passed.

(Test payload text is summarized as "harmful payload"; no actual harmful content is
included.)

JupyText / docs notebooks: N/A — this change is internal scorer observability with no
public API surface change and no doc/notebook updates.

…icts

A TrueFalseScorer that cannot evaluate a response (no piece survives validator filtering) returns a fallback Score(false) that is indistinguishable from a genuine 'not harmful' false. Under a TrueFalseCompositeScorer with the AND aggregator, such a 'could not score' false silently vetoes another sub-scorer's confirmed harmful true, so the aggregate reports 'attack did not succeed' with no signal that a sub-scorer abstained. For a red-team this under-reports a real success.

Non-breaking observability fix (verdict values unchanged):
- Mark the filtered fallback Score with score_metadata {unscoreable: 1} in TrueFalseScorer._build_fallback_score so a 'could not score' false is distinguishable from a real 'not harmful' false.
- In the true/false aggregators, when one or more unscoreable sub-scores are present, emit a logger.warning and append a note to the aggregated rationale (calling out the masking case where an abstention dragged an otherwise-true signal to false under AND). The aggregate verdict value itself is unchanged.

Adds deterministic regression tests (no LLM) covering the metadata flag, the warning/rationale note, the unchanged verdict value, and that a genuine all-false case is not flagged.

This was referenced Jun 18, 2026

Composite AND scorer silently masks a confirmed true with a could-not-score false #2043

Open

Scorers conflate couldn't-score / errored / blocked / hedged with attack-did-not-succeed, under-reporting jailbreaks #2044

Open

AUTHENSOR wants to merge 1 commit into microsoft:main from AUTHENSOR:fix/composite-scorer-unscoreable-masking
