FEAT: Always recompute ComponentIdentifier hashes by rlundeen2 · Pull Request #2050 · microsoft/PyRIT

rlundeen2 · 2026-06-18T20:10:41Z

Since identifiers now live only on AttackResult and are much smaller, remove the logic that stores and reloads the content hash and the value-truncation machinery. The content hash is always recomputed on validation, and full param values (system prompts, configs, etc.) are stored. eval_hash stays persisted for DB-level filtering but is recomputed on every reload rather than trusted from storage.

This simplifies storage and lets us see full identities, making the identifiers truly serializable to and deserializable from the database.
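A minimal sketch of the "always recompute, never trust stored" idea described above, using only the standard library. The function names, the dict layout, and the choice of SHA-256 over canonical JSON are illustrative assumptions, not PyRIT's actual implementation.

```python
import hashlib
import json
from typing import Any, Dict


def content_hash(identity: Dict[str, Any]) -> str:
    """Hash the identity's content; a stored 'hash' key never feeds the result."""
    content = {k: v for k, v in identity.items() if k != "hash"}
    # Canonical JSON (sorted keys) so equal content always yields the same digest.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reload(stored: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild an identity from a stored row, recomputing the hash from content."""
    rebuilt = {k: v for k, v in stored.items() if k != "hash"}
    rebuilt["hash"] = content_hash(rebuilt)
    return rebuilt


# Hypothetical stored row: the full (untruncated) param value plus a stale hash.
row = {
    "class_name": "ExampleTarget",
    "params": {"system_prompt": "You are a helpful assistant. " * 50},
    "hash": "stale-or-tampered",
}
fresh = reload(row)
```

Because the full param values are stored, the recomputed hash depends only on the content and is stable across reloads.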

Since identifiers now live only on AttackResult and are much smaller, remove the logic that stores and reloads the content hash and the value-truncation machinery. The content hash is always recomputed on validation, and full param values (system prompts, configs, etc.) are stored. eval_hash stays persisted for DB-level filtering but is recomputed on every reload rather than trusted from storage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Fix broken import of removed MAX_IDENTIFIER_VALUE_LENGTH in atomic_attack.py
- Remove max_value_length param from ComponentIdentifier.to_dict shim
- Make eval_hash stamping unconditional (always recompute, never trust stored) at scorer and all memory write sites

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Drop any supplied hash before validation so it can only be computed from content; document the hash/eval_hash asymmetry (eval_hash is set solely via with_eval_hash). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
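The hash/eval_hash asymmetry can be sketched with a frozen standard-library dataclass. This is an analogue under stated assumptions, not the PyRIT class: `IdentifierSketch`, its fields, and the choice to keep `eval_hash` out of the content hash are hypothetical. Only the method name `with_eval_hash` comes from the commit message.

```python
import hashlib
import json
from dataclasses import dataclass, field, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class IdentifierSketch:
    class_name: str
    params: Tuple[Tuple[str, str], ...] = ()
    # Neither hash is a constructor argument (init=False), so callers cannot supply them.
    hash: str = field(init=False)
    eval_hash: Optional[str] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # The content hash is always derived from the content fields.
        canonical = json.dumps([self.class_name, sorted(self.params)])
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        object.__setattr__(self, "hash", digest)

    def with_eval_hash(self, eval_hash: str) -> "IdentifierSketch":
        # The only path that sets eval_hash: copy the instance, then stamp the copy.
        clone = replace(self)
        object.__setattr__(clone, "eval_hash", eval_hash)
        return clone


original = IdentifierSketch("ExampleScorer", (("threshold", "0.5"),))
stamped = original.with_eval_hash("abc123")
```

Here `hash` is derived and cannot be passed in, while `eval_hash` starts as `None` and changes only through `with_eval_hash`, which returns a new instance and leaves the original untouched.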

_dump_identifier/_dump_identifiers existed only to inject the now-removed truncation context. With truncation gone they were trivial model_dump() wrappers, so inline them at the call sites. Load helpers keep their real work (version injection + eval_hash re-stamping). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
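A rough sketch of what a load helper's "real work" might look like. Everything here is hypothetical: the names `load_identifier` and `example_eval_hash`, the `version` key, and the reading of "version injection" as filling in a version when the stored row lacks one are assumptions for illustration, and the fields feeding the real eval hash are not listed in this PR.

```python
import hashlib
import json
from typing import Any, Dict

# Hypothetical placeholder, not a real PyRIT constant.
ASSUMED_CURRENT_VERSION = "0.0.0-example"


def example_eval_hash(identifier: Dict[str, Any]) -> str:
    """Stand-in for the real eval-hash computation (its inputs are assumed here)."""
    relevant = {
        "class_name": identifier.get("class_name"),
        "params": identifier.get("params", {}),
    }
    canonical = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_identifier(stored: Dict[str, Any]) -> Dict[str, Any]:
    """Load a stored identifier dict: inject a version, then re-stamp eval_hash."""
    loaded = dict(stored)  # copy so the stored row is left untouched
    # Assumed meaning of "version injection": fill in the version when absent.
    loaded.setdefault("version", ASSUMED_CURRENT_VERSION)
    # Re-stamp: the persisted eval_hash is overwritten, never trusted.
    loaded["eval_hash"] = example_eval_hash(loaded)
    return loaded


stored = {"class_name": "ExampleScorer", "params": {"threshold": 0.5}, "eval_hash": "stale"}
loaded = load_identifier(stored)
```

The dump side needs no counterpart in this sketch, which mirrors the commit's point that the dump wrappers had become trivial.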

behnam-o

looks good. just 1 comment on computed_field

also, we don't need to worry about existing data in the DB, right? no need to go over them and recompute all their hashes (with a migration script)? I think that's overkill even if it alters anything ... [actually, maybe we can't even do that, because some stuff might be truncated with ... and not parsable to a dict]

Replaces the nullable hash pseudo-field (settable-but-ignored, str | None) with a @computed_field backed by a PrivateAttr cache, per PR review. The hash is now a read-only computed field: assignment raises (frozen), the type is a non-optional str, and it is computed once in the after-validator (no per-access recompute, so no perf regression on __hash__/__eq__/dedup). Incoming hash values are still dropped before validation so extra=forbid round-trips storage. Removes the now-unreachable hash-is-None guards in short_hash and compute_eval_hash. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
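The pattern this commit describes can be sketched in Pydantic v2 roughly as follows. `SketchIdentifier`, its fields, and the hashing scheme are hypothetical stand-ins rather than the actual ComponentIdentifier code. The sketch assumes `pydantic>=2` is installed.

```python
import hashlib
import json
from typing import Any, Dict

from pydantic import BaseModel, ConfigDict, Field, PrivateAttr, computed_field, model_validator


class SketchIdentifier(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")

    class_name: str
    params: Dict[str, Any] = Field(default_factory=dict)

    # Cache for the content hash; private attrs are not fields and are not dumped.
    _hash: str = PrivateAttr(default="")

    @model_validator(mode="before")
    @classmethod
    def _drop_supplied_hash(cls, data: Any) -> Any:
        # Discard any incoming hash so extra="forbid" still accepts stored dumps.
        if isinstance(data, dict) and "hash" in data:
            return {k: v for k, v in data.items() if k != "hash"}
        return data

    @model_validator(mode="after")
    def _cache_hash(self) -> "SketchIdentifier":
        # Computed once per instance here, not on every attribute access.
        payload = json.dumps(
            {"class_name": self.class_name, "params": self.params}, sort_keys=True
        )
        self._hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return self

    @computed_field
    @property
    def hash(self) -> str:
        return self._hash


first = SketchIdentifier(class_name="ExampleTarget", params={"temperature": 0.2})
dumped = first.model_dump()  # includes the computed "hash" key
second = SketchIdentifier.model_validate({**dumped, "hash": "forged"})
```

In this sketch a forged `hash` in the input has no effect: `second.hash` is recomputed from content and matches `first.hash`, and assigning to `first.hash` raises because the model is frozen.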

rlundeen2 changed the title ~~Always recompute ComponentIdentifier hashes; store identifiers in full~~ FEAT: Always recompute ComponentIdentifier hashes Jun 18, 2026

rlundeen2 and others added 3 commits June 18, 2026 13:28

Make ComponentIdentifier.hash non-settable

0dbe5c8

Drop any supplied hash before validation so it can only be computed from content; document the hash/eval_hash asymmetry (eval_hash is set solely via with_eval_hash). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

behnam-o approved these changes Jun 18, 2026


Comment thread pyrit/models/identifiers/component_identifier.py

rlundeen2 enabled auto-merge June 20, 2026 04:00

rlundeen2 added this pull request to the merge queue Jun 20, 2026

Merged via the queue into microsoft:main with commit 43a38e5 Jun 20, 2026
53 checks passed

rlundeen2 deleted the rlundeen2/component-identifier-refactor branch June 20, 2026 04:30

rlundeen2 merged 5 commits into microsoft:main from rlundeen2:rlundeen2/component-identifier-refactor
