
FEAT: Always recompute ComponentIdentifier hashes #2050

Merged
rlundeen2 merged 5 commits into microsoft:main from rlundeen2:rlundeen2/component-identifier-refactor
Jun 20, 2026


@rlundeen2

Contributor

Since identifiers now live only on AttackResult and are much smaller, remove the logic that stores and reloads the content hash and the value-truncation machinery. The content hash is always recomputed on validation, and full param values (system prompts, configs, etc.) are stored. eval_hash stays persisted for DB-level filtering but is recomputed on every reload rather than trusted from storage.

This simplifies storage and lets us see full identities, so identifiers can be serialized to and deserialized from the database without loss.
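The reload behavior described above can be sketched with the standard library alone. This is a minimal illustration, not PyRIT's implementation: `load_identifier`, `_digest`, the dict-based row shape, and the choice to derive `eval_hash` from `params` only are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any

# Keys that are derived from content and therefore never trusted from storage.
DERIVED_KEYS = ("hash", "eval_hash")


def _digest(value: Any) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_identifier(stored: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical load helper: drop stored hashes, then recompute them from content."""
    content = {k: v for k, v in stored.items() if k not in DERIVED_KEYS}
    return {
        **content,
        "hash": _digest(content),
        # Assumption: eval_hash covers only the params; the real derivation may differ.
        "eval_hash": _digest(content.get("params", {})),
    }


# A stored row with full (untruncated) param values and stale hashes.
row = {
    "class_name": "ExampleScorer",
    "params": {"system_prompt": "You are a strict grader."},
    "hash": "stale-hash",
    "eval_hash": "stale-eval-hash",
}
loaded = load_identifier(row)
```

Because both hashes are functions of the stored content, a stale or tampered `hash` or `eval_hash` column cannot survive a reload; the persisted `eval_hash` is only useful for filtering at the database level.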


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rlundeen2 rlundeen2 changed the title from "Always recompute ComponentIdentifier hashes; store identifiers in full" to "FEAT: Always recompute ComponentIdentifier hashes" Jun 18, 2026
rlundeen2 and others added 3 commits June 18, 2026 13:28
- Fix broken import of removed MAX_IDENTIFIER_VALUE_LENGTH in atomic_attack.py

- Remove max_value_length param from ComponentIdentifier.to_dict shim

- Make eval_hash stamping unconditional (always recompute, never trust stored) at scorer and all memory write sites

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop any supplied hash before validation so it can only be computed from content; document the hash/eval_hash asymmetry (eval_hash is set solely via with_eval_hash).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
_dump_identifier/_dump_identifiers existed only to inject the now-removed truncation context. With truncation gone they were trivial model_dump() wrappers, so inline them at the call sites. Load helpers keep their real work (version injection + eval_hash re-stamping).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@behnam-o behnam-o left a comment

Contributor

Looks good. Just one comment, on computed_field.

Also, we don't need to worry about existing data in the DB, right? There's no need to go over the existing rows and recompute all their hashes with a migration script? I think that would be overkill even if this change alters anything. [Actually, maybe we can't even do that, because some values might be truncated with "..." and no longer parsable to a dict.]
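The bracketed concern can be illustrated with a small sketch. It assumes, hypothetically, that a value was stored as JSON text cut at a fixed length with "..." appended; the actual truncation format used by the removed machinery is not shown in this thread.

```python
import json
from typing import Any, Optional

# A full value as it would be stored after this PR.
full = json.dumps({"system_prompt": "You are a helpful assistant.", "temperature": 0.2})

# Hypothetical legacy form: cut at 30 characters and marked with an ellipsis.
truncated = full[:30] + "..."


def try_parse(text: str) -> Optional[Any]:
    """Return the parsed value, or None when the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


parsed_full = try_parse(full)
parsed_truncated = try_parse(truncated)  # the cut-off string literal cannot be parsed
```

Under that assumption, a migration script has nothing to recompute from: the original content of a truncated value is gone, so its hash could not be reproduced.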

Comment thread pyrit/models/identifiers/component_identifier.py
Replaces the nullable hash pseudo-field (settable-but-ignored, str | None) with a @computed_field backed by a PrivateAttr cache, per PR review. The hash is now a read-only computed field: assignment raises (frozen), the type is a non-optional str, and it is computed once in the after-validator (no per-access recompute, so no perf regression on __hash__/__eq__/dedup). Incoming hash values are still dropped before validation so extra=forbid round-trips storage. Removes the now-unreachable hash-is-None guards in short_hash and compute_eval_hash.
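A minimal sketch of the pattern this commit message describes, assuming Pydantic v2. `SketchIdentifier`, its fields, and the SHA-256-over-JSON hashing are hypothetical stand-ins for illustration, not the actual `ComponentIdentifier` code.

```python
import hashlib
import json
from typing import Any

from pydantic import BaseModel, ConfigDict, Field, PrivateAttr, computed_field, model_validator


class SketchIdentifier(BaseModel):
    """Hypothetical stand-in for an identifier with a read-only, cached content hash."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    class_name: str
    params: dict[str, Any] = Field(default_factory=dict)

    # Cache for the hash; private attributes are not fields and are not serialized.
    _hash: str = PrivateAttr(default="")

    @model_validator(mode="before")
    @classmethod
    def _drop_incoming_hash(cls, data: Any) -> Any:
        # Discard any supplied hash so extra="forbid" still accepts dumped data.
        if isinstance(data, dict) and "hash" in data:
            return {k: v for k, v in data.items() if k != "hash"}
        return data

    @model_validator(mode="after")
    def _cache_hash(self) -> "SketchIdentifier":
        # Compute once from content; later reads of .hash reuse the cached value.
        payload = json.dumps(
            {"class_name": self.class_name, "params": self.params}, sort_keys=True
        )
        self._hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return self

    @computed_field
    @property
    def hash(self) -> str:
        return self._hash


original = SketchIdentifier(class_name="ExampleTarget", params={"temperature": 0.2})
dumped = original.model_dump()  # includes the computed "hash" key

# Round-trip with a tampered hash: the supplied value is dropped and recomputed.
reloaded = SketchIdentifier(**{**dumped, "hash": "tampered"})
```

In this sketch, `reloaded.hash` equals `original.hash` regardless of the supplied value, and assigning to `original.hash` raises because the model is frozen and the property has no setter.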

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rlundeen2 rlundeen2 enabled auto-merge June 20, 2026 04:00
@rlundeen2 rlundeen2 added this pull request to the merge queue Jun 20, 2026
Merged via the queue into microsoft:main with commit 43a38e5 Jun 20, 2026
53 checks passed
@rlundeen2 rlundeen2 deleted the rlundeen2/component-identifier-refactor branch June 20, 2026 04:30