FEAT: Standardize system_prompt as a first-class consumed attack argument#2040
FEAT: Standardize system_prompt as a first-class consumed attack argument#2040adrian-gavrila wants to merge 5 commits into
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Behavior is grep-discoverable, runtime-enforced, and test-covered; the section did not clear the bar this slim instruction file sets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…prompt' into adrian-gavrila/standardize-system-prompt
There was a problem hiding this comment.
⚠️ Human review recommended
It changes a core execution chokepoint and parameter contract across many attacks, so a human reviewer should validate compatibility and any downstream behavioral impact beyond the added unit coverage.
Pull request overview
Standardizes system_prompt= as a first-class, consumed attack argument by lifting it into AttackParameters and reliably lowering it into a single leading system message at the shared AttackStrategy.execute_with_context_async entrypoint, ensuring delivery for both direct (execute_async) and executor-driven (AttackExecutor) runs.
Changes:
- Add
system_prompt: str | NonetoAttackParametersand lower it intocontext.prepended_conversationatAttackStrategy.execute_with_context_async, with a conflictValueErrorwhen a system-role prepended message is already present. - Remove the dead
SingleTurnAttackContext.system_promptfield and update tests to assert behavioral delivery rather than no-op state. - Explicitly exclude
system_promptfrom self-seeding / internally-constructed prompt attacks’params_typeand add unit coverage for rejection and executor-path regression.
File summaries
| File | Description |
|---|---|
pyrit/executor/attack/core/attack_parameters.py |
Adds system_prompt to the canonical attack parameter contract. |
pyrit/executor/attack/core/attack_strategy.py |
Lowers system_prompt into a prepended system message at the shared chokepoint and enforces conflict rules. |
pyrit/executor/attack/single_turn/single_turn_attack_strategy.py |
Removes unused SingleTurnAttackContext.system_prompt field. |
pyrit/executor/attack/single_turn/flip_attack.py |
Excludes system_prompt from a self-seeding attack’s accepted params. |
pyrit/executor/attack/single_turn/skeleton_key.py |
Excludes system_prompt from a self-seeding attack’s accepted params. |
pyrit/executor/attack/single_turn/many_shot_jailbreak.py |
Excludes system_prompt from a self-seeding attack’s accepted params. |
pyrit/executor/attack/single_turn/context_compliance.py |
Excludes system_prompt from a self-seeding attack’s accepted params. |
pyrit/executor/attack/single_turn/role_play.py |
Excludes system_prompt from a self-seeding attack’s accepted params. |
pyrit/executor/attack/compound/sequential_attack.py |
Excludes system_prompt from compound attack per-call overrides. |
tests/unit/executor/attack/core/test_attack_strategy.py |
Adds unit coverage for lowering behavior, ordering, conflict errors, and executor-bypass simulation. |
tests/unit/executor/attack/core/test_attack_executor.py |
Regression test ensuring executor-path lowering happens via the shared chokepoint. |
tests/unit/executor/attack/single_turn/test_prompt_sending.py |
Updates assertions to validate lowering behavior and adds delivery-to-conversation-manager test. |
tests/unit/executor/attack/single_turn/test_role_play.py |
Verifies system_prompt exclusion from params_type and explicit rejection at runtime. |
Copilot's findings
- Files reviewed: 13/13 changed files
- Comments generated: 0
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
| "| `objective` | What you are trying to get the **objective target** (the system under test) to do. Drives scoring and multi-turn adversarial prompts. |\n", | ||
| "| `memory_labels` | A `dict[str, str]` tagged onto every prompt/response, so you can filter this run later in memory. |\n", | ||
| "| `prepended_conversation` | A list of `Message`s to seed the conversation before the attack's own turns (system prompt, prior history). |\n", | ||
| "| `system_prompt` | The objective target's system prompt, as a string. The standard one-line way to set it; PyRIT lowers it to a single `system` message at the front of the conversation. Mutually exclusive with a `system` message in `prepended_conversation`. |\n", |
There was a problem hiding this comment.
This is the most common case BY FAR. But it's a bit complicated because there can be multiple system prompts. Different models behave differently in these cases, which is interesting to pyrit which has a goal of being flexible.
Because of that, I like it being in prepended_conversation. I think it maps more cleanly to SeedPromptAttackGroups. It is more difficult to add it when manually creating attacks, but I think SeedPrompts is the more common case.
Description
Makes
system_prompt=a first-class, consumed attack argument that always delivers to theobjective (system-under-test) target. Previously there were three inconsistent ways to set the
objective target's system prompt, including a
SingleTurnAttackContext.system_promptfield that wasdeclared but never consumed (a silent no-op).
This standardizes on a single mechanism:
system_promptis lifted toAttackParameters, so both single-turn and multi-turn attacks acceptit with one source of truth.
system-role message onprepended_conversationat theAttackStrategy.execute_with_context_asyncchokepoint — the one path that both single-shot(
execute_async) and batched (AttackExecutor) runs cross — so delivery is structurallyguaranteed and runs exactly once per task (outside the retry loop).
system_prompt=and asystem-role message inprepended_conversationraises aclear
ValueError(one source of truth).SingleTurnAttackContext.system_promptfield is removed.prepended_conversationfrom their params (flip_attack,skeleton_key,many_shot_jailbreak,context_compliance,role_play,sequential_attack)also exclude
system_prompt, rejecting it explicitly rather than silently dropping it.prepended_conversation=remains the advanced path for full multi-message seeds.Implements ADO #9697 (framework standardization track). The CoPyRIT GUI half of that story is tracked
separately and is not part of this PR.
Tests and Documentation
AttackExecutorbatch path(regression test for the executor-bypass case), end-to-end delivery to the target, the
both-supplied
ValueError, the self-seeding carve-outs, and single/multi-turn parity. Replaced theprevious
context.system_prompt == ...assertion (which locked in the no-op) with a behavioralassertion. Full unit suite green (9785 passed, 119 skipped).
doc/code/executor/3_attack_configuration(the attack-inputs page), alongside the existingprepended_conversationmaterial, plus asystem_promptrow in the inputs table. Regenerated thepaired notebook with JupyText (executed).