FEAT: Standardize system_prompt as a first-class consumed attack argument by adrian-gavrila · Pull Request #2040 · microsoft/PyRIT

adrian-gavrila · 2026-06-18T00:56:39Z

Description

Makes system_prompt= a first-class, consumed attack argument that always delivers to the
objective (system-under-test) target. Previously there were three inconsistent ways to set the
objective target's system prompt, including a SingleTurnAttackContext.system_prompt field that was
declared but never consumed (a silent no-op).

This standardizes on a single mechanism:

system_prompt is lifted to AttackParameters, so both single-turn and multi-turn attacks accept
it with one source of truth.
It is lowered to a single system-role message on prepended_conversation at the
AttackStrategy.execute_with_context_async chokepoint — the one path that both single-shot
(execute_async) and batched (AttackExecutor) runs cross — so delivery is structurally
guaranteed and runs exactly once per task (outside the retry loop).
Supplying both system_prompt= and a system-role message in prepended_conversation raises a
clear ValueError (one source of truth).
The dead SingleTurnAttackContext.system_prompt field is removed.
Self-seeding attacks that exclude prepended_conversation from their params (flip_attack,
skeleton_key, many_shot_jailbreak, context_compliance, role_play, sequential_attack)
also exclude system_prompt, rejecting it explicitly rather than silently dropping it.

prepended_conversation= remains the advanced path for full multi-message seeds.

Implements ADO #9697 (framework standardization track). The CoPyRIT GUI half of that story is tracked
separately and is not part of this PR.

Tests and Documentation

Tests: added/updated coverage for lowering at the chokepoint, the AttackExecutor batch path
(regression test for the executor-bypass case), end-to-end delivery to the target, the
both-supplied ValueError, the self-seeding carve-outs, and single/multi-turn parity. Replaced the
previous context.system_prompt == ... assertion (which locked in the no-op) with a behavioral
assertion. Full unit suite green (9785 passed, 119 skipped).
Documentation: added a "Setting a system prompt" section to
doc/code/executor/3_attack_configuration (the attack-inputs page), alongside the existing
prepended_conversation material, plus a system_prompt row in the inputs table. Regenerated the
paired notebook with JupyText (executed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Behavior is grep-discoverable, runtime-enforced, and test-covered; the section did not clear the bar this slim instruction file sets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…prompt' into adrian-gavrila/standardize-system-prompt

Copilot

⚠️ Human review recommended

It changes a core execution chokepoint and parameter contract across many attacks, so a human reviewer should validate compatibility and any downstream behavioral impact beyond the added unit coverage.

Pull request overview

Standardizes system_prompt= as a first-class, consumed attack argument by lifting it into AttackParameters and reliably lowering it into a single leading system message at the shared AttackStrategy.execute_with_context_async entrypoint, ensuring delivery for both direct (execute_async) and executor-driven (AttackExecutor) runs.

Changes:

Add system_prompt: str | None to AttackParameters and lower it into context.prepended_conversation at AttackStrategy.execute_with_context_async, with a conflict ValueError when a system-role prepended message is already present.
Remove the dead SingleTurnAttackContext.system_prompt field and update tests to assert behavioral delivery rather than no-op state.
Explicitly exclude system_prompt from self-seeding / internally-constructed prompt attacks’ params_type and add unit coverage for rejection and executor-path regression.

File summaries

File	Description
`pyrit/executor/attack/core/attack_parameters.py`	Adds `system_prompt` to the canonical attack parameter contract.
`pyrit/executor/attack/core/attack_strategy.py`	Lowers `system_prompt` into a prepended system message at the shared chokepoint and enforces conflict rules.
`pyrit/executor/attack/single_turn/single_turn_attack_strategy.py`	Removes unused `SingleTurnAttackContext.system_prompt` field.
`pyrit/executor/attack/single_turn/flip_attack.py`	Excludes `system_prompt` from a self-seeding attack’s accepted params.
`pyrit/executor/attack/single_turn/skeleton_key.py`	Excludes `system_prompt` from a self-seeding attack’s accepted params.
`pyrit/executor/attack/single_turn/many_shot_jailbreak.py`	Excludes `system_prompt` from a self-seeding attack’s accepted params.
`pyrit/executor/attack/single_turn/context_compliance.py`	Excludes `system_prompt` from a self-seeding attack’s accepted params.
`pyrit/executor/attack/single_turn/role_play.py`	Excludes `system_prompt` from a self-seeding attack’s accepted params.
`pyrit/executor/attack/compound/sequential_attack.py`	Excludes `system_prompt` from compound attack per-call overrides.
`tests/unit/executor/attack/core/test_attack_strategy.py`	Adds unit coverage for lowering behavior, ordering, conflict errors, and executor-bypass simulation.
`tests/unit/executor/attack/core/test_attack_executor.py`	Regression test ensuring executor-path lowering happens via the shared chokepoint.
`tests/unit/executor/attack/single_turn/test_prompt_sending.py`	Updates assertions to validate lowering behavior and adds delivery-to-conversation-manager test.
`tests/unit/executor/attack/single_turn/test_role_play.py`	Verifies `system_prompt` exclusion from params_type and explicit rejection at runtime.

Copilot's findings

Files reviewed: 13/13 changed files
Comments generated: 0

Note

Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.

rlundeen2 · 2026-06-19T21:14:18Z

    "| `objective` | What you are trying to get the **objective target** (the system under test) to do. Drives scoring and multi-turn adversarial prompts. |\n",
    "| `memory_labels` | A `dict[str, str]` tagged onto every prompt/response, so you can filter this run later in memory. |\n",
-    "| `prepended_conversation` | A list of `Message`s to seed the conversation before the attack's own turns (system prompt, prior history). |\n",
+    "| `system_prompt` | The objective target's system prompt, as a string. The standard one-line way to set it; PyRIT lowers it to a single `system` message at the front of the conversation. Mutually exclusive with a `system` message in `prepended_conversation`. |\n",


This is the most common case BY FAR. But it's a bit complicated because there can be multiple system prompts. Different models behave differently in these cases, which is interesting to pyrit which has a goal of being flexible.

Because of that, I like it being in prepended_conversation. I think it maps more cleanly to SeedPromptAttackGroups. It is more difficult to add it when manually creating attacks, but I think SeedPrompts is the more common case.

adrian-gavrila and others added 3 commits June 17, 2026 20:43

Standardize system_prompt as a first-class consumed attack argument

eabd84b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove system_prompt section from attacks instructions

ccbbb4a

Behavior is grep-discoverable, runtime-enforced, and test-covered; the section did not clear the bar this slim instruction file sets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'main' into adrian-gavrila/standardize-system-prompt

21d7f9a

adrian-gavrila marked this pull request as ready for review June 18, 2026 13:08

adrian-gavrila requested a review from Copilot June 18, 2026 13:10

Copilot started reviewing on behalf of adrian-gavrila June 18, 2026 13:11 View session

adrian-gavrila and others added 2 commits June 18, 2026 09:17

Add system_prompt example to attack configuration doc

1f08821

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge remote-tracking branch 'fork/adrian-gavrila/standardize-system-…

7508400

…prompt' into adrian-gavrila/standardize-system-prompt

Copilot AI reviewed Jun 18, 2026

View reviewed changes

adrian-gavrila changed the title ~~[DRAFT] FEAT: Standardize system_prompt as a first-class consumed attack argument~~ FEAT: Standardize system_prompt as a first-class consumed attack argument Jun 18, 2026

adrian-gavrila mentioned this pull request Jun 19, 2026

FEAT: set the objective target's system prompt from the CoPyRIT GUI #2056

Open

rlundeen2 reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FEAT: Standardize system_prompt as a first-class consumed attack argument#2040

FEAT: Standardize system_prompt as a first-class consumed attack argument#2040
adrian-gavrila wants to merge 5 commits into
microsoft:mainfrom
adrian-gavrila:adrian-gavrila/standardize-system-prompt

adrian-gavrila commented Jun 18, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

rlundeen2 Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

adrian-gavrila commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Tests and Documentation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

⚠️ Human review recommended

Copilot's findings

Uh oh!

rlundeen2 Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

adrian-gavrila commented Jun 18, 2026 •

edited

Loading