Add tutorial: Multi-Vector LLM Safety Bypass: Field Observations from Affected Chatbot Deployments by onurcangnc · Pull Request #45 · GenAI-Security-Project/GenAI-Red-Team-Lab

onurcangnc · 2026-06-20T10:06:26Z

New Tutorial: Multi-Vector LLM Safety Bypass

Adds a field-observations tutorial under tutorials/, documenting four
distinct classes of LLM safety bypass observed against production chatbot
deployments during independent testing.

Format

This follows the same structure and scope as the existing
tutorials/llm_chatbot_system_prompt_exfiltration.md: header metadata →
overview → OWASP mapping → per-stage Technique/Why-It-Works/Mitigation →
full attack-chain summary → consolidated mitigation checklist → references.

Reproducibility note

These are field observations from production systems, not sandbox-based
exploits. The techniques were observed against live third-party deployments
under responsible disclosure (reported via MITRE), so they cannot be packaged
as a containerized reproduction the way an exploitation/ example or sandbox
can. This is consistent with the existing system-prompt-exfiltration tutorial,
which is likewise a writeup of observations against a non-sandboxed target.
The document is fully anonymized no vendor names, no proprietary system
identifiers, no harmful payloads or restricted-content output.

If the maintainers would prefer a runnable companion, I'm happy to add a
follow-up exploitation/ example that adapts the prompt-injection stages
(control-token injection, role-label spoofing, Crescendo escalation) into a
config.toml prompt list against the existing sandboxes/llm_local
environment.

Coverage

Stage 2 : ChatML control-token injection (unsafe prompt templating)
Stage 3 : Role-label spoofing via application-specific Markdown parsing
Stage 4 : Role-switching documentation pretext (leakage vs. hallucination distinction)
Stage 5 : Crescendo-inspired multi-turn escalation with obfuscation
Stage 6 : Output-moderation architecture & quantization risks

Mapping

OWASP: LLM01:2025, LLM02:2025, LLM05:2025, LLM07:2025
MITRE ATLAS: AML.T0051, AML.T0054, AML.T0057
CWE: CWE-1427, CWE-693, CWE-94

Claims are scoped to affected deployments throughout; the tutorial
explicitly notes that not every deployment is vulnerable to every technique.

onurcangnc added 3 commits June 20, 2026 12:36

Add tutorial: Multi-Vector LLM Safety Bypass

b188956

Rename multi-turn safety bypass tutorial

7a9e18d

Update multi-turn safety bypass tutorial

67351e8

onurcangnc requested review from felipepenha and rossja as code owners June 20, 2026 10:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add tutorial: Multi-Vector LLM Safety Bypass: Field Observations from Affected Chatbot Deployments#45

Add tutorial: Multi-Vector LLM Safety Bypass: Field Observations from Affected Chatbot Deployments#45
onurcangnc wants to merge 3 commits into
GenAI-Security-Project:mainfrom
onurcangnc:tutorial/multi-vector-llm-safety-bypass

onurcangnc commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

onurcangnc commented Jun 20, 2026

New Tutorial: Multi-Vector LLM Safety Bypass

Format

Reproducibility note

Coverage

Mapping

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant