Skip to content

Add tutorial: Multi-Vector LLM Safety Bypass: Field Observations from Affected Chatbot Deployments#45

Open
onurcangnc wants to merge 3 commits into
GenAI-Security-Project:mainfrom
onurcangnc:tutorial/multi-vector-llm-safety-bypass
Open

Add tutorial: Multi-Vector LLM Safety Bypass: Field Observations from Affected Chatbot Deployments#45
onurcangnc wants to merge 3 commits into
GenAI-Security-Project:mainfrom
onurcangnc:tutorial/multi-vector-llm-safety-bypass

Conversation

@onurcangnc

Copy link
Copy Markdown

New Tutorial: Multi-Vector LLM Safety Bypass

Adds a field-observations tutorial under tutorials/, documenting four
distinct classes of LLM safety bypass observed against production chatbot
deployments during independent testing.

Format

This follows the same structure and scope as the existing
tutorials/llm_chatbot_system_prompt_exfiltration.md: header metadata →
overview → OWASP mapping → per-stage Technique/Why-It-Works/Mitigation →
full attack-chain summary → consolidated mitigation checklist → references.

Reproducibility note

These are field observations from production systems, not sandbox-based
exploits
. The techniques were observed against live third-party deployments
under responsible disclosure (reported via MITRE), so they cannot be packaged
as a containerized reproduction the way an exploitation/ example or sandbox
can. This is consistent with the existing system-prompt-exfiltration tutorial,
which is likewise a writeup of observations against a non-sandboxed target.
The document is fully anonymized no vendor names, no proprietary system
identifiers, no harmful payloads or restricted-content output.

If the maintainers would prefer a runnable companion, I'm happy to add a
follow-up exploitation/ example that adapts the prompt-injection stages
(control-token injection, role-label spoofing, Crescendo escalation) into a
config.toml prompt list against the existing sandboxes/llm_local
environment.

Coverage

  • Stage 2 : ChatML control-token injection (unsafe prompt templating)
  • Stage 3 : Role-label spoofing via application-specific Markdown parsing
  • Stage 4 : Role-switching documentation pretext (leakage vs. hallucination distinction)
  • Stage 5 : Crescendo-inspired multi-turn escalation with obfuscation
  • Stage 6 : Output-moderation architecture & quantization risks

Mapping

  • OWASP: LLM01:2025, LLM02:2025, LLM05:2025, LLM07:2025
  • MITRE ATLAS: AML.T0051, AML.T0054, AML.T0057
  • CWE: CWE-1427, CWE-693, CWE-94

Claims are scoped to affected deployments throughout; the tutorial
explicitly notes that not every deployment is vulnerable to every technique.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant