FEAT: Add 0DIN threat feed dataset loader (0din_threatfeed) #2034
Add _ODINDataset, a remote seed-dataset loader for Mozilla's 0DIN.ai Jailbreak/Threat Feed API (0din.ai/api/v1/threatfeed). It paginates the feed, de-duplicates sample exploit prompts that repeat across tested models, and maps each to a SeedPrompt with taxonomy, severity, affected-model, and impact metadata. Filters (severity, security boundary, taxonomy category) are applied client-side since the API ignores server-side filter params. An optional include_variant_prompts flag emits the industry-specific variant prompts. Transient throttle/5xx responses are retried with backoff. Auth uses the 0DIN_API_KEY env var. Registers the loader and enums, adds the BibTeX/bibliography citation, updates the datasets notebook, and adds unit tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
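A rough sketch of the de-duplication and client-side filtering described in this commit follows. It is not the loader's code: `Sample`, `dedupe_and_filter`, and the field names are hypothetical, chosen only to show the idea of keeping one copy of a prompt that repeats across tested models and filtering after the fetch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical stand-in for one sample exploit prompt in a report."""

    prompt: str
    severity: str
    model: str


def dedupe_and_filter(samples, severities=None):
    """Keep the first occurrence of each prompt text, then filter locally.

    severities=None means "include all" (assumed convention for this sketch).
    """
    seen = set()
    kept = []
    for sample in samples:
        if sample.prompt in seen:
            continue  # same prompt text repeated for another tested model
        seen.add(sample.prompt)
        # Filtering happens here, on the client, after the data is fetched.
        if severities is None or sample.severity in severities:
            kept.append(sample)
    return kept
```

Because the filter runs on already-fetched data, every filter combination costs the same number of API requests.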
Set dataset_name to '0din' (the name users pass to load it), aligning with the existing local 0din_* datasets. Python identifiers (class _ODINDataset and the public enums) keep the ODIN spelling since identifiers cannot start with a digit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cache the raw threat feed under DB_DATA_PATH. Because reports are returned newest-first, subsequent fetches sync incrementally: pagination stops as soon as a previously-cached report UUID is seen, so only newly disclosed reports are downloaded and merged on top. Live-verified the second fetch drops from 9 requests to 1. cache=False forces a full refresh and bypasses the cache entirely. Adds caching unit tests plus an autouse fixture isolating the cache from the real DB_DATA_PATH. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
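The incremental sync in this commit can be pictured with a minimal sketch. The names here are assumptions for illustration, not the loader's API: `fetch_page` is a hypothetical callable that returns one page of report dicts (newest first) or an empty list when the feed is exhausted, and the `"uuid"` key is an assumed field name.

```python
def sync_reports(fetch_page, cached):
    """Merge newly disclosed reports on top of a newest-first cache.

    fetch_page(page_number) -> list of report dicts, newest first (hypothetical).
    cached -> list of previously stored report dicts, newest first.
    """
    known = {report["uuid"] for report in cached}
    fresh = []
    page = 1
    while True:
        reports = fetch_page(page)
        if not reports:
            break  # feed exhausted (cold cache reaches this point)
        hit_cache = False
        for report in reports:
            if report["uuid"] in known:
                hit_cache = True  # everything older is already cached
                break
            fresh.append(report)
        if hit_cache:
            break
        page += 1
    # New reports go on top so the merged list stays newest-first.
    return fresh + cached
```

With a warm cache and only a few new reports, the loop ends on the first page, which is consistent with the drop to a single request reported above.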
Match the Agent Threat Rules convention: an empty security_boundaries or categories list now raises a clear ValueError (pass None to include all) instead of silently filtering everything out and surfacing the downstream 'SeedDataset cannot be empty' error. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
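The empty-list guard in this commit amounts to something like the sketch below. `normalize_filter` is a hypothetical helper name and the error message is illustrative; the real loader's validation path is not shown in this PR text.

```python
def normalize_filter(name, values):
    """Return None for "include all", or a non-empty list of filter values.

    An empty list is rejected up front instead of silently matching nothing.
    """
    if values is None:
        return None  # no filtering requested
    values = list(values)
    if not values:
        raise ValueError(f"{name} must not be empty; pass None to include all.")
    return values
```

Failing early here gives the caller a message about the filter argument itself rather than a later error about an empty dataset.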
… n-day datasets PR microsoft#1398 already added six static, hand-curated 0din_* local n-day disclosure datasets. This live API loader pulls the gated 0DIN threat feed, so name it 0din_threatfeed to clearly distinguish the dynamic full feed from the static disclosures and parallel the existing 0din_* naming. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Heads-up for reviewers: the Today the loader maps 0DIN's taxonomy categories ( Once the harm-categories standard lands, I expect to: move the taxonomy out of |
Thank you for building this, @romanlutz — it's great to see the live 0DIN Threat Feed land natively in PyRIT, and the loader is a faithful, careful read of the feed. Really appreciate you carrying this upstream. One note from the 0DIN side to support your provisional Fully agree with your read that the taxonomy axis (how an attack is structured) ≠ |
Per athal7's review on microsoft#2034: the 0DIN taxonomy values come from 0DIN's published taxonomy, grounded in 'Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming' (arXiv:2311.06237, PLoS ONE 2025). Add the BibTeX entry, cite it from the loader docstring alongside @Odin2024, link the public taxonomy, and reiterate that the taxonomy describes how an attack is structured rather than the harm it targets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thanks @athal7, great context. Done in db905f2: added a BibTeX entry for "Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming" (Inie, Stray, Derczynski; arXiv:2311.06237, PLoS ONE 2025) and cite it from the loader docstring alongside And yes please — I'll take you up on the offer to help map 0DIN |
Description
0DIN (0din.ai) is Mozilla's GenAI bug-bounty and threat-intelligence program. PyRIT already ships six static, hand-curated 0din_* "n-day" disclosure datasets (added in #1398), but those are a fixed snapshot of individual public disclosures. This PR adds a loader for the full live, gated 0DIN Threat Feed so red teamers can programmatically pull the complete, continuously growing corpus of verified jailbreak reports (810+ reports / 1300+ sample prompts at time of writing).
It registers as 0din_threatfeed to clearly distinguish the dynamic full feed from the existing static 0din_* n-day samples while paralleling their naming convention.
Approach
- _ODINDataset(_RemoteDatasetLoader) calls the /api/v1/threatfeed REST API, paginates the feed, and maps each report's sample exploit prompts to SeedPrompts carrying the literal attack text. Rich per-seed metadata is preserved (taxonomy category/strategy/technique, severity, security boundary, affected models, social-impact score, detection signature, disclosure date).
- Auth uses the 0DIN_API_KEY environment variable (or api_key=). Without a key the loader raises a clear ValueError.
- Filters: severity, security_boundaries, and categories (module-level enums, validated via the inherited _validate_enums; empty lists raise a clear error, pass None to include all). The API ignores server-side filter params, so filtering is applied after fetch.
- The raw feed is cached under DB_DATA_PATH (gitignored). Because reports are returned newest-first, subsequent fetches stop as soon as a previously cached report UUID is seen, so only newly disclosed reports are downloaded and merged on top. Verified live: the second fetch drops from 9 requests to 1. cache=False forces a full refresh.
- An optional include_variant_prompts=True flag additionally emits the large set of industry-specific variant prompts attached to each report.
Notes for reviewers
cache=False forces a full refresh when that is needed.
Tests and Documentation
Tests (tests/unit/datasets/test_odin_dataset.py, 32 unit tests, all mocked, no network): init/enum validation including empty-list guards, fetch and field/metadata mapping, message de-duplication, each filter mode plus the empty-after-filter ValueError, verbatim value preservation, variant inclusion, pagination, auth header shape, retry/backoff behavior, and the incremental cache (cold write, warm single-request sync, delta merge, cache=False bypass, corrupt-cache recovery). An autouse fixture isolates the cache from the real DB_DATA_PATH.
I also ran the loader live against the real gated API (using a real 0DIN_API_KEY) as a manual integration check, mirroring the integration smoke-test assertions: full fetch, severity filter, category filter, and variant expansion all pass and return well-formed SeedPrompts. Gated datasets are deliberately excluded from the CI integration smoke set (no key in CI), consistent with PromptIntel and the HF-gated loaders.
Documentation: added the @odin2024 BibTeX entry to doc/references.bib and doc/bibliography.md, and added 0din_threatfeed to the built-in datasets prose and name list in doc/code/datasets/1_loading_datasets.{py,ipynb}. JupyText: the datasets notebook name-list cell was updated to include 0din_threatfeed in sorted position.
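For reviewers reading the test list, the "corrupt-cache recovery" case can be thought of along these lines. This is only a sketch under assumptions: `load_cached_reports` is a hypothetical helper, and the cache is assumed to be a JSON file holding a list of report objects, which this PR text does not confirm.

```python
import json
from pathlib import Path


def load_cached_reports(cache_file: Path) -> list:
    """Return cached reports, or [] if the cache is missing or unreadable."""
    try:
        data = json.loads(cache_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, bad encoding, or invalid JSON: behave like a cold
        # cache so the next fetch falls back to a full download.
        return []
    # Anything other than a list is treated as corrupt as well.
    return data if isinstance(data, list) else []
```

Treating an unreadable cache as empty keeps a damaged file from blocking the loader; the cost is one full refresh.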