FEAT: Add 0DIN threat feed dataset loader (0din_threatfeed)#2034

Merged
romanlutz merged 7 commits into microsoft:main from romanlutz:romanlutz/odin-dataset-loader
Jun 19, 2026

@romanlutz

Contributor

Description

0DIN (0din.ai) is Mozilla's GenAI bug-bounty and threat-intelligence program. PyRIT already ships six static, hand-curated 0din_* "n-day" disclosure datasets (added in #1398), but those are a fixed snapshot of individual public disclosures. This PR adds a loader for the full live, gated 0DIN Threat Feed so red teamers can pull the complete, continuously-growing corpus of verified jailbreak reports (810+ reports / 1300+ sample prompts at time of writing) programmatically.

It registers as 0din_threatfeed to clearly distinguish the dynamic full feed from the existing static 0din_* n-day samples, and parallels that naming convention.

Approach

  • _ODINDataset(_RemoteDatasetLoader) calls the /api/v1/threatfeed REST API, paginates the feed, and maps each report's sample exploit prompts to SeedPrompts carrying the literal attack text. Rich per-seed metadata is preserved (taxonomy category/strategy/technique, severity, security boundary, affected models, social-impact score, detection signature, disclosure date).
  • Gated access, consistent with the PromptIntel and HuggingFace-token loaders: no prompt content is committed to the repo. Users supply their own key via the 0DIN_API_KEY environment variable (or api_key=). Without a key the loader raises a clear ValueError.
  • Client-side filtering by severity, security_boundaries, and categories (module-level enums, validated via the inherited _validate_enums; empty lists raise a clear error, pass None to include all). The API ignores server-side filter params, so filtering is applied after fetch.
  • Incremental on-disk cache. The raw feed is cached under DB_DATA_PATH (gitignored). Because reports are returned newest-first, subsequent fetches stop as soon as a previously-cached report UUID is seen, so only newly disclosed reports are downloaded and merged on top. Verified live: the second fetch drops from 9 requests to 1. cache=False forces a full refresh.
  • Resilience. Sample prompts that repeat across multiple tested models are de-duplicated. Transient throttle/5xx responses (the API uses a 25 req/min limit and returns 406/429/5xx under load) are retried with backoff. Remote text is never rendered through Jinja (untrusted-input safety).
  • An opt-in include_variant_prompts=True flag additionally emits the large set of industry-specific variant prompts attached to each report.
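
The retry behavior described in the Resilience bullet can be sketched roughly as follows. This is a minimal illustration and not the loader's actual code: `fetch_with_retry`, `is_transient`, the `send` callable, the attempt count, and the exponential delay schedule are all assumptions. Only the set of transient statuses (406/429/5xx) comes from the description.

```python
from __future__ import annotations

import time
from typing import Callable


def is_transient(status: int) -> bool:
    """Statuses the description says the API returns under load."""
    return status in (406, 429) or 500 <= status <= 599


def fetch_with_retry(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 5,          # assumption: not stated in the PR
    base_delay: float = 1.0,        # assumption: not stated in the PR
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call the hypothetical `send()` until it yields a 200 response.

    `send` returns (status_code, json_body). Transient statuses are
    retried with exponentially growing delays; anything else fails fast.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if not is_transient(status):
            raise RuntimeError(f"non-retryable status {status}")
        if attempt < max_attempts - 1:
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Injecting `sleep` keeps the sketch testable without real waiting, which is in the same spirit as the fully mocked unit tests described below.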

Notes for reviewers

  • The 0DIN feed is a live, growing service, so loads are intentionally non-deterministic (you always get the latest data) rather than a fixed benchmark. This is by design for a threat feed.
  • The incremental cache syncs newly disclosed reports but does not re-pull edits to already-cached reports; cache=False forces a full refresh when that is needed.

Tests and Documentation

Tests (tests/unit/datasets/test_odin_dataset.py, 32 unit tests, all mocked, no network): init/enum validation including empty-list guards, fetch and field/metadata mapping, message de-duplication, each filter mode plus the empty-after-filter ValueError, verbatim value preservation, variant inclusion, pagination, auth header shape, retry/backoff behavior, and the incremental cache (cold write, warm single-request sync, delta merge, cache=False bypass, corrupt-cache recovery). An autouse fixture isolates the cache from the real DB_DATA_PATH.

I also ran the loader live against the real gated API (using a real 0DIN_API_KEY) as a manual integration check, mirroring the integration smoke-test assertions: full fetch, severity filter, category filter, and variant expansion all pass and return well-formed SeedPrompts. Gated datasets are deliberately excluded from the CI integration smoke set (no key in CI), consistent with PromptIntel and the HF-gated loaders.

Documentation: added the @odin2024 BibTeX entry to doc/references.bib and doc/bibliography.md, and added 0din_threatfeed to the built-in datasets prose and name list in doc/code/datasets/1_loading_datasets.{py,ipynb}. JupyText: the datasets notebook name-list cell was updated to include 0din_threatfeed in sorted position.

Copilot AI added 5 commits June 15, 2026 20:26
Add _ODINDataset, a remote seed-dataset loader for Mozilla's 0DIN.ai Jailbreak/Threat Feed API (0din.ai/api/v1/threatfeed). It paginates the feed, de-duplicates sample exploit prompts that repeat across tested models, and maps each to a SeedPrompt with taxonomy, severity, affected-model, and impact metadata.
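
As an illustration of the de-duplication step, the sketch below collapses prompts that repeat across tested models within one report. The report shape (`uuid`, `severity`, and a `messages` list with `model` and `prompt` keys) and the function name `to_seed_records` are hypothetical; the real feed schema and the SeedPrompt construction are not shown in this PR description.

```python
from __future__ import annotations


def to_seed_records(report: dict) -> list[dict]:
    """Return one record per distinct prompt text in a report.

    The same prompt is often tested against several models, so the
    first occurrence is kept and later repeats are skipped.
    """
    messages = report.get("messages", [])
    metadata = {
        "report_uuid": report["uuid"],
        "severity": report.get("severity"),
        # Every model the report was tested against, kept as metadata.
        "affected_models": sorted({m["model"] for m in messages}),
    }
    seen: set[str] = set()
    records = []
    for message in messages:
        text = message["prompt"]
        if text in seen:
            continue
        seen.add(text)
        # The prompt text is stored verbatim; it is never templated.
        records.append({"value": text, "metadata": metadata})
    return records
```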

Filters (severity, security boundary, taxonomy category) are applied client-side since the API ignores server-side filter params. An optional include_variant_prompts flag emits the industry-specific variant prompts. Transient throttle/5xx responses are retried with backoff. Auth uses the 0DIN_API_KEY env var.
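
A client-side filter of this kind might look like the sketch below. The function name `filter_reports` and the `severity` and `categories` report keys are assumptions made for illustration; the `None` means "include all" convention and the error on an empty result follow the behavior described in this PR.

```python
from __future__ import annotations

from typing import Iterable, Optional


def filter_reports(
    reports: Iterable[dict],
    severities: Optional[set[str]] = None,
    categories: Optional[set[str]] = None,
) -> list[dict]:
    """Filter already-fetched reports locally.

    A filter left as None matches everything. A report passes the
    category filter if it shares at least one category with it.
    """
    kept = []
    for report in reports:
        if severities is not None and report.get("severity") not in severities:
            continue
        if categories is not None and not categories & set(report.get("categories", [])):
            continue
        kept.append(report)
    if not kept:
        # Fail here rather than surfacing an empty dataset downstream.
        raise ValueError("No reports remain after applying the filters.")
    return kept
```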

Registers the loader and enums, adds the BibTeX/bibliography citation, updates the datasets notebook, and adds unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Set dataset_name to '0din' (the name users pass to load it), aligning with the existing local 0din_* datasets. Python identifiers (class _ODINDataset and the public enums) keep the ODIN spelling since identifiers cannot start with a digit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cache the raw threat feed under DB_DATA_PATH. Because reports are returned newest-first, subsequent fetches sync incrementally: pagination stops as soon as a previously-cached report UUID is seen, so only newly disclosed reports are downloaded and merged on top. Live-verified the second fetch drops from 9 requests to 1. cache=False forces a full refresh and bypasses the cache entirely. Adds caching unit tests plus an autouse fixture isolating the cache from the real DB_DATA_PATH.
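
The incremental sync can be sketched as follows, assuming a hypothetical `fetch_page(page)` callable that returns reports newest-first and an empty list once the feed is exhausted. The function name `sync_feed` and the 1-based page numbering are illustrative assumptions, and reading or writing the cache file is omitted.

```python
from __future__ import annotations

from typing import Callable


def sync_feed(
    fetch_page: Callable[[int], list[dict]],
    cached: list[dict],
) -> list[dict]:
    """Merge newly disclosed reports on top of the cached ones.

    Because the feed is newest-first, the first already-cached UUID
    marks the point where everything older is known, so paging stops.
    """
    known = {report["uuid"] for report in cached}
    fresh: list[dict] = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break  # feed exhausted (cold cache case)
        hit_cache = False
        for report in batch:
            if report["uuid"] in known:
                hit_cache = True
                break
            fresh.append(report)
        if hit_cache:
            break  # everything older is already cached
        page += 1
    return fresh + cached
```

With this approach, edits to reports that are already cached are never seen, which matches the limitation noted for reviewers; passing an empty `cached` list corresponds to a full refresh.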

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the Agent Threat Rules convention: an empty security_boundaries or categories list now raises a clear ValueError (pass None to include all) instead of silently filtering everything out and surfacing the downstream 'SeedDataset cannot be empty' error.
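
A guard of this shape could be written as below. This is a sketch only: `validate_filter` and the `Category` enum are hypothetical stand-ins for the loader's inherited `_validate_enums` and its module-level enums, whose exact signatures are not shown here. The enum values are the taxonomy categories named later in this thread.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional, Sequence


class Category(Enum):
    # Taxonomy categories mentioned in this PR's discussion.
    STRATAGEMS = "stratagems"
    FICTIONALIZING = "fictionalizing"
    LANGUAGE = "language"
    RHETORIC = "rhetoric"
    POSSIBLE_WORLDS = "possible_worlds"


def validate_filter(
    name: str,
    values: Optional[Sequence],
    enum_cls: type[Enum],
) -> Optional[list[Enum]]:
    """Normalize a filter argument, rejecting empty lists up front."""
    if values is None:
        return None  # None means "include all"
    if len(values) == 0:
        raise ValueError(f"{name} must not be empty; pass None to include all.")
    # enum_cls(v) accepts a member or its value and raises ValueError otherwise.
    return [enum_cls(v) for v in values]
```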

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… n-day datasets

PR microsoft#1398 already added six static, hand-curated 0din_* local n-day disclosure datasets. This live API loader pulls the gated 0DIN threat feed, so name it 0din_threatfeed to clearly distinguish the dynamic full feed from the static disclosures and parallel the existing 0din_* naming.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz

Contributor Author

Heads-up for reviewers: the harm_categories handling in this loader is provisional and will be revisited to align with the in-flight harm-categories standardization effort.

Today the loader maps 0DIN's taxonomy categories (stratagems, fictionalizing, language, rhetoric, possible_worlds) into harm_categories (both class-level and per-seed). But those taxonomy values are jailbreak technique families (how the attack works), not traditional harm categories (what harm). The actual harm signal in 0DIN data lives in test_results[].test_type.name (e.g. harmful_substances, fentanyl, cbrm, chinese_censorship, illicit_substances), which is currently surfaced in metadata rather than harm_categories.

Once the harm-categories standard lands, I expect to: move the taxonomy out of harm_categories (into metadata/tags, where taxonomy_categories already lives), derive real harm categories from the test_type signal, and align the class-level harm_categories with the standardized vocabulary. Flagging so this isn't treated as final and so the two efforts stay coordinated.
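
To make the planned direction concrete, here is a rough sketch of pulling the harm signal out of `test_results[].test_type.name`. The function name is hypothetical and the output is the raw 0DIN names, since the standardized vocabulary does not exist yet; any mapping onto it would be a later step.

```python
from __future__ import annotations


def harm_signals_from_report(report: dict) -> list[str]:
    """Collect the distinct test_type names found in a report.

    Entries without a test_type name are skipped rather than guessed.
    """
    names: set[str] = set()
    for result in report.get("test_results", []):
        test_type = result.get("test_type") or {}
        name = test_type.get("name")
        if name:
            names.add(name)
    return sorted(names)
```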

@athal7

athal7 commented Jun 18, 2026


Thank you for building this, @romanlutz — it's great to see the live 0DIN Threat Feed land natively in PyRIT, and the loader is a faithful, careful read of the feed. Really appreciate you carrying this upstream.

One note from the 0DIN side to support your provisional harm_categories flag: the taxonomy values (stratagems, fictionalizing, language, rhetoric, possible_worlds) come from 0DIN's published taxonomy, which is grounded in "Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming" (arXiv:2311.06237) — worth citing alongside @odin2024. Two public, no-auth references if useful: the taxonomy at https://0din.ai/research/taxonomy and the JSON at https://0din.ai/research/taxonomies.

Fully agree with your read that the taxonomy axis (how an attack is structured) ≠ harm_categories (what harm results). Happy to help map 0DIN severity/security_boundary into harm_categories when that standard lands — just say the word.

Per athal7's review on microsoft#2034: the 0DIN taxonomy values come from 0DIN's published taxonomy, grounded in 'Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming' (arXiv:2311.06237, PLoS ONE 2025). Add the BibTeX entry, cite it from the loader docstring alongside @odin2024, link the public taxonomy, and reiterate that the taxonomy describes how an attack is structured rather than the harm it targets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz

Contributor Author

Thanks @athal7, great context. Done in db905f2: added a BibTeX entry for "Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming" (Inie, Stray, Derczynski; arXiv:2311.06237, PLoS ONE 2025) and cite it from the loader docstring alongside @odin2024, with a link to the public taxonomy (https://0din.ai/research/taxonomy) and a note that the taxonomy axis describes how an attack is structured rather than the harm it targets.

And yes please — I'll take you up on the offer to help map 0DIN severity/security_boundary (and the test_type signal) into harm_categories once the standardization effort lands. I'll loop you in then.

@romanlutz romanlutz enabled auto-merge June 19, 2026 23:08
@romanlutz romanlutz added this pull request to the merge queue Jun 19, 2026
Merged via the queue into microsoft:main with commit 8376a81 Jun 19, 2026
53 checks passed
@romanlutz romanlutz deleted the romanlutz/odin-dataset-loader branch June 19, 2026 23:36

4 participants