Catholic Doctrinal Fidelity Index (CDFI) Framework

Evaluation Governance Infrastructure for Domain-Specific AI Doctrinal Benchmarking

The problem this framework solves: General-purpose AI benchmarks measure capability. They do not measure whether an AI model states the doctrinal claims of a specific religious tradition accurately and with the weight that tradition's own authority structure assigns them. This framework does.

What This Is

The CDFI Framework is a reusable evaluation governance methodology for building domain-specific AI doctrinal benchmarks. It was derived from seven frontier AI safety research publications and translated into a scoring architecture purpose-built for Catholic doctrinal evaluation. SAICRED v2 is the reference implementation.

The framework is the first of its kind: a published, version-controlled methodology that any religious institution or denomination can adapt to evaluate AI models against its own doctrinal standards.

It is not a benchmark. It is the methodology that makes a benchmark defensible.

What This Repository Is (and Is Not)

This is

Statement	Practical meaning
An evaluation governance methodology derived from published AI safety research	Every weight, gate, and threshold traces to a named publication
A tradition-agnostic framework	Catholic doctrine is the reference implementation; any tradition can substitute its own authority structure
A portable reference implementation of the CDFI formula	Run `engine/cdfi_calculator.py` independently of the production pipeline
A publication-readiness protocol	Three explicit gates must clear before benchmark scores carry institutional weight

This is not

Statement	What is explicitly excluded
A benchmark dataset	Prompts and model responses live in the production pipeline (saicred-benchmark)
A production scoring pipeline	That is `saicred-benchmark/scoring_service.py`
Regulatory or theological advice	All doctrinal and institutional determinations remain with qualified human authorities
An autonomous system	No component decides, approves, or classifies without human oversight

The Core Sequence

Every benchmark built on this framework follows seven steps in order. Each step converts the output of the previous step into a more specific artifact.

Literature Claim
      ↓
Risk Mechanism
      ↓
Observable Failure Mode
      ↓
Metric or Gate
      ↓
Scoring Rule
      ↓
Reliability Test
      ↓
Deployment Tier

This sequence is what distinguishes evaluation governance infrastructure from research synthesis. Reading AI safety literature produces knowledge. Moving through this sequence produces an institution-grade scoring instrument.
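
One way to keep the seven steps auditable is to store each chain as a record whose fields follow the sequence in order. The sketch below is illustrative only: the `TraceChain` class and its field names are hypothetical and not part of this repository. The example values are assembled from the tables in this README, and the risk mechanism is left as a pointer to TRACEABILITY.md rather than guessed.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TraceChain:
    """Hypothetical record: one field per step, in the framework's order."""
    literature_claim: str
    risk_mechanism: str
    observable_failure_mode: str
    metric_or_gate: str
    scoring_rule: str
    reliability_test: str
    deployment_tier: str

    def missing_steps(self) -> list:
        """Return the names of any steps left empty, in sequence order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


chain = TraceChain(
    literature_claim="Auditing Language Models for Hidden Objectives",
    risk_mechanism="see TRACEABILITY.md",  # placeholder, not a real value
    observable_failure_mode="Citation Fabrication",
    metric_or_gate="Hallucination gate",
    scoring_rule="Cap CDFI at 40 on gate failure",
    reliability_test="Cap gate precision >= 90%",
    deployment_tier="Not Recommended",
)
print(chain.missing_steps())
```

A record with an empty step would be reported by `missing_steps()`, which is one simple way to check that no element of a benchmark skips part of the sequence.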

Repository Structure

cdfi-framework/
│
├── README.md                              ← You are here
├── TRACEABILITY.md                        ← 7 publications → CDFI architecture (full causal chain)
├── LIMITATIONS.md                         ← Six known limitations with exact disclosure language
├── CHANGELOG.md                           ← Version history, reliability run log, v2 results
├── TRANSLATION-METHOD.md                  ← How each publication became a computable CDFI mechanism
├── CITATION.cff                           ← Machine-readable citation metadata
├── CONTRIBUTING.md                        ← How to adapt, extend, or contribute
├── LICENSE                                ← Apache License 2.0
├── NOTICE                                 ← Required attribution for derivative works
│
├── engine/                                ← Reference implementation of the CDFI formula
│   ├── __init__.py                        ← Package entry point
│   └── cdfi_calculator.py                 ← Standalone formula: scores in → CDFIResult out
│
├── configs/                               ← All numerical parameters (edit here to adapt for your tradition)
│   ├── authority_matrix.json              ← Metric weights keyed to four doctrinal authority levels
│   └── threshold_gates.yaml               ← Gate definitions, cap value, deployment tier thresholds
│
├── docs/
│   ├── translations/                      ← One file per research-finding → CDFI-mechanism translation
│   │   ├── README.md                      ← Navigation guide: reading order, relationships, audience routing
│   │   ├── 01-evaluation-criteria.md      ← Pub 1:  subject-matter standards → weighting matrix
│   │   ├── 02-rubric-reliability.md       ← Pub 1:  inter-rater reliability → publication gate
│   │   ├── 03-hallucination-gate.md       ← Pub 2:  auditing hidden objectives → hallucination gate
│   │   ├── 04-statistical-rigor.md        ← Pub 3:  uncertainty → CI + deployment tier thresholds
│   │   ├── 05-framing-sensitivity.md      ← Pub 4:  framing shifts → relativism resistance gate
│   │   ├── 06-adversarial-probing.md      ← Pub 7:  feature steering → prompt sensitivity drift
│   │   ├── 07-categorical-failures.md     ← Pub 6:  sabotage logic → cap gate architecture
│   │   └── 08-confidence-calibration.md   ← Original construct: Pubs 4+5 combined → ninth metric
│   │
│   ├── specifications/                    ← Complete technical specifications
│   │   ├── CDFI-formula.md                ← Formula, weighting matrix, gate logic
│   │   ├── failure-taxonomy.md            ← Five failure modes with detection methods
│   │   ├── authority-levels.md            ← Four doctrinal authority levels explained
│   │   ├── deployment-tiers.md            ← Formation, General, R&D, Not Recommended
│   │   └── scoring-anchors.md             ← Concrete score-level examples from v2 judge reasoning
│   │
│   ├── reliability/                       ← Judge certification protocol
│   │   ├── judge-reliability-protocol.md  ← Four-part certification: what each part tests
│   │   └── publication-gates.md           ← Three gates that must clear before publication
│   │
│   └── governance/                            ← Institutional use and adaptation
│       ├── adapting-for-other-traditions.md   ← How another denomination uses this framework
│       ├── limitation-register-template.md    ← Required disclosure language for publication
│       └── temporal-versioning.md             ← How scores expire with model version updates
│
├── examples/
│   └── saicred-v2/                        ← Reference implementation (Catholic benchmark)
│       ├── README.md                      ← Dataset, methodology, and benchmark overview
│       ├── results-summary.md             ← Full v2 findings: rankings, CI, cap rates
│       └── framing-effect-analysis.md     ← Primary policy finding: the framing effect
│
└── assets/
    └── cdfi-weighting-matrix.png          ← Visual reference for the four-column formula

Related repositories:

saicred-benchmark — Production scoring pipeline: 400 prompts × 6 models × 9 metrics, Gemini 2.5 Flash judge, CDFI computation, and results dashboard (private — access pending publication)

The Seven Source Publications

Every architectural decision in the CDFI traces to one of these publications. No weight, gate, or threshold was chosen by convention.

#	Publication	CDFI Element Produced
1	Challenges in Evaluating AI Systems (Anthropic, 2023)	Four-column weighting matrix; inter-rater reliability gate (kappa >= 0.60 on Critical metrics)
2	Auditing Language Models for Hidden Objectives (Anthropic, 2025)	Hallucination pass/fail gate; citation verification protocol
3	A Statistical Approach to Model Evaluations (Anthropic, 2024)	95% CI requirement; clustered standard errors; temporal versioning; deployment tier thresholds
4	Discrimination in Language Model Decisions (Anthropic, 2023)	Four-variant prompt structure; relativism resistance gate
5	Measuring Faithfulness in Chain-of-Thought Reasoning (Anthropic, 2023)	Confidence calibration metric (original construct, derived from Pubs 4 and 5 combined)
6	Sabotage Evaluations (Anthropic, 2024)	Five failure mode taxonomy; cap gate architecture
7	Evaluating Feature Steering (Anthropic, 2024)	Adversarial prompt taxonomy; prompt sensitivity drift failure mode

Full translation detail — including the exact causal chain from finding to formula element for each publication: TRACEABILITY.md

The systematic methodology used to perform each translation — the seven-step sequence from literature claim to deployment tier: TRANSLATION-METHOD.md
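
The 95% CI requirement with clustered standard errors (Publication 3 in the table above) can be illustrated with a short calculation. This is a minimal sketch, not the production pipeline's code: the function name `clustered_mean_ci` is hypothetical, the clustering unit is assumed to be the source question shared by its prompt variants (the README does not state the unit), and no small-sample correction is applied.

```python
import math


def clustered_mean_ci(scores_by_cluster, z=1.96):
    """Mean, cluster-robust standard error, and a normal-approximation CI.

    scores_by_cluster maps a cluster id (assumed: one source question)
    to the list of scores observed for that cluster.
    """
    all_scores = [s for cluster in scores_by_cluster.values() for s in cluster]
    n = len(all_scores)
    mean = sum(all_scores) / n
    # Sum deviations inside each cluster first, then square, so that
    # correlated scores within a cluster widen the interval.
    cluster_terms = [
        sum(s - mean for s in cluster) ** 2
        for cluster in scores_by_cluster.values()
    ]
    se = math.sqrt(sum(cluster_terms)) / n
    return mean, se, (mean - z * se, mean + z * se)


# Made-up scores for two questions with two variants each.
mean, se, ci = clustered_mean_ci({"q1": [80.0, 90.0], "q2": [60.0, 70.0]})
print(mean, round(se, 3), tuple(round(x, 1) for x in ci))
```

Treating every score as independent would understate the uncertainty whenever variants of the same question tend to score alike, which is the reason for grouping by cluster before squaring.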

The CDFI Formula

Step 1 — Weighted sum:

CDFI = SUM( metric_score_i x column_weight_i )

where column_weight_i is drawn from the doctrinal authority level column of the question being scored.

Step 2 — Gate override:

if hallucination_gate = FAIL  or  relativism_gate = FAIL:
    CDFI = min(CDFI, 40)

The gate override is a classification, not a penalty. A response that fabricates a doctrinal source or relativizes defined doctrine is disqualified regardless of its nine metric scores.
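
The two steps can be sketched in a few lines of Python. This is not the code in `engine/cdfi_calculator.py`; it is a minimal illustration under stated assumptions: metric scores are on a 0 to 100 scale, the weights for the chosen authority column sum to 1.0, and the function name `compute_cdfi` and the metric names `m1` to `m3` are hypothetical (the real framework scores nine metrics).

```python
GATE_CAP = 40.0  # cap value stated in this README


def compute_cdfi(metric_scores, column_weights, hallucination_pass, relativism_pass):
    """Step 1: weighted sum. Step 2: gate override.

    Assumes 0-100 metric scores and weights summing to 1.0; both are
    assumptions of this sketch, not confirmed details of the framework.
    """
    if metric_scores.keys() != column_weights.keys():
        raise ValueError("every metric needs exactly one weight")
    cdfi = sum(metric_scores[m] * column_weights[m] for m in metric_scores)
    # A failed gate caps the score; it never raises a score below the cap.
    if not (hallucination_pass and relativism_pass):
        cdfi = min(cdfi, GATE_CAP)
    return cdfi


# Hypothetical three-metric example.
weights = {"m1": 0.5, "m2": 0.3, "m3": 0.2}
scores = {"m1": 90.0, "m2": 80.0, "m3": 70.0}

clean = compute_cdfi(scores, weights, hallucination_pass=True, relativism_pass=True)
capped = compute_cdfi(scores, weights, hallucination_pass=False, relativism_pass=True)
print(round(clean, 1), capped)
```

Because the override uses `min`, a response that already scores below 40 keeps its lower score when a gate fails.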

The four authority columns and doctrinal precision weights:

Column	Example (Catholic)	Doctrinal Precision Weight
Defined Dogma	Real Presence in the Eucharist	0.30
Ordinary Magisterium	Papal teaching on social ethics	0.25
Theological Consensus	Majority opinion on secondary matters	0.20
Legitimate Theological Opinion	Whether Limbo exists	0.15
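
The column lookup can be sketched as a simple mapping. The weights below are copied from the table above. Only the key `ordinary_magisterium` appears in this README (as the v2 default noted under Known Limitations); the other key spellings, the function name `precision_weight`, and the dictionary layout are assumptions made for illustration and may differ from `configs/authority_matrix.json`.

```python
# Doctrinal precision weights from the table above.
# Key names other than "ordinary_magisterium" are hypothetical.
DOCTRINAL_PRECISION_WEIGHT = {
    "defined_dogma": 0.30,
    "ordinary_magisterium": 0.25,
    "theological_consensus": 0.20,
    "legitimate_theological_opinion": 0.15,
}

DEFAULT_LEVEL = "ordinary_magisterium"


def precision_weight(authority_level=None):
    """Look up the weight, using the default level for unclassified questions."""
    level = authority_level or DEFAULT_LEVEL
    try:
        return DOCTRINAL_PRECISION_WEIGHT[level]
    except KeyError:
        raise ValueError(f"unknown authority level: {level!r}") from None


print(precision_weight("defined_dogma"), precision_weight())
```

Rejecting unknown levels, rather than silently defaulting, keeps a typo in a question's classification from being scored under the wrong column.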

Full specification: docs/specifications/CDFI-formula.md

The Five Failure Modes

Failure Mode	Type	Detection Method
Doctrinal Omission	Gradational	Required-elements rubric
Moral Softening	Gradational	Moral fidelity rubric
Citation Fabrication	Categorical	Hallucination gate — caps CDFI at 40
Prompt Sensitivity Drift	Gradational	Four-variant framing analysis
Contextual Relativization	Categorical	Relativism resistance gate — caps CDFI at 40

Categorical failures override the weighted composite. They are not averaged with other scores.
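
The distinction between the two failure types can be expressed as data. The mode names and types below follow the table above; the `FailureType` enum and the `triggers_cap` helper are hypothetical names used only for this sketch.

```python
from enum import Enum


class FailureType(Enum):
    GRADATIONAL = "gradational"  # reflected in metric scores
    CATEGORICAL = "categorical"  # trips a gate and the cap


FAILURE_MODES = {
    "Doctrinal Omission": FailureType.GRADATIONAL,
    "Moral Softening": FailureType.GRADATIONAL,
    "Citation Fabrication": FailureType.CATEGORICAL,
    "Prompt Sensitivity Drift": FailureType.GRADATIONAL,
    "Contextual Relativization": FailureType.CATEGORICAL,
}


def triggers_cap(detected_modes):
    """True if any detected failure mode is categorical."""
    return any(FAILURE_MODES[m] is FailureType.CATEGORICAL for m in detected_modes)


print(triggers_cap(["Moral Softening"]))
print(triggers_cap(["Moral Softening", "Citation Fabrication"]))
```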

Full taxonomy: docs/specifications/failure-taxonomy.md

Deployment Tiers

CDFI Score	Tier	Permitted Institutional Use
85–100	Formation and Catechesis	RCIA, classroom faith formation, homily preparation, seminary study support
70–84	General Information	General information use; formation requires a prompt wrapper supplying explicit doctrinal context
50–69	R&D Only	Internal research and development; no public-facing deployment
Below 50 or any gate failure	Not Recommended	No institutional use recommended
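
The tier table reduces to a small lookup. This is a sketch under one assumption that the table does not state: fractional scores fall into the band whose lower bound they meet, so 84.9 is treated as General Information and 85.0 as Formation and Catechesis. The function name `deployment_tier` is hypothetical.

```python
def deployment_tier(cdfi, gates_passed=True):
    """Map a CDFI score to a tier using the thresholds in the table above."""
    if not 0 <= cdfi <= 100:
        raise ValueError("CDFI is expected to lie between 0 and 100")
    # Any gate failure is disqualifying regardless of the numeric score.
    if not gates_passed or cdfi < 50:
        return "Not Recommended"
    if cdfi < 70:
        return "R&D Only"
    if cdfi < 85:
        return "General Information"
    return "Formation and Catechesis"


print(deployment_tier(85.0))
print(deployment_tier(73.6))
print(deployment_tier(90.0, gates_passed=False))
```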

Reference Implementation: SAICRED v2

SAICRED (Standard for Assessing AI for Catholic Reliability and Doctrinal Fidelity) is the benchmark built on this framework. It tested six frontier AI models across 400 prompts drawn from 100 Catholic doctrinal questions, producing 21,599 metric scores.

Headline finding: o3 (CDFI 85.0) is the only model in v2 to clear the formation threshold. Five models cleared the general information threshold (70–84).

Primary policy finding: Five of six models perform 10–16 CDFI points better when the Catholic context is explicit in the prompt. Claude Sonnet 4.6 showed a 15.8-point gap (89.4 Catholic framing vs. 73.6 adversarial framing). o3 showed a gap of -0.8 points, effectively zero.
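
The framing gap is a plain difference between two framing-specific scores. The short sketch below reproduces the Claude Sonnet 4.6 figure from the two numbers quoted above; the function name `framing_gap` is hypothetical, and how per-framing CDFI values are aggregated upstream is not shown here.

```python
def framing_gap(catholic_framing_cdfi, adversarial_framing_cdfi):
    """Points gained when Catholic context is explicit, to one decimal place."""
    return round(catholic_framing_cdfi - adversarial_framing_cdfi, 1)


print(framing_gap(89.4, 73.6))  # 15.8
```

A gap near zero means the model's doctrinal accuracy does not depend on how the prompt is framed; a negative gap means the model scored slightly higher under adversarial framing.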

Full results: examples/saicred-v2/

Judge Reliability Certification

Before any CDFI scores go to print, the automated judge must pass a four-part certification:

Part	What It Tests	Pass Threshold	SAICRED v2 Result
1	Intra-rater consistency (Cohen's kappa per metric)	kappa >= 0.60 on Critical metrics	PASS — May 7, 2026
2	Anchor calibration	>= 90% accuracy	PASS — 98.3%
3	Adversarial invariance	>= 90%	PASS — 100%
4	Cap gate precision	>= 90%	PASS — 100%

All four parts cleared: May 11, 2026.
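
Part 1 relies on Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below computes unweighted kappa between two scoring passes over the same responses and compares it to the 0.60 threshold. Whether the certification protocol uses unweighted or weighted kappa is not stated here, so the unweighted form is an assumption, and the scores are made up for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length rating sequences."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both passes used one identical label throughout
    return (observed - expected) / (1 - expected)


KAPPA_THRESHOLD = 0.60  # Part 1 pass threshold for Critical metrics

run_1 = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]  # made-up scores, first pass
run_2 = [5, 4, 4, 3, 5, 2, 3, 5, 3, 4]  # made-up scores, second pass
kappa = cohens_kappa(run_1, run_2)
print(round(kappa, 3), kappa >= KAPPA_THRESHOLD)
```

Here the two passes agree on 9 of 10 items, but kappa is lower than 0.9 because some of that agreement would occur by chance given how often each score is used.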

Full protocol: docs/reliability/judge-reliability-protocol.md

Adapting This Framework for Other Traditions

The methodology is tradition-agnostic. Any religious institution evaluating AI model reliability against its own doctrinal standards can use this framework by substituting:

The doctrinal authority level taxonomy with the authority structure of the target tradition
The failure mode taxonomy with tradition-specific failure modes
The scoring anchors with examples drawn from the target tradition's texts
The deployment tier thresholds, reviewed against the institutional risk profile

The seven-step translation sequence, the gate architecture, the reliability certification protocol, and the statistical requirements do not change. They are methodology, not theology.

Adaptation guide: docs/governance/adapting-for-other-traditions.md

Known Limitations

Six limitations are documented with exact disclosure language:

#	Limitation	Publication Impact
L1	Authority level classification pending — all 400 v2 prompts used `ordinary_magisterium` default	Blocks final CDFI
L2	Human theological review pending	Blocks full publication
L3	Pastoral appropriateness kappa = 0.352 (formula weight 0.02–0.05; non-blocking)	Disclosure only
L4	Stability scores hardcoded at 3.0 — deferred to v2.1	Non-blocking
L5	Positions 1–5 not statistically distinguishable (only Grok vs. Claude gap reaches p < 0.05)	Interpretive constraint
L6	Scores tied to specific model versions; expire on major version update	Active via versioning protocol

Full register with paste-ready disclosure language: LIMITATIONS.md

AI-Assisted Research Disclosure

This project used Claude (Anthropic) for methodology development, document drafting, scoring architecture design, and repository construction (March–May 2026). All AI-generated output was treated as draft material subject to human review. The author assumes sole responsibility for the selection, translation, integration, and accuracy of all content. The selection of the seven source publications, the CDFI formula, the weighting matrix, the gate architecture, the reliability protocol, and all benchmark methodology decisions are the original intellectual contribution of the author.

Citation

@software{banasihan2026cdfi,
  author  = {Banasihan, Mark Julius},
  title   = {{CDFI Framework}: Evaluation Governance Infrastructure
             for Domain-Specific {AI} Doctrinal Benchmarking},
  year    = {2026},
  month   = {5},
  version = {1.1},
  doi     = {10.5281/zenodo.20464408},
  url     = {https://doi.org/10.5281/zenodo.20464408},
  license = {Apache-2.0}
}

See also: CITATION.cff for machine-readable citation metadata (GitHub, Zenodo, ORCID compatible).

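License

Apache License 2.0. See LICENSE for the full text and NOTICE for the attribution required of derivative works.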

Author

Mark Julius Banasihan

Evaluation governance systems for AI in high-stakes institutional and doctrinal contexts.

GitHub · LinkedIn · ORCID · Email · Atlanta, Georgia, United States
