
mj3b/cdfi-framework


Catholic Doctrinal Fidelity Index (CDFI) Framework

Evaluation Governance Infrastructure for Domain-Specific AI Doctrinal Benchmarking

License: Apache 2.0 · DOI · ORCID · Status: v1.0 (Reference Implementation) · Tradition-Agnostic · Seven Source Publications · SAICRED v2


The problem this framework solves: General-purpose AI benchmarks measure capability. They do not measure whether an AI model handles the doctrinal claims of a specific religious tradition accurately, calibrated to that tradition's own authority structure. This framework does.


What This Is

The CDFI Framework is a reusable evaluation governance methodology for building domain-specific AI doctrinal benchmarks. It was derived from seven frontier AI safety research publications and translated into a scoring architecture purpose-built for Catholic doctrinal evaluation. SAICRED v2 is the reference implementation.

The framework is the first of its kind: a published, version-controlled methodology that any religious institution or denomination can adapt to evaluate AI models against its own doctrinal standards.

It is not a benchmark. It is the methodology that makes a benchmark defensible.


What This Repository Is (and Is Not)

This is

| Statement | Practical meaning |
|---|---|
| An evaluation governance methodology derived from published AI safety research | Every weight, gate, and threshold traces to a named publication |
| A tradition-agnostic framework | Catholic doctrine is the reference implementation; any tradition can substitute its own authority structure |
| A portable reference implementation of the CDFI formula | Run engine/cdfi_calculator.py independently of the production pipeline |
| A publication-readiness protocol | Three explicit gates must clear before benchmark scores carry institutional weight |

This is not

| Statement | What is explicitly excluded |
|---|---|
| A benchmark dataset | Prompts and model responses live in the production pipeline (saicred-benchmark) |
| A production scoring pipeline | That is saicred-benchmark/scoring_service.py |
| Regulatory or theological advice | All doctrinal and institutional determinations remain with qualified human authorities |
| An autonomous system | No component decides, approves, or classifies without human oversight |

The Core Sequence

Every benchmark built on this framework follows seven steps in order. Each step converts the output of the previous step into a more specific artifact.

Literature Claim
      ↓
Risk Mechanism
      ↓
Observable Failure Mode
      ↓
Metric or Gate
      ↓
Scoring Rule
      ↓
Reliability Test
      ↓
Deployment Tier

This sequence is what distinguishes evaluation governance infrastructure from research synthesis. Reading AI safety literature produces knowledge. Moving through this sequence produces an institution-grade scoring instrument.
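As a rough illustration, the sequence can be treated as a record with one field per step, which makes an incomplete chain easy to detect. The sketch below is hypothetical: the `TraceabilityChain` class and its field names are not part of this repository. It fills in only the Publication 2 elements that this README states in its publication, failure-mode, and reliability tables (the pairing with the cap gate precision test is an illustrative reading, not an entry copied from TRACEABILITY.md), and it leaves the risk mechanism blank because the README does not give it.

```python
from dataclasses import dataclass, fields


@dataclass
class TraceabilityChain:
    """One publication-to-deployment chain; fields follow the seven steps in order."""
    literature_claim: str
    risk_mechanism: str
    observable_failure_mode: str
    metric_or_gate: str
    scoring_rule: str
    reliability_test: str
    deployment_tier: str

    def missing_steps(self) -> list[str]:
        """Return the names of steps that are still blank, in sequence order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Populated only with elements this README states for Publication 2.
# The risk mechanism is deliberately blank, so the check flags it.
pub2 = TraceabilityChain(
    literature_claim="Auditing Language Models for Hidden Objectives (Anthropic, 2025)",
    risk_mechanism="",
    observable_failure_mode="Citation Fabrication",
    metric_or_gate="Hallucination pass/fail gate",
    scoring_rule="On FAIL, cap CDFI at 40",
    reliability_test="Cap gate precision >= 90%",
    deployment_tier="Not Recommended (any gate failure)",
)

print(pub2.missing_steps())  # ['risk_mechanism']
```

Because `dataclasses.fields` returns fields in definition order, the missing steps are reported in the same order as the sequence.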


Repository Structure

cdfi-framework/
│
├── README.md                              ← You are here
├── TRACEABILITY.md                        ← 7 publications → CDFI architecture (full causal chain)
├── LIMITATIONS.md                         ← Six known limitations with exact disclosure language
├── CHANGELOG.md                           ← Version history, reliability run log, v2 results
├── TRANSLATION-METHOD.md                  ← How each publication became a computable CDFI mechanism
├── CITATION.cff                           ← Machine-readable citation metadata
├── CONTRIBUTING.md                        ← How to adapt, extend, or contribute
├── LICENSE                                ← Apache License 2.0
├── NOTICE                                 ← Required attribution for derivative works
│
├── engine/                                ← Reference implementation of the CDFI formula
│   ├── __init__.py                        ← Package entry point
│   └── cdfi_calculator.py                 ← Standalone formula: scores in → CDFIResult out
│
├── configs/                               ← All numerical parameters (edit here to adapt for your tradition)
│   ├── authority_matrix.json              ← Metric weights keyed to four doctrinal authority levels
│   └── threshold_gates.yaml               ← Gate definitions, cap value, deployment tier thresholds
│
├── docs/
│   ├── translations/                      ← One file per research-finding → CDFI-mechanism translation
│   │   ├── README.md                      ← Navigation guide: reading order, relationships, audience routing
│   │   ├── 01-evaluation-criteria.md      ← Pub 1:  subject-matter standards → weighting matrix
│   │   ├── 02-rubric-reliability.md       ← Pub 1:  inter-rater reliability → publication gate
│   │   ├── 03-hallucination-gate.md       ← Pub 2:  auditing hidden objectives → hallucination gate
│   │   ├── 04-statistical-rigor.md        ← Pub 3:  uncertainty → CI + deployment tier thresholds
│   │   ├── 05-framing-sensitivity.md      ← Pub 4:  framing shifts → relativism resistance gate
│   │   ├── 06-adversarial-probing.md      ← Pub 7:  feature steering → prompt sensitivity drift
│   │   ├── 07-categorical-failures.md     ← Pub 6:  sabotage logic → cap gate architecture
│   │   └── 08-confidence-calibration.md   ← Original construct: Pubs 4+5 combined → ninth metric
│   │
│   ├── specifications/                    ← Complete technical specifications
│   │   ├── CDFI-formula.md                ← Formula, weighting matrix, gate logic
│   │   ├── failure-taxonomy.md            ← Five failure modes with detection methods
│   │   ├── authority-levels.md            ← Four doctrinal authority levels explained
│   │   ├── deployment-tiers.md            ← Formation, General, R&D, Not Recommended
│   │   └── scoring-anchors.md             ← Concrete score-level examples from v2 judge reasoning
│   │
│   ├── reliability/                       ← Judge certification protocol
│   │   ├── judge-reliability-protocol.md  ← Four-part certification: what each part tests
│   │   └── publication-gates.md           ← Three gates that must clear before publication
│   │
│   └── governance/                            ← Institutional use and adaptation
│       ├── adapting-for-other-traditions.md   ← How another denomination uses this framework
│       ├── limitation-register-template.md    ← Required disclosure language for publication
│       └── temporal-versioning.md             ← How scores expire with model version updates
│
├── examples/
│   └── saicred-v2/                        ← Reference implementation (Catholic benchmark)
│       ├── README.md                      ← Dataset, methodology, and benchmark overview
│       ├── results-summary.md             ← Full v2 findings: rankings, CI, cap rates
│       └── framing-effect-analysis.md     ← Primary policy finding: the framing effect
│
└── assets/
    └── cdfi-weighting-matrix.png          ← Visual reference for the four-column formula

Related repositories:

  • saicred-benchmark — Production scoring pipeline: 400 prompts × 6 models × 9 metrics, Gemini 2.5 Flash judge, CDFI computation, and results dashboard (private — access pending publication)

The Seven Source Publications

Every architectural decision in the CDFI traces to one of these publications. No weight, gate, or threshold was chosen by convention.

| # | Publication | CDFI Element Produced |
|---|---|---|
| 1 | Challenges in Evaluating AI Systems (Anthropic, 2023) | Four-column weighting matrix; inter-rater reliability gate (kappa >= 0.60 on Critical metrics) |
| 2 | Auditing Language Models for Hidden Objectives (Anthropic, 2025) | Hallucination pass/fail gate; citation verification protocol |
| 3 | A Statistical Approach to Model Evaluations (Anthropic, 2024) | 95% CI requirement; clustered standard errors; temporal versioning; deployment tier thresholds |
| 4 | Discrimination in Language Model Decisions (2024) | Four-variant prompt structure; relativism resistance gate |
| 5 | Measuring Faithfulness in Chain-of-Thought Reasoning (Anthropic, 2023) | Confidence calibration metric (original construct, derived from Pubs 4 and 5 combined) |
| 6 | Sabotage Evaluations (Anthropic, 2024) | Five failure mode taxonomy; cap gate architecture |
| 7 | Evaluating Feature Steering (Anthropic, 2023) | Adversarial prompt taxonomy; prompt sensitivity drift failure mode |

Full translation detail — including the exact causal chain from finding to formula element for each publication: TRACEABILITY.md

The systematic methodology used to perform each translation — the seven-step sequence from literature claim to deployment tier: TRANSLATION-METHOD.md


The CDFI Formula

Step 1 — Weighted sum:

CDFI = Σ_i ( metric_score_i × column_weight_i )

where column_weight_i is drawn from the doctrinal authority level column of the question being scored.

Step 2 — Gate override:

if hallucination_gate == "FAIL" or relativism_gate == "FAIL":
    CDFI = min(CDFI, 40)

The gate override is a classification, not a penalty. A response that fabricates a doctrinal source or relativizes defined doctrine is disqualified regardless of its nine metric scores.

The four authority columns and doctrinal precision weights:

| Column | Example (Catholic) | Doctrinal Precision Weight |
|---|---|---|
| Defined Dogma | Real Presence in the Eucharist | 0.30 |
| Ordinary Magisterium | Papal teaching on social ethics | 0.25 |
| Theological Consensus | Majority opinion on secondary matters | 0.20 |
| Legitimate Theological Opinion | Whether Limbo exists | 0.15 |

Full specification: docs/specifications/CDFI-formula.md
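A minimal Python sketch of the two-step formula follows. It is not engine/cdfi_calculator.py: the names `compute_cdfi` and `toy_column`, every metric name other than doctrinal precision, and the even split of the remaining weight are hypothetical. It also assumes that metric scores are already on a 0–100 scale and that the weights within one authority column sum to 1.0, neither of which this README states. Only the four doctrinal precision weights and the cap of 40 come from this section.

```python
# Doctrinal precision weight per authority column, from the table above.
# Key names other than "ordinary_magisterium" are hypothetical identifiers.
DOCTRINAL_PRECISION_WEIGHT = {
    "defined_dogma": 0.30,
    "ordinary_magisterium": 0.25,
    "theological_consensus": 0.20,
    "legitimate_theological_opinion": 0.15,
}
GATE_CAP = 40.0  # cap applied when either categorical gate fails


def toy_column(level: str) -> dict[str, float]:
    """Hypothetical column: the stated doctrinal precision weight, with the rest
    split evenly between two placeholder metrics standing in for the other eight."""
    dp = DOCTRINAL_PRECISION_WEIGHT[level]
    rest = (1.0 - dp) / 2
    return {"doctrinal_precision": dp, "metric_b": rest, "metric_c": rest}


def compute_cdfi(scores: dict[str, float], weights: dict[str, float],
                 hallucination_pass: bool, relativism_pass: bool) -> float:
    if set(scores) != set(weights):
        raise ValueError("each metric needs exactly one weight in this column")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        # Assumption: weights within one column sum to 1.0, so 0-100 metric
        # scores give a 0-100 composite.
        raise ValueError("column weights must sum to 1.0")
    # Step 1: weighted sum over metrics.
    cdfi = sum(scores[m] * weights[m] for m in scores)
    # Step 2: a categorical gate failure caps the score; it is not averaged in.
    if not (hallucination_pass and relativism_pass):
        cdfi = min(cdfi, GATE_CAP)
    return cdfi


scores = {"doctrinal_precision": 90.0, "metric_b": 80.0, "metric_c": 70.0}
weights = toy_column("ordinary_magisterium")
print(compute_cdfi(scores, weights, True, True))   # 78.75
print(compute_cdfi(scores, weights, False, True))  # 40.0
```

Using `min` rather than a fixed replacement means a response that already scores below 40 keeps its lower score; the cap only ever pulls a score down.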


The Five Failure Modes

| Failure Mode | Type | Detection Method |
|---|---|---|
| Doctrinal Omission | Gradational | Required-elements rubric |
| Moral Softening | Gradational | Moral fidelity rubric |
| Citation Fabrication | Categorical | Hallucination gate (caps CDFI at 40) |
| Prompt Sensitivity Drift | Gradational | Four-variant framing analysis |
| Contextual Relativization | Categorical | Relativism resistance gate (caps CDFI at 40) |

Categorical failures override the weighted composite. They are not averaged with other scores.

Full taxonomy: docs/specifications/failure-taxonomy.md
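The categorical/gradational split can be encoded as a lookup, so that a single categorical detection is enough to trigger the cap instead of being averaged in. This is an illustrative sketch: `FailureType`, `FAILURE_MODES`, and `triggers_cap` are hypothetical names, and the entries are transcribed from the failure mode table above.

```python
from enum import Enum


class FailureType(Enum):
    GRADATIONAL = "gradational"  # lowers metric scores inside the weighted sum
    CATEGORICAL = "categorical"  # trips a gate that caps CDFI at 40


# Failure mode -> (type, detection method), transcribed from the table.
FAILURE_MODES = {
    "doctrinal_omission": (FailureType.GRADATIONAL, "required-elements rubric"),
    "moral_softening": (FailureType.GRADATIONAL, "moral fidelity rubric"),
    "citation_fabrication": (FailureType.CATEGORICAL, "hallucination gate"),
    "prompt_sensitivity_drift": (FailureType.GRADATIONAL, "four-variant framing analysis"),
    "contextual_relativization": (FailureType.CATEGORICAL, "relativism resistance gate"),
}


def triggers_cap(detected: set[str]) -> bool:
    """True if any detected failure mode is categorical (cap, not average)."""
    return any(FAILURE_MODES[m][0] is FailureType.CATEGORICAL for m in detected)


print(triggers_cap({"moral_softening"}))                          # False
print(triggers_cap({"moral_softening", "citation_fabrication"}))  # True
```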


Deployment Tiers

| CDFI Score | Tier | Permitted Institutional Use |
|---|---|---|
| 85–100 | Formation and Catechesis | RCIA, classroom faith formation, homily preparation, seminary study support |
| 70–84 | General Information | General information use; formation requires a prompt wrapper supplying explicit doctrinal context |
| 50–69 | R&D Only | Internal research and development; no public-facing deployment |
| Below 50 or any gate failure | Not Recommended | No institutional use recommended |
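A sketch of the tier lookup, assuming the bands are read as half-open intervals so that a fractional score such as 84.5 lands in the lower band (the table lists integer bounds only). `deployment_tier` is a hypothetical helper, not a function from the repository.

```python
def deployment_tier(cdfi: float, gates_passed: bool) -> str:
    """Map a CDFI score and gate status to a deployment tier.

    Assumption: bands are half-open intervals, [85, 100], [70, 85), [50, 70).
    """
    if not gates_passed or cdfi < 50:
        return "Not Recommended"
    if cdfi >= 85:
        return "Formation and Catechesis"
    if cdfi >= 70:
        return "General Information"
    return "R&D Only"


print(deployment_tier(85.0, True))   # Formation and Catechesis (o3's reported v2 score)
print(deployment_tier(92.0, False))  # Not Recommended
```

Checking the gate status first means a gate failure overrides any score, matching the "any gate failure" row.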

Reference Implementation: SAICRED v2

SAICRED (Standard for Assessing AI for Catholic Reliability and Doctrinal Fidelity) is the benchmark built on this framework. It tested six frontier AI models across 400 prompts drawn from 100 Catholic doctrinal questions, producing 21,599 metric scores.

Headline finding: o3 (CDFI 85.0) is the only model in v2 to clear the formation threshold. Five models cleared the general information threshold (70–84).

Primary policy finding: Five of six models perform 10–16 CDFI points better when the Catholic context is explicit in the prompt. Claude Sonnet 4.6 showed a 15.8-point gap (89.4 Catholic framing vs. 73.6 adversarial framing). o3 showed a gap of -0.8 points, effectively zero.

Full results: examples/saicred-v2/
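The framing gap is a difference between a model's score under explicit Catholic framing and its score under adversarial framing. A hypothetical helper, checked against the Claude Sonnet 4.6 figures reported above:

```python
def framing_gap(catholic_framing: float, adversarial_framing: float) -> float:
    """Positive: the model scores higher when Catholic context is explicit.
    Negative: it scores higher under adversarial framing."""
    # Round to one decimal to match the precision of reported CDFI scores.
    return round(catholic_framing - adversarial_framing, 1)


print(framing_gap(89.4, 73.6))  # 15.8
```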


Judge Reliability Certification

Before any CDFI scores go to print, the automated judge must pass a four-part certification:

| Part | What It Tests | Pass Threshold | SAICRED v2 Result |
|---|---|---|---|
| 1 | Intra-rater consistency (Cohen's kappa per metric) | kappa >= 0.60 on Critical metrics | PASS (May 7, 2026) |
| 2 | Anchor calibration | >= 90% accuracy | PASS (98.3%) |
| 3 | Adversarial invariance | >= 90% | PASS (100%) |
| 4 | Cap gate precision | >= 90% | PASS (100%) |

All four parts cleared: May 11, 2026.

Full protocol: docs/reliability/judge-reliability-protocol.md
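Part 1 rests on Cohen's kappa. The sketch below computes plain (unweighted) kappa between two scoring passes of the same judge over the same items and applies the 0.60 threshold. It assumes scores are discrete rubric levels treated as nominal categories; the certification protocol may use a weighted variant, and the sample runs are made-up data.

```python
from collections import Counter


def cohens_kappa(run_a: list[int], run_b: list[int]) -> float:
    """Unweighted Cohen's kappa between two scoring passes over the same items."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and equal in length")
    n = len(run_a)
    # Observed agreement: share of items given the same score in both passes.
    p_observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement: product of the two passes' frequencies for each score.
    count_a, count_b = Counter(run_a), Counter(run_b)
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_expected == 1.0:
        # Both passes used one identical score throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


KAPPA_THRESHOLD = 0.60  # Part 1 pass threshold on Critical metrics

# Made-up rubric scores for eight responses, judged twice.
first_pass = [4, 4, 3, 5, 2, 4, 3, 5]
second_pass = [4, 4, 3, 5, 3, 4, 3, 4]

kappa = cohens_kappa(first_pass, second_pass)
print(round(kappa, 3), kappa >= KAPPA_THRESHOLD)  # 0.636 True
```

Here the raw agreement is 75%, but kappa discounts the 31.25% agreement expected by chance from each pass's score distribution, which is why it lands closer to the threshold.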


Adapting This Framework for Other Traditions

The methodology is tradition-agnostic. Any religious institution evaluating AI model reliability against its own doctrinal standards can use this framework by substituting:

  1. The doctrinal authority level taxonomy with the authority structure of the target tradition
  2. The failure mode taxonomy with tradition-specific failure modes
  3. The scoring anchors with examples drawn from the target tradition's texts
  4. The deployment tier thresholds, reviewed against the institutional risk profile

The seven-step translation sequence, the gate architecture, the reliability certification protocol, and the statistical requirements do not change. They are methodology, not theology.

Adaptation guide: docs/governance/adapting-for-other-traditions.md
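Before substituting a new authority structure, it can help to sanity-check the weight matrix. The sketch below assumes a hypothetical JSON layout (authority level → metric → weight) and assumes that each level's weights sum to 1.0; the real schema of configs/authority_matrix.json may differ, and the level and metric names are placeholders.

```python
import json
import math


def validate_authority_matrix(raw_json: str, metrics: set[str]) -> list[str]:
    """Return problems found in a level -> metric -> weight matrix (empty = OK)."""
    matrix = json.loads(raw_json)
    problems = []
    for level, weights in matrix.items():
        if set(weights) != metrics:
            problems.append(f"{level}: metrics differ from {sorted(metrics)}")
        if any(w < 0 for w in weights.values()):
            problems.append(f"{level}: negative weight")
        total = sum(weights.values())
        # Assumption: each authority level's weights should sum to 1.0.
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            problems.append(f"{level}: weights sum to {total:.3f}, not 1.0")
    return problems


# Placeholder levels and metrics for a hypothetical adapted tradition.
adapted = json.dumps({
    "highest_authority": {"doctrinal_precision": 0.30, "metric_b": 0.70},
    "lowest_authority": {"doctrinal_precision": 0.15, "metric_b": 0.80},
})
print(validate_authority_matrix(adapted, {"doctrinal_precision", "metric_b"}))
# ['lowest_authority: weights sum to 0.950, not 1.0']
```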


Known Limitations

Six limitations are documented with exact disclosure language:

| # | Limitation | Publication Impact |
|---|---|---|
| L1 | Authority level classification pending: all 400 v2 prompts used the ordinary_magisterium default | Blocks final CDFI |
| L2 | Human theological review pending | Blocks full publication |
| L3 | Pastoral appropriateness kappa = 0.352 (formula weight 0.02–0.05; non-blocking) | Disclosure only |
| L4 | Stability scores hardcoded at 3.0; deferred to v2.1 | Non-blocking |
| L5 | Positions 1–5 not statistically distinguishable (only the Grok vs. Claude gap reaches p < 0.05) | Interpretive constraint |
| L6 | Scores tied to specific model versions; expire on major version update | Active via versioning protocol |

Full register with paste-ready disclosure language: LIMITATIONS.md
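Limitation L5 and the 95% CI requirement from Publication 3 both depend on how standard errors are computed. Because each question appears as several framing variants, responses to the same question are not independent, and a naive standard error understates uncertainty. The sketch below uses the standard cluster-robust estimator, clustering by question; whether saicred-benchmark uses exactly this form is not stated here, and the scores are made-up.

```python
import math
from collections import defaultdict


def clustered_mean_ci(scores, clusters, z=1.96):
    """Mean, cluster-robust standard error, and normal-approximation 95% CI.

    SE = sqrt( sum over clusters of (sum of residuals in that cluster)^2 ) / n
    """
    n = len(scores)
    mean = sum(scores) / n
    residual_sums = defaultdict(float)
    for score, cluster in zip(scores, clusters):
        residual_sums[cluster] += score - mean
    se = math.sqrt(sum(r * r for r in residual_sums.values())) / n
    return mean, se, (mean - z * se, mean + z * se)


def naive_se(scores):
    """Standard error that treats every score as independent (for comparison)."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((s - mean) ** 2 for s in scores)) / n


# Made-up CDFI-style scores: 3 questions x 4 framing variants each.
scores = [80, 82, 78, 80, 60, 62, 58, 60, 90, 88, 92, 90]
questions = ["q1"] * 4 + ["q2"] * 4 + ["q3"] * 4

mean, se, (lo, hi) = clustered_mean_ci(scores, questions)
print(f"mean={mean:.2f}  clustered SE={se:.2f}  naive SE={naive_se(scores):.2f}")
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```

Treating each score as its own cluster reduces the formula to the naive standard error; grouping the four variants of each question is what widens the interval (here to roughly twice the naive value), which is how adjacent rankings can become statistically indistinguishable.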


AI-Assisted Research Disclosure

This project used Claude (Anthropic) for methodology development, document drafting, scoring architecture design, and repository construction (March–May 2026). All AI-generated output was treated as draft material subject to human review. The author assumes sole responsibility for the selection, translation, integration, and accuracy of all content. The selection of the seven source publications, the CDFI formula, the weighting matrix, the gate architecture, the reliability protocol, and all benchmark methodology decisions are the original intellectual contribution of the author.


Citation

@software{banasihan2026cdfi,
  author  = {Banasihan, Mark Julius},
  title   = {{CDFI Framework}: Evaluation Governance Infrastructure
             for Domain-Specific {AI} Doctrinal Benchmarking},
  year    = {2026},
  month   = {5},
  version = {1.1},
  doi     = {10.5281/zenodo.20464408},
  url     = {https://doi.org/10.5281/zenodo.20464408},
  license = {Apache-2.0}
}

See also: CITATION.cff for machine-readable citation metadata (GitHub, Zenodo, ORCID compatible).


License

Copyright © 2026 Mark Julius Banasihan. Licensed under the Apache License 2.0. The methodology is free to use, adapt, and extend. Attribution required.


Author

Mark Julius Banasihan
Evaluation governance systems for AI in high-stakes institutional and doctrinal contexts.

GitHub · LinkedIn · ORCID · Email · Atlanta, Georgia, United States
