Merged
17 changes: 5 additions & 12 deletions .github/workflows/ai_pr_review.yml
@@ -147,20 +147,13 @@ jobs:
 echo "Changed files:"
 git --no-pager diff --name-status "$BASE_SHA" "$HEAD_SHA"
 echo ""
-# Identify large data files excluded from the unified diff
-EXCLUDED_GLOBS="benchmarks/data/real/*.json benchmarks/data/real/*.csv"
-excluded_files=$(git --no-pager diff --name-only "$BASE_SHA" "$HEAD_SHA" -- $EXCLUDED_GLOBS)
-if [ -n "$excluded_files" ]; then
-  echo "NOTE: The following files are excluded from the unified diff below"
-  echo "due to size (they are generated data/golden-value files). Their"
-  echo "filenames appear in the 'Changed files' list above, but their"
-  echo "content is NOT shown. Review coverage for these files is metadata-only."
-  echo "$excluded_files" | sed 's/^/ - /'
-  echo ""
-fi
 echo "Unified diff (context=5):"
+# Exclude large generated/data files from the full diff to stay
+# within the model's input limit. The --name-status above still
+# lists them. Narrowed to real-data assets and notebook outputs.
 git --no-pager diff --unified=5 "$BASE_SHA" "$HEAD_SHA" \
-  -- . ':!benchmarks/data/real/*.json' ':!benchmarks/data/real/*.csv'
+  -- . ':!benchmarks/data/real/*.json' ':!benchmarks/data/real/*.csv' \
+    ':!docs/tutorials/*.ipynb'
 } >> "$PROMPT"
 
 - name: Run Codex
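The workflow step relies on git's `:!` pathspec magic (short for `:(exclude)`): the unified diff omits matching files, while the earlier `--name-status` call, which takes no pathspec, still lists them. A minimal sketch of that behavior, assuming a throwaway repository whose file names (`src/app.py`, `golden.json`, `intro.ipynb`) are illustrative and not taken from the project:

```shell
#!/bin/sh
# Sketch only: a scratch repo with hypothetical files, showing that ':!'
# exclude pathspecs hide file content from the diff while a plain
# --name-status call still lists every changed file.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Base commit: a single source file.
mkdir -p src benchmarks/data/real docs/tutorials
echo "print('v1')" > src/app.py
git add -A
git commit -q -m "base"
BASE_SHA=$(git rev-parse HEAD)

# Head commit: edit the source file, add a golden-value JSON and a notebook.
echo "print('v2')" > src/app.py
echo '{"att": 1.0}' > benchmarks/data/real/golden.json
echo '{"cells": []}' > docs/tutorials/intro.ipynb
git add -A
git commit -q -m "head"
HEAD_SHA=$(git rev-parse HEAD)

echo "Changed files:"
git --no-pager diff --name-status "$BASE_SHA" "$HEAD_SHA"

# Same exclusions as the workflow; single quotes stop the shell from globbing.
echo "Files whose content reaches the diff:"
git --no-pager diff --name-only "$BASE_SHA" "$HEAD_SHA" \
  -- . ':!benchmarks/data/real/*.json' ':!benchmarks/data/real/*.csv' \
  ':!docs/tutorials/*.ipynb'
```

Because pathspec wildcards match across `/` by default, `benchmarks/data/real/*.json` also excludes JSON files in subdirectories of `real/`; adding the `:(glob)` magic would restrict `*` to a single path component.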
1 change: 1 addition & 0 deletions .gitignore
@@ -55,6 +55,7 @@ Thumbs.db
 
 # Benchmarks - generated data and results (can be regenerated)
 benchmarks/data/synthetic/*.csv
+benchmarks/data/real/raw/
 benchmarks/results/
 
 # Rust build artifacts
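To see which paths the benchmark ignore rules cover, `git check-ignore` reports the paths a `.gitignore` pattern matches. The sketch below, assuming a scratch repository with hypothetical file names, shows that the trailing-slash rule ignores everything under `raw/`, while a golden-value file one level up in `benchmarks/data/real/` (the kind of file the workflow's pathspecs keep out of the review diff) is not ignored:

```shell
#!/bin/sh
# Sketch only: scratch repo, hypothetical file names under benchmarks/.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# The benchmark rules from the .gitignore hunk.
cat > .gitignore <<'EOF'
benchmarks/data/synthetic/*.csv
benchmarks/data/real/raw/
benchmarks/results/
EOF

# Sample files (untracked, so the ignore rules apply to them).
mkdir -p benchmarks/data/real/raw benchmarks/data/synthetic
touch benchmarks/data/real/raw/download.csv
touch benchmarks/data/real/golden.json
touch benchmarks/data/synthetic/panel.csv

# -v prints "<source>:<line>:<pattern><TAB><path>" for each ignored path;
# paths no rule covers are not printed. check-ignore exits 1 only when
# none of the given paths is ignored, so `|| true` guards `set -e`.
git check-ignore -v \
  benchmarks/data/real/raw/download.csv \
  benchmarks/data/real/golden.json \
  benchmarks/data/synthetic/panel.csv || true
```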
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **Survey real-data validation** (Phase 9) — 15 cross-validation tests against R's `survey` package using three real federal survey datasets:
+  - **API** (R `survey` package): TSL variance with strata, FPC, subpopulations, covariates, and Fay's BRR replicates
+  - **NHANES** (CDC/NCHS): TSL variance with strata + PSU + nest=TRUE, validating the ACA young adult coverage provision DiD
+  - **RECS 2020** (U.S. EIA): JK1 replicate weight variance with 60 pre-computed replicate columns
+  - ATT, SE, df, and CI match R to machine precision (< 1e-10) where directly comparable; known deviations documented in REGISTRY.md (TWFE SE differs due to unit FE absorption; subpopulation df differs due to strata preservation)
+
 ## [2.8.4] - 2026-04-04
 
 ### Added
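For reference, the two replicate-weight variance estimators named in the Unreleased entry take the following textbook forms, using the default scale factors that R's `survey` package applies for `type = "JK1"` and `type = "Fay"`. This is a hedged sketch of the standard formulas, not the project's implementation. Here $R$ is the number of replicate columns ($R = 60$ for RECS 2020), $\hat\theta_r$ the estimate under replicate weight $r$, $\hat\theta$ the full-sample estimate, and $\rho$ Fay's perturbation factor:

```latex
v_{\mathrm{JK1}}(\hat\theta) = \frac{R-1}{R} \sum_{r=1}^{R} \bigl(\hat\theta_r - \tilde\theta\bigr)^2,
\qquad
v_{\mathrm{Fay}}(\hat\theta) = \frac{1}{R\,(1-\rho)^2} \sum_{r=1}^{R} \bigl(\hat\theta_r - \tilde\theta\bigr)^2,
\qquad
\widehat{\mathrm{SE}}(\hat\theta) = \sqrt{v(\hat\theta)}
```

The centering term $\tilde\theta$ is the replicate mean $\bar\theta = R^{-1}\sum_{r} \hat\theta_r$ under `survey`'s default `mse = FALSE`, or the full-sample $\hat\theta$ when `mse = TRUE`; the entry does not state which setting the validation tests use.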