Add bootstrap lonely_psu adjust, CV on estimates, weight trimming, and pretrends+survey #260
8d: Implement lonely_psu="adjust" for survey-aware bootstrap. Singleton PSUs are pooled into a combined pseudo-stratum for weight generation in both multiplier (generate_survey_multiplier_weights_batch) and Rao-Wu (generate_rao_wu_weights) paths. This matches the analytical TSL "adjust" behavior of centering around the global mean.

8e-i: Add coef_var property (SE / |estimate|) to all 12 results classes with CV display in summary() output. Returns NaN when estimate is 0 or SE is non-finite. Used by federal agencies (NCHS) for publication standards (CV < 30%).

8e-ii: Add trim_weights() utility to prep.py for capping extreme survey weights via absolute upper bound or quantile-based threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
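The CV rule in 8e-i can be sketched as a standalone function. This is only an illustration: `coef_var` below is a hypothetical free function, not the property on the results classes, and returning 0.0 for SE = 0 follows the guard change made later in this PR.

```python
import math


def coef_var(estimate: float, se: float) -> float:
    """Coefficient of variation, SE / |estimate| (hypothetical standalone helper)."""
    # Undefined when either input is non-finite.
    if not (math.isfinite(estimate) and math.isfinite(se)):
        return math.nan
    # Undefined for a zero estimate; a negative SE is treated as invalid.
    if estimate == 0 or se < 0:
        return math.nan
    # SE == 0 with a nonzero estimate yields CV == 0.0.
    return se / abs(estimate)


cv = coef_var(2.0, 0.5)
publishable = cv < 0.30  # the CV < 30% publication rule quoted above
```

A caller would compare the result against the 30% threshold only after checking that it is not NaN.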
Thread survey weights through _compute_lead_coefficients():
- Pass survey weights to _iterative_demean() for weighted demeaning
- Pass survey weights to solve_ols() for WLS point estimates
- Replace cluster-robust VCV with compute_survey_vcov() for design-based inference (strata/PSU/FPC)
- Use survey df in safe_inference() for t-distribution critical values
- Adjust F-test denominator df in _pretrend_test() to use survey df

Persist resolved_survey and survey_weights in _fit_data dict so _pretrend_test() can access them post-fit. Subset resolved_survey to Omega_0 via dataclasses.replace() before passing to lead regression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Highest unmitigated severity: Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P1: Use consistent full-design df_survey for both per-lead inference and F-test in ImputationDiD pretrends. Add explicit survey_df parameter to _compute_lead_coefficients() passed from the full resolved_survey.
P1: Document bootstrap lonely_psu="adjust" pseudo-stratum pooling in REGISTRY.md with deviation note listing all affected estimators.
P1: Add Rao-Wu lonely_psu="adjust" test (direct helper test).
P2: Fix stale test_imputation_pretrends_survey_raises in test_pretrends_event_study.py; it now expects success.
P2: Update REGISTRY.md pretrends+survey note to reflect support.
P3: Add trim_weights to __all__ in __init__.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Highest unmitigated severity: Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Verification note: |
Replicate-weight survey designs need per-replicate lead regression refits for pretrends, which are not yet implemented. Reject with NotImplementedError for now; analytical survey designs (strata/PSU/FPC) remain supported.

Fixes: replicate designs silently falling back to unweighted pretrends, negative-horizon keys contaminating the replicate refit vector, and compute_survey_vcov being called for replicate designs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Highest unmitigated severity: Executive Summary
Methodology Affected methods:
Code Quality
Performance
No material findings in the changed paths.
Maintainability
No material findings beyond the correctness issues above.
Tech Debt
No separate tech-debt finding.
Security
No findings.
Documentation/Tests
Path to Approval
|
P0: Block replicate-weight survey designs in _pretrend_test(): the public method was still accessible after a replicate fit, routing through compute_survey_vcov (TSL) instead of replicate variance.
P1: Use residuals from solve_ols (result[1]) instead of recomputing y_dm - X_dm @ coefficients in the survey VCV path. solve_ols already handles rank-deficient fits by rebuilding fitted values from kept columns only, so residuals are finite even with NaN coefficients.
P2: Add regression test for pretrend_test() rejection on replicate fits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Highest unmitigated severity: Executive Summary
Methodology
Affected methods: Borusyak-Jaravel-Spiess Test 1 / Equation 9 pretrend regression and survey replication/bootstrap handling. In the paper, Test 1 is an OLS regression on untreated observations only, with joint significance tested via a heteroskedasticity- and cluster-robust Wald test; Rust & Rao is the cited general reference on jackknife/BRR/bootstrap replication methods for complex surveys. (academic.oup.com)
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P0: Don't coerce nonpositive survey df_survey to 1 in pretrend F-test. Return NaN p-value when df_survey <= 0 (variance unidentified).
P1: Reduce to kept (finite-coefficient) columns before calling compute_survey_vcov in _compute_lead_coefficients, then expand VCV back to full matrix with NaN for dropped columns. Prevents singular X'WX on rank-deficient lead/covariate designs.
P2: Change coef_var guard from se > 0 to se >= 0 across all 12 results classes. SE=0 with finite nonzero estimate (e.g., FPC census) now correctly returns CV=0.0 instead of NaN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
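The P1 item (compute the VCV on kept columns, then expand with NaN) can be illustrated with plain NumPy. This is a sketch under assumptions: `expand_vcov` is a hypothetical helper name, and the small VCV is made-up input standing in for whatever the survey variance routine returns on the reduced design.

```python
import numpy as np


def expand_vcov(vcov_kept: np.ndarray, kept: np.ndarray) -> np.ndarray:
    """Embed a VCV for the kept columns into a full k x k matrix, NaN elsewhere."""
    k = kept.size
    full = np.full((k, k), np.nan)
    idx = np.flatnonzero(kept)
    # Fill only the rows/columns that correspond to kept coefficients.
    full[np.ix_(idx, idx)] = vcov_kept
    return full


coefs = np.array([0.4, np.nan, -0.1])  # middle column dropped as collinear
kept = np.isfinite(coefs)
vcov_kept = np.array([[0.04, 0.01],
                      [0.01, 0.09]])   # illustrative 2 x 2 VCV for kept columns
full = expand_vcov(vcov_kept, kept)
```

Dropped coefficients then carry NaN variances and covariances, so downstream inference reports them as unidentified rather than failing on a singular matrix.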
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Highest unmitigated severity: This re-review is materially better than the last one: the prior nonpositive- Executive Summary
Methodology
Affected methods: Borusyak-Jaravel-Spiess Test 1 / Equation 9 pretrend regression, its new analytical-survey extension, and survey-aware bootstrap lonely-PSU handling. In the paper, Test 1 estimates
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P1: Skip cluster_ids in solve_ols when survey VCV will replace the cluster-robust VCV anyway. Prevents errors on single-PSU untreated domains where cluster-robust VCV would fail before survey VCV runs.
P2: Add test_coef_var_zero_se regression (SE=0 -> CV=0.0).

Defer subpopulation-based Omega_0 survey design to TODO.md as medium priority; current physical subsetting is correct for the common case but can change lonely-PSU/FPC behavior when some PSU/stratum has no untreated observations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Needs changes Executive Summary
Methodology
Code Quality
No material findings beyond the methodology issue above.
Performance
No material findings.
Maintainability
No separate maintainability finding, but the duplicated survey-subsetting blocks in diff_diff/imputation.py:L1736 and diff_diff/imputation.py:L2271 should be centralized once the P1 is fixed.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
|
Replace physical Omega_0 subsetting with zero-padded full-design approach: scores and residuals from the lead regression are zero-padded back to full-panel length before calling compute_survey_vcov(), so the full PSU/strata structure is preserved for design-based variance. This follows the standard survey subpopulation convention (Lumley 2004, R survey::subset.survey.design) where out-of-domain observations contribute zero to the estimating equation but retain their design information for correct variance estimation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
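Why zero-padding matters can be seen in a toy example. The sketch below is not the library's compute_survey_vcov(); it uses a textbook with-replacement variance of a total from PSU-level score sums, and the helper names are hypothetical. It assumes PSU ids are unique within strata and simply skips lonely PSUs.

```python
import numpy as np


def zero_pad(values, in_domain):
    """Place domain-only values into a full-length array, zeros elsewhere."""
    full = np.zeros(in_domain.shape[0], dtype=float)
    full[in_domain] = values
    return full


def stratified_total_variance(scores, strata, psu):
    """With-replacement variance of a total, built from PSU score sums."""
    var = 0.0
    for h in np.unique(strata):
        in_h = strata == h
        psus = np.unique(psu[in_h])
        n_h = psus.size
        if n_h < 2:
            continue  # lonely PSU: skipped in this sketch
        totals = np.array([scores[in_h & (psu == p)].sum() for p in psus])
        var += n_h / (n_h - 1) * ((totals - totals.mean()) ** 2).sum()
    return var


strata = np.array([0, 0, 0, 0])
psu = np.array([0, 0, 1, 1])
untreated = np.array([True, True, False, False])  # PSU 1 has no untreated rows
domain_scores = np.array([1.0, 2.0])

# Full design with zero-padded scores: both PSUs still count.
v_full = stratified_total_variance(zero_pad(domain_scores, untreated), strata, psu)
# Physical subsetting: the stratum collapses to a single PSU.
v_subset = stratified_total_variance(domain_scores, strata[untreated], psu[untreated])
```

With physical subsetting the stratum degenerates into a lonely PSU and contributes nothing, while the zero-padded version keeps the between-PSU contrast.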
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Use np.nanquantile instead of np.quantile so NaN weights don't poison the quantile computation and corrupt the entire output column. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
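A small demonstration of the failure mode, using made-up weights: with `np.quantile` a single NaN weight turns the cap into NaN, and applying that cap with `np.minimum` turns every weight into NaN.

```python
import numpy as np

weights = np.array([1.0, 2.0, np.nan, 50.0])

cap_plain = np.quantile(weights, 0.5)     # NaN: the missing weight propagates
cap_safe = np.nanquantile(weights, 0.5)   # computed over the finite weights only

trimmed_plain = np.minimum(weights, cap_plain)  # every entry becomes NaN
trimmed_safe = np.minimum(weights, cap_safe)    # only the original NaN stays NaN
```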
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good No unmitigated P0/P1 findings remain; the highest remaining severity is P2. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
P2: Add test_pretrends_survey_always_treated_psu: regression test with a PSU/stratum that has mostly treated obs (sparse untreated domain). Exercises the zero-padding subpopulation path.
P3: Update docstrings in _compute_lead_coefficients, _pretrend_test, pretrend_test (results), and both bootstrap helper functions to describe survey VCV support and pooled-singleton adjust behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment One unmitigated P1 remains in the new Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The earlier survey-pretrend TODO is appropriately marked resolved in TODO.md:57.
Security
No findings.
Documentation/Tests
Path to Approval
|
P1: Emit UserWarning when lonely_psu="adjust" has only 1 singleton stratum total in both multiplier and Rao-Wu bootstrap. Singleton gets zero variance (same as remove) since pooling requires 2+ singletons. Matches R survey package behavior.
P2: Fix test_pretrends_survey_always_treated_psu to use first_treat=1 so stratum 0 has genuinely zero untreated rows, exercising the zero-padding branch.
P2: Add single-singleton warning tests for both bootstrap families.
P3: Update REGISTRY bullets at lines 860/863 to mention design-based survey VCV alongside cluster-robust. Document single-singleton bootstrap fallback behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
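The pooling rule and its single-singleton fallback can be sketched in pure Python. This is an illustration only: `pool_singleton_strata` and the `POOLED` label are hypothetical names, and the real helpers operate on weight arrays rather than a dict.

```python
import warnings
from collections import Counter

POOLED = "__pooled_singletons__"  # hypothetical label for the pseudo-stratum


def pool_singleton_strata(psu_strata):
    """Map each PSU to a bootstrap stratum, pooling strata that hold one PSU.

    psu_strata: dict of PSU id -> stratum label.
    """
    counts = Counter(psu_strata.values())
    singletons = {h for h, n in counts.items() if n == 1}
    if len(singletons) == 1:
        # Pooling needs at least two singletons; a lone one stays as it is.
        warnings.warn(
            "Only one singleton stratum: nothing to pool with, so its PSU "
            "contributes zero bootstrap variance.",
            UserWarning,
        )
        return dict(psu_strata)
    return {p: (POOLED if h in singletons else h) for p, h in psu_strata.items()}


pooled = pool_singleton_strata({"a": 1, "b": 1, "c": 2, "d": 3})
```

Here strata 2 and 3 each hold one PSU, so PSUs "c" and "d" end up together in the pseudo-stratum and can be resampled against each other.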
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The resolved survey-pretrend entry added in TODO.md:L57 is appropriate.
Security
No findings.
Documentation/Tests
No findings. The previously open coverage gaps called out in the prior review appear addressed by the new one-singleton bootstrap tests in tests/test_survey_phase8.py:L917-L983 and the zero-untreated-domain survey-pretrend test in tests/test_survey_phase8.py:L1278-L1329.
|
P2: Validate lower <= upper in trim_weights after quantile resolution. Raise ValueError when lower exceeds upper to prevent silent corruption.
P3: Reword single-singleton bootstrap fallback as library-specific documented behavior (not R equivalence: R's analytical adjust uses grand-mean centering, but the bootstrap single-singleton analogue is not defined in the literature).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
P2: Validate upper/lower are finite and non-negative in trim_weights. Reject NaN/negative/inf caps before applying np.minimum/np.maximum.
P3: Add survey+covariates pretrend test and survey+bootstrap pretrend test exercising the distinct code paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
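Taken together, the trimming checks discussed in this thread (NaN-safe quantile, finite non-negative caps, lower <= upper) might look like the sketch below. The function name and its parameters, including `upper_quantile`, are assumptions for illustration; the actual trim_weights() signature in prep.py is not shown in this conversation.

```python
import numpy as np


def trim_weights_sketch(weights, upper=None, lower=None, upper_quantile=None):
    """Cap weights at absolute or quantile-based bounds (hypothetical signature)."""
    w = np.asarray(weights, dtype=float)
    if upper is not None and upper_quantile is not None:
        raise ValueError("pass either upper or upper_quantile, not both")
    if upper_quantile is not None:
        # NaN-safe, so a missing weight cannot turn the cap into NaN.
        upper = float(np.nanquantile(w, upper_quantile))
    for name, cap in (("upper", upper), ("lower", lower)):
        if cap is not None and (not np.isfinite(cap) or cap < 0):
            raise ValueError(f"{name} must be finite and non-negative, got {cap!r}")
    if upper is not None and lower is not None and lower > upper:
        raise ValueError("lower must not exceed upper")
    if upper is not None:
        w = np.minimum(w, upper)
    if lower is not None:
        w = np.maximum(w, lower)
    return w


capped = trim_weights_sketch([1.0, 5.0, 100.0], upper=10.0)
by_quantile = trim_weights_sketch([1.0, 2.0, 3.0, 4.0, 100.0], upper_quantile=0.5)
```

Validating the caps before applying them means a bad bound raises immediately instead of silently rewriting the whole weight column.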
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The previously deferred survey-pretrends item is explicitly marked resolved in TODO.md:L57.
Security
No findings.
Documentation/Tests
|
Summary
- Implement lonely_psu="adjust" for survey-aware bootstrap by pooling singleton PSUs into a combined pseudo-stratum (multiplier and Rao-Wu paths)
- Add coef_var property (SE / |estimate|) to all 12 results classes with CV display in summary() output
- Add trim_weights() utility to prep.py for capping extreme survey weights (absolute or quantile-based)
Methodology references (required if estimator / math changes)
Validation
- tests/test_survey_phase8.py (16 new tests: 3 bootstrap adjust, 5 CV, 5 trim_weights, 3 pretrends+survey)
Security / privacy
Generated with Claude Code