Commit f034928

Merge pull request #492 from igerber/feature/imputation-did-vcov-type-phase1b
ImputationDiD: thread vcov_type as narrow {hc1} contract per BJS Theorem 3 (Phase 1b interstitial #3)
2 parents 75e98e9 + 098dc7c commit f034928

8 files changed

Lines changed: 1075 additions & 26 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Large diffs are not rendered by default.

TODO.md

Lines changed: 3 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -104,13 +104,14 @@ Deferred items from PR reviews that were not addressed before merge.
104104
| PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
105105

106106

107-
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `ImputationDiD`, `TwoStageDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added `WooldridgeDiD` OLS path. **Two interstitial PRs (post-PR-3/8) addressed the IF-based estimators separately, each permanently narrow to `{"hc1"}`**: (a) `CallawaySantAnna` per Callaway & Sant'Anna (2021) Theorem 2 (also fixed CS's bare-`cluster=` silent no-op); (b) `TripleDifference` per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decomposition. Analytical-sandwich families don't compose with IF-based variance for either. This row tracks the remaining 3 (`ImputationDiD` and `EfficientDiD` are also IF-based and will likely adopt the same narrow contract; `TwoStageDiD` is sandwich-class). | multiple | Phase 1b | Medium |
107+
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `TwoStageDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; PR 2/8 added `StackedDiD`; PR 3/8 added `WooldridgeDiD` OLS path. **Three interstitial PRs (post-PR-3/8) addressed the IF-based estimators separately, each permanently narrow to `{"hc1"}`**: (a) `CallawaySantAnna` per Callaway & Sant'Anna (2021) Theorem 2 (also fixed CS's bare-`cluster=` silent no-op); (b) `TripleDifference` per Ortiz-Villavicencio & Sant'Anna (2025) on the 3-pairwise-DiD decomposition; (c) `ImputationDiD` per Borusyak-Jaravel-Spiess (2024) Theorem 3 on per-unit IF aggregation (also added defensive `n_clusters<2`/`n_psu<2` NaN guard on the bootstrap path + `cluster=` + replicate-weights `NotImplementedError`). Analytical-sandwich families don't compose with IF-based variance for any of the three. This row tracks the remaining 2 (`EfficientDiD` is also IF-based and will likely adopt the same narrow contract; `TwoStageDiD` is sandwich-class). | multiple | Phase 1b | Medium |
108108
| Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
109109
| Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. | `diff_diff/stacked_did.py` | follow-up | Medium |
110110
| Extend `WooldridgeDiD` with `vcov_type="conley"` — thread the six `conley_*` params through `solve_ols` in `_fit_ols`. Phase 1b PR 3/8 deferred this; WooldridgeDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham / StackedDiD conley follow-ups. | `diff_diff/wooldridge.py` | follow-up | Medium |
111111
| Extend `WooldridgeDiD` `method ∈ {"logit","poisson"}` paths with `vcov_type ∈ {classical, hc2, hc2_bm}`. The GLM QMLE sandwich uses pseudo-residuals (`weights=p(1-p)` for logit, `weights=μ_i` for Poisson, aweight semantics); composing HC2 leverage and Bell-McCaffrey Satterthwaite DOF with QMLE on canonical-link pseudo-residuals needs derivation + R parity against `clubSandwich::vcovCR(glm(...), type="CR2")`. Phase 1b PR 3/8 rejects `method != "ols" + vcov_type != "hc1"` at `__init__` with a deferral pointer here. | `diff_diff/wooldridge.py` (`_fit_logit`, `_fit_poisson`) | follow-up | Medium |
112112
| Extend `CallawaySantAnna` with `vcov_type="conley"` — would require deriving a spatial-HAC composition for per-unit influence functions (Conley 1999 spatial kernel × per-(g,t) IF aggregation); no reference implementation exists today. Phase 1b interstitial PR rejected this at `__init__` with a deferral pointer here. | `diff_diff/staggered.py` | follow-up | Low |
113113
| Extend `TripleDifference` with `vcov_type="conley"` — would require deriving a spatial-HAC composition for the 3-pairwise-DiD influence-function decomposition (Conley 1999 spatial kernel × `inf = w3·IF_3 + w2·IF_2 - w1·IF_1` aggregation); no reference implementation exists today. Phase 1b interstitial #2 PR rejected this at `__init__` with a deferral pointer here. | `diff_diff/triple_diff.py` | follow-up | Low |
114+
| Extend `ImputationDiD` with `vcov_type="conley"` — would require deriving a spatial-HAC composition with the Theorem 3 per-unit IF aggregation (Conley 1999 spatial kernel × `sigma_sq = (cluster_psi_sums**2).sum()` reduction); no reference implementation exists today. Phase 1b interstitial #3 PR rejected this at `__init__` with a deferral pointer here. | `diff_diff/imputation.py` | follow-up | Low |
114115
| Decide whether to formally deprecate `CallawaySantAnna.cluster=X` in favor of `survey_design=SurveyDesign(psu=X)`. Both APIs are first-class today (the bare-cluster path synthesizes a minimal SurveyDesign internally), but having two equivalent paths to express the same intent creates redundant surface. Mirrors a similar question for ImputationDiD / EfficientDiD / TwoStageDiD if those estimators ever face the same review. | `diff_diff/staggered.py` | follow-up | Low |
115116
| Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
116117
<!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
@@ -202,7 +203,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
202203

203204
#### Tier B — Mid-size methodology (5-10 CI rounds expected, per memory cascade priors)
204205

205-
- Thread `vcov_type` through the 3 remaining standalone estimators: `ImputationDiD`, `TwoStageDiD`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS; interstitial #1 narrowed CallawaySantAnna permanently to `{hc1}` per IF-based variance + fixed bare-`cluster=` silent no-op; interstitial #2 narrowed TripleDifference permanently to `{hc1}` per IF-based variance on the 3-pairwise-DiD decomposition)
206+
- Thread `vcov_type` through the 2 remaining standalone estimators: `TwoStageDiD`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS; interstitial #1 narrowed CallawaySantAnna permanently to `{hc1}` per IF-based variance + fixed bare-`cluster=` silent no-op; interstitial #2 narrowed TripleDifference permanently to `{hc1}` per IF-based variance on the 3-pairwise-DiD decomposition; interstitial #3 narrowed ImputationDiD permanently to `{hc1}` per IF-based variance on Theorem 3 per-unit IF aggregation + defensive bootstrap n_psu<2/n_clusters<2 NaN guard)
206207
- SyntheticDiD: rename internal `placebo_effects` → `variance_effects` AND public `placebo_effects` field with deprecation alias retained for one release (`synthetic_did.py`, `results.py`)
207208
- StaggeredTripleDifference R parity: commit CSV fixtures + add covariate-adjusted scenarios + aggregation-SE assertions (`tests/test_methodology_staggered_triple_diff.py`, `benchmarks/R/benchmark_staggered_triplediff.R`)
208209
- StaggeredTripleDifference: per-cohort group-effect SE WIF override for exact R `triplediff` match (`staggered_triple_diff.py`)
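
The `ImputationDiD` rows above refer to a `sigma_sq = (cluster_psi_sums**2).sum()` reduction and to a defensive NaN guard on the bootstrap path when fewer than two clusters are available. The following is a minimal standard-library sketch of both ideas, not the code in `diff_diff/imputation.py`. The function names `cluster_if_variance` and `bootstrap_se` are hypothetical, and the sketch assumes the influence values `psi` are already scaled so that no further normalization is needed.

```python
import math
import random
import statistics
from collections import defaultdict


def cluster_if_variance(psi, cluster_ids):
    """Sum influence values within each cluster, then sum the squared totals.

    Hypothetical helper: with NumPy arrays the last step would read
    (cluster_psi_sums**2).sum().
    """
    sums = defaultdict(float)
    for value, cid in zip(psi, cluster_ids):
        sums[cid] += value
    cluster_psi_sums = list(sums.values())
    sigma_sq = sum(s * s for s in cluster_psi_sums)
    return sigma_sq, cluster_psi_sums


def bootstrap_se(cluster_psi_sums, n_bootstrap=999, seed=None):
    """Rademacher multiplier bootstrap SE over cluster totals.

    Returns NaN instead of a number when there are fewer than two
    clusters, since a spread cannot be estimated from one cluster.
    """
    if len(cluster_psi_sums) < 2:
        return math.nan
    rng = random.Random(seed)
    draws = [
        sum(rng.choice((-1.0, 1.0)) * s for s in cluster_psi_sums)
        for _ in range(n_bootstrap)
    ]
    return statistics.stdev(draws)


sigma_sq, sums = cluster_if_variance(
    [1.0, 2.0, -1.0, 0.5], ["a", "a", "b", "b"]
)
print(sigma_sq)              # 9.25 (cluster totals 3.0 and -0.5)
print(bootstrap_se([3.0]))   # nan (single cluster triggers the guard)
```

Returning NaN rather than raising keeps downstream summary tables intact while making the degenerate inference visible; whether the library does exactly this in every code path is not shown in this diff.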

diff_diff/guides/llms-full.txt

Lines changed: 12 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -379,12 +379,14 @@ ImputationDiD(
379379
anticipation: int = 0,
380380
alpha: float = 0.05,
381381
cluster: str | None = None, # Defaults to unit-level clustering
382+
vcov_type: str = "hc1", # {"hc1"} only — IF-based variance per Borusyak et al. (2024) Theorem 3
382383
n_bootstrap: int = 0, # 0 = analytical (Theorem 3 variance)
383384
bootstrap_weights: str = "rademacher", # "rademacher", "mammen", or "webb"
384385
seed: int | None = None,
385386
rank_deficient_action: str = "warn",
386387
horizon_max: int | None = None, # Max event-study horizon
387388
aux_partition: str = "cohort_horizon", # "cohort_horizon", "cohort", or "horizon"
389+
pretrends: bool = False, # Include pre-treatment horizons in event study
388390
)
389391
```
390392

@@ -402,6 +404,7 @@ imp.fit(
402404
covariates: list[str] = None,
403405
aggregate: str = None, # None, "simple", "event_study", "group", or "all"
404406
balance_e: int = None,
407+
survey_design: SurveyDesign = None, # Optional design-based inference (pweight + analytical strata/PSU/FPC or replicate BRR/Fay/JK1/JKn/SDR)
405408
) -> ImputationDiDResults
406409
```
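
The constructor signature in this file documents `vcov_type` as `{"hc1"}` only. A narrow contract of that kind could be enforced at construction time roughly as sketched below. The helper `check_if_based_vcov_type`, the toy class `NarrowVcovEstimator`, the choice of `ValueError`, and the message wording are all assumptions for illustration and are not taken from the library.

```python
# Hypothetical sketch of a narrow vcov_type contract; not the library's code.
_IF_BASED_VCOV_TYPES = frozenset({"hc1"})


def check_if_based_vcov_type(vcov_type, estimator):
    """Return vcov_type unchanged if it is allowed, otherwise raise."""
    if vcov_type not in _IF_BASED_VCOV_TYPES:
        raise ValueError(
            f"{estimator} supports only vcov_type='hc1' "
            f"(influence-function-based variance); got {vcov_type!r}."
        )
    return vcov_type


class NarrowVcovEstimator:
    """Toy stand-in for an IF-based estimator's constructor."""

    def __init__(self, cluster=None, vcov_type="hc1"):
        self.cluster = cluster
        # Validate eagerly so a bad value fails before any data is touched.
        self.vcov_type = check_if_based_vcov_type(
            vcov_type, type(self).__name__
        )


est = NarrowVcovEstimator()          # default "hc1" is accepted
try:
    NarrowVcovEstimator(vcov_type="hc2")
except ValueError as exc:
    print(exc)
```

Failing in the constructor rather than in `fit()` matches the "rejects at `__init__`" behavior the TODO rows describe for unsupported `vcov_type` values.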
407410

@@ -1278,7 +1281,15 @@ ImputationDiDResults, TwoStageDiDResults, StackedDiDResults, and EfficientDiDRes
12781281

12791282
Each event study effect dict contains: `effect`, `se`, `t_stat`, `p_value`, `conf_int`, `n_obs` (or `n_groups`).
12801283

1281-
**Methods:** `summary()`, `print_summary()`, `to_dataframe()`
1284+
**Variance metadata** (`ImputationDiDResults` carries these; other staggered Results may surface a subset):
1285+
1286+
| Attribute | Type | Description |
1287+
|-----------|------|-------------|
1288+
| `vcov_type` | `str` | Variance estimator family (`"hc1"` for IF-based estimators; permanently narrow on `ImputationDiD` / `CallawaySantAnna` / `TripleDifference` per IF-vs-sandwich taxonomy) |
1289+
| `cluster_name` | `str | None` | Effective cluster column name (e.g. `"unit"` for default `cluster=None`); `None` under survey designs (the survey block already names PSU/strata) |
1290+
| `n_clusters` | `int | None` | Number of effective clusters; `None` under survey designs |
1291+
1292+
**Methods:** `summary()`, `print_summary()`, `to_dataframe()`, `to_dict()` (flat dict of headline aliases + `vcov_type` + conditional `cluster_name`/`n_clusters`/`n_bootstrap`/`inference_method`)
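
The variance metadata table and the `to_dict()` description above can be illustrated with a small stand-in. This is a sketch only: `VarianceMetadata` is a hypothetical class, and it assumes "conditional" means a key is emitted only when its value is not `None`, which the diff does not spell out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class VarianceMetadata:
    """Hypothetical holder for the three variance metadata attributes."""

    vcov_type: str = "hc1"
    cluster_name: Optional[str] = None   # None under survey designs
    n_clusters: Optional[int] = None     # None under survey designs

    def to_dict(self) -> Dict[str, Any]:
        # vcov_type is always present; the cluster fields are added only
        # when set (an assumed reading of "conditional").
        out: Dict[str, Any] = {"vcov_type": self.vcov_type}
        if self.cluster_name is not None:
            out["cluster_name"] = self.cluster_name
        if self.n_clusters is not None:
            out["n_clusters"] = self.n_clusters
        return out


default_cluster = VarianceMetadata(cluster_name="unit", n_clusters=50)
survey = VarianceMetadata()  # survey design: cluster fields stay None

print(default_cluster.to_dict())
print(survey.to_dict())
```

Under this reading, a survey-design fit yields a dict with only `vcov_type`, while a default unit-clustered fit also reports the cluster column and count.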
12821293

12831294
### ContinuousDiDResults
12841295
