Skip to content

Add survey_design support to WooldridgeDiD (Phase 10f)#280

Open
igerber wants to merge 6 commits intomainfrom
wooldridge-survey
Open

Add survey_design support to WooldridgeDiD (Phase 10f)#280
igerber wants to merge 6 commits intomainfrom
wooldridge-survey

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 7, 2026

Summary

  • Add survey_design parameter to WooldridgeDiD.fit() for all three estimation paths (OLS, logit, Poisson)
  • Add weights parameter to solve_poisson() in linalg.py, mirroring existing solve_logit() pattern
  • OLS path: survey-weighted within-transformation + WLS + TSL vcov via compute_survey_vcov()
  • Logit/Poisson paths: survey-weighted IRLS + X_tilde linearization trick for correct QMLE sandwich vcov
  • Add survey_metadata and _df_survey to WooldridgeDiDResults with summary display and t-distribution inference in aggregate()
  • Add df parameter to _compute_weighted_agg() for survey degrees of freedom
  • Reject replicate-weight designs (NotImplementedError) and bootstrap + survey (ValueError)

Methodology references (required if estimator / math changes)

  • Method name(s): WooldridgeDiD (ETWFE) — Wooldridge (2023, 2025)
  • Paper / source link(s): Wooldridge (2023) "Simple approaches to nonlinear DiD"; Binder (1983) for TSL variance
  • Any intentional deviations from the source (and why): X_tilde linearization trick for nonlinear survey vcov (standard Binder/Lumley approach — X_tilde = X * sqrt(V), r_tilde = resids / sqrt(V) reuses existing compute_survey_vcov() without modification)

Validation

  • Tests added/updated: tests/test_wooldridge.py — 8 new tests in TestWooldridgeSurvey (OLS/logit/Poisson smoke, SE divergence, bootstrap rejection, weights-only, metadata, replicate rejection)
  • All 93 tests pass (85 existing + 8 new)

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Add survey design support to WooldridgeDiD for all three estimation
paths (OLS, logit, Poisson), completing Phase 10f of the survey
roadmap. OLS uses survey-weighted within-transformation + WLS + TSL
vcov. Logit and Poisson use survey-weighted IRLS + X_tilde
linearization trick to reuse compute_survey_vcov() for correct QMLE
sandwich. Replicate-weight designs raise NotImplementedError;
bootstrap + survey is rejected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Overall Assessment

⚠️ Needs changes — two unmitigated P1 issues in the new survey path change inference behavior without being documented or tested.

Executive Summary

  • P1: the new survey code path does not propagate cluster into Taylor-linearized survey variance, so survey_design=SurveyDesign(weights=...) plus cluster=... silently uses per-observation PSUs instead of the requested cluster structure.
  • P1: survey nonlinear aggregation switched from documented ETWFE cell-count weights to survey-weight sums, while survey OLS still uses raw counts. That is an undocumented estimand change and makes aggregates depend on method.
  • The X_tilde/linearization approach itself is not the blocker here; weighted WLS/IRLS plus Binder/Lumley-style TSL is a plausible implementation of the cited methodology. citeturn3search0turn1search0turn2search0turn2search1
  • The new tests are smoke/rejection/metadata checks only; they do not exercise either of the two contracts above.
  • docs/choosing_estimator.rst is now internally contradictory: it still says Wooldridge survey support is “planned for Phase 10f,” while the row claims support shipped and incorrectly advertises survey bootstrap support.

Methodology

Cross-checking against the cited Wooldridge nonlinear ETWFE framing and the Binder/Lumley survey-linearization references, the core modeling choice here is defensible. The remaining problems are not with X_tilde; they are with how the new survey machinery is wired into this library’s documented ETWFE and survey contracts. citeturn3search0turn1search0turn2search0turn2search1

  • P1 The survey variance path ignores cluster, even though the project’s survey contract says weights-only designs with cluster= must inject those clusters as effective PSUs. In all three branches, cluster_ids are built but the survey vcov call goes straight to compute_survey_vcov(..., resolved) without _resolve_effective_cluster() / _inject_cluster_as_psu(); see diff_diff/wooldridge.py:L617-L635, diff_diff/wooldridge.py:L803-L842, diff_diff/wooldridge.py:L1044-L1077. The documented contract is explicit at docs/methodology/REGISTRY.md:L2297-L2303, and the helper exists at diff_diff/survey.py:L1103-L1157. Impact: for SurveyDesign(weights=...) or stratified-no-PSU designs, SEs, t-stats, CIs, and df_survey are wrong whenever the caller supplies cluster=.... Concrete fix: in each _fit_* survey branch, resolve the effective cluster, inject it as PSU when the design has no explicit PSU, let explicit PSU override conflicting cluster with the standard warning, and recompute survey_metadata / df_inf from the injected design.
  • P1 The PR changes nonlinear survey aggregates from documented n_{g,t} cell-count weights to survey-weight sums, but only for logit/Poisson. The ETWFE registry still documents simple/group/calendar/event aggregation using cell counts docs/methodology/REGISTRY.md:L1123-L1129, and the OLS path still stores counts diff_diff/wooldridge.py:L646-L656. The new survey nonlinear branches switch to sum(survey_weights[cell_mask]) diff_diff/wooldridge.py:L931-L936, diff_diff/wooldridge.py:L1175-L1179, and aggregate() then reuses _gt_weights for all downstream summaries diff_diff/wooldridge_results.py:L87-L166. Impact: survey OLS and survey nonlinear methods no longer target the same documented aggregation rule, and the nonlinear change is undocumented in REGISTRY.md. Concrete fix: either revert nonlinear gt_weights to raw cell counts to preserve the documented jwdid_estat contract, or intentionally switch all Wooldridge methods to survey-weighted cell masses and add a REGISTRY.md note plus regression tests for simple, group, calendar, and event.

Code Quality

  • No separate finding beyond the two correctness issues above.

Performance

  • No material performance finding in the changed code.

Maintainability

  • P3 Survey setup is copy-pasted across _fit_ols, _fit_logit, and _fit_poisson instead of going through one shared helper; see diff_diff/wooldridge.py:L584-L594, diff_diff/wooldridge.py:L774-L785, diff_diff/wooldridge.py:L1012-L1023. Impact: the same propagation bug was introduced three times, and future survey fixes will keep drifting from the rest of the survey-enabled estimators. Concrete fix: extract one private helper that resolves survey design, effective PSU, metadata, and survey df before dispatching by method.

Tech Debt

  • P3 The existing Wooldridge TODOs cover older issues like the QMLE small-sample adjustment, cohort-share aggregation option, canonical-link warning, and golden-value tests TODO.md:L72-L75. They do not track either P1 above, so these are unmitigated correctness findings, not deferrable tech debt. Concrete fix: fix in code before merge rather than downgrading to a TODO.

Security

  • No findings.

Documentation/Tests

  • P2 docs/choosing_estimator.rst now contradicts both the code and survey-roadmap. The prose above the table still says Wooldridge survey support is “planned for Phase 10f” docs/choosing_estimator.rst:L602-L605, while the Wooldridge row advertises Survey Bootstrap = Multiplier at PSU docs/choosing_estimator.rst:L686-L690 even though the code rejects bootstrap + survey diff_diff/wooldridge.py:L376-L381 and the roadmap documents that rejection docs/survey-roadmap.md:L211-L215, docs/survey-roadmap.md:L231-L234. Impact: users are told a shipped feature and a survey-bootstrap mode exist in cases where the library still raises. Concrete fix: update the intro paragraph and change the Wooldridge Survey Bootstrap cell to unsupported wording.
  • P3 The new tests do not cover the two load-bearing interactions introduced by this PR; the added cases at tests/test_wooldridge.py:L1332-L1443 are smoke/rejection/metadata tests only. Impact: both P1 issues can merge without a failing test. Concrete fix: add one regression test for SurveyDesign(weights=...) plus cluster=..., and one regression test that asserts the intended survey aggregation weights across simple, group, calendar, and event.

Path to Approval

  1. Propagate cluster through all three survey Wooldridge branches using _resolve_effective_cluster() and _inject_cluster_as_psu(), then recompute survey_metadata / df_inf; add regression tests for weights-only + cluster= and conflicting cluster vs explicit PSU.
  2. Pick one survey aggregation contract and implement it consistently across OLS/logit/Poisson. If count-based jwdid_estat weights are intended, revert nonlinear gt_weights to counts; if survey-weighted cell masses are intended, update OLS to match and document the deviation in REGISTRY.md.
  3. Sync docs/choosing_estimator.rst with the actual shipped behavior once the code path is settled.

- P1: Propagate cluster into survey variance via _resolve_effective_cluster
  and _inject_cluster_as_psu. Extract shared _resolve_survey_for_wooldridge()
  helper (also fixes P3 copy-paste across 3 sub-fitters).
- P1: Revert nonlinear gt_weights to cell counts (matching documented
  jwdid_estat contract). Survey weights affect per-cell ATTs and SEs but
  not aggregation weights.
- P2: Fix choosing_estimator.rst contradictions — update stale prose and
  change WooldridgeDiD bootstrap column to unsupported.
- P3: Add regression tests for weights-only+cluster injection and
  survey gt_weights being cell counts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 7, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: d9cdbf0527e703c7c527afaf5691d4987182b094


Overall Assessment

⚠️ Needs changes — the prior re-review blockers on explicit cluster= propagation, count-based _gt_weights, and the Wooldridge survey docs look addressed, but this PR still leaves unmitigated P1 issues in the new survey path.

Executive Summary

  • The previous P1s appear resolved: explicit cluster= now flows through shared survey PSU resolution, nonlinear _gt_weights are back to raw cell counts, and the Wooldridge survey docs no longer advertise unsupported survey bootstrap; see diff_diff/wooldridge.py:L71, diff_diff/wooldridge.py:L957, diff_diff/wooldridge.py:L1192, docs/choosing_estimator.rst:L685.
  • P1: weights-only Wooldridge fits with no explicit cluster= now silently inject the default unit cluster as PSU, changing df_survey and TSL SEs away from the documented implicit-PSU rule.
  • P1: the new solve_poisson(weights=...) path did not inherit solve_logit(weights=...)’s positive-weight effective-sample safeguards, so zero-weight survey cells can leave Poisson survey fits singular or fail during ASF averaging.
  • P1: the modified compatibility table says Wooldridge survey support is pweight-only, but the implementation still accepts fweight/aweight with no guard or methodology note.
  • I could not execute the added tests in this environment because pytest and numpy are unavailable here.

Methodology

The core survey construction itself is plausible: Wooldridge (2023) frames nonlinear DiD through ASF/QMLE, and Binder/Lumley-style Taylor linearization is the standard design-based route for asymptotically normal estimators under complex survey designs. The blockers below are therefore implementation-contract issues in this PR, not the X_tilde idea itself. citeturn1search1turn1search6turn3search3

  • Severity: P1. Impact: weights-only survey_design with no explicit cluster= no longer follows the shared survey contract. All three Wooldridge branches build cluster_ids as unit when self.cluster is unset, then _resolve_survey_for_wooldridge() injects those IDs as PSUs; see diff_diff/wooldridge.py:L623, diff_diff/wooldridge.py:L822, diff_diff/wooldridge.py:L1054, diff_diff/wooldridge.py:L93. But the survey registry and metadata code still define weights-only/no-PSU designs as implicit per-observation PSUs unless the user explicitly supplies cluster=; see docs/methodology/REGISTRY.md:L2297, docs/methodology/REGISTRY.md:L2300, diff_diff/survey.py:L752. That silently changes n_psu, df_survey, SEs, t-stats, p-values, and CIs for every weights-only Wooldridge fit. Concrete fix: only pass cluster_ids into _resolve_survey_for_wooldridge() when the user actually set cluster=...; otherwise leave it None so weights-only designs keep implicit PSUs, and add a regression test asserting SurveyDesign(weights=...) without cluster yields n_psu == n_obs / df_survey == n_obs - 1.
  • Severity: P1. Impact: the changed compatibility table now documents Wooldridge survey support as Full (pweight only), but the implementation never enforces that restriction. _resolve_survey_for_wooldridge() just forwards resolved.weight_type from _resolve_survey_for_fit() with no guard; see docs/choosing_estimator.rst:L685, docs/choosing_estimator.rst:L699, diff_diff/wooldridge.py:L71. That leaves SurveyDesign(weight_type="fweight") and "aweight" silently accepted even though the public docs say they are unsupported, unlike the existing pweight-only survey estimators which reject them explicitly; see diff_diff/imputation.py:L280, diff_diff/two_stage.py:L259. Concrete fix: add a resolved.weight_type != "pweight" guard in the Wooldridge survey path and test it, or fully document and validate the alternative weight semantics in REGISTRY.md; the current silent acceptance is an undocumented methodology deviation.

Code Quality

  • Severity: P1. Impact: solve_poisson(weights=...) was added without the positive-weight effective-sample safeguards already present in solve_logit(weights=...). SurveyDesign allows partially zero weights; see diff_diff/survey.py:L173. solve_logit explicitly re-checks rank and identification on the positive-weight subset; see diff_diff/linalg.py:L1221. solve_poisson does not; it rank-checks the full unweighted design and then just multiplies the score/Hessian by weights; see diff_diff/linalg.py:L2412, diff_diff/linalg.py:L2445. In the new Poisson survey ATT code, those same zero-weight cells are then averaged with np.average(..., weights=survey_weights[cell_mask]); see diff_diff/wooldridge.py:L1124, diff_diff/wooldridge.py:L1165. A treated cell that exists only on zero-weight rows can therefore remain in the Poisson path long enough to become singular or fail inside ASF averaging instead of being cleanly treated as non-estimable. Concrete fix: mirror solve_logit’s positive-weight validation/rank-reduction logic in solve_poisson, and skip or mark non-estimable Poisson cells whose survey weights sum to zero before calling weighted ASF averages.

Performance

No findings.

Maintainability

No separate findings beyond the correctness issues above.

Tech Debt

No mitigating TODO.md entry covers these issues. The existing Wooldridge TODOs are about other items only, so the P1s above are not downgradable as tracked debt; see TODO.md:L72.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests cover smoke/rejection/metadata and the restored count-weight contract, but they still do not pin the two load-bearing edge cases above: weights-only/no-cluster should keep implicit per-observation PSUs, and Poisson survey fits should handle zero-weight effective cells cleanly. See tests/test_wooldridge.py:L1400, tests/test_wooldridge.py:L1446, tests/test_wooldridge.py:L1466. Concrete fix: add one regression asserting SurveyDesign(weights=...) without cluster leaves survey_metadata.n_psu == n_obs / df_survey == n_obs - 1, and one Poisson regression with a zero-weight treated cell that either drops the cell or raises a targeted error before ASF averaging.

Path to Approval

  1. Stop injecting the default unit cluster as a survey PSU when cluster was not explicitly supplied; preserve implicit per-observation PSUs for weights-only designs, and add a regression test for both no-cluster and explicit cluster=....
  2. Enforce the documented pweight-only Wooldridge survey contract in code, with rejection tests for fweight and aweight.
  3. Port solve_logit’s positive-weight effective-sample checks into solve_poisson(weights=...), and add a Poisson survey regression with zero-weight treated rows/cells so the new path handles non-estimable cells deterministically.

- P1: Only inject cluster as PSU when user explicitly sets cluster=;
  weights-only surveys without cluster= now keep implicit per-obs PSUs,
  preserving documented df_survey = n_obs - 1 contract.
- P1: Add pweight-only guard in _resolve_survey_for_wooldridge() —
  fweight/aweight now raise ValueError matching other pweight-only
  estimators (ImputationDiD, TwoStageDiD).
- P1: Add zero-weight safeguards to solve_poisson(weights=...) mirroring
  solve_logit's positive-weight validation (rank check on effective
  sample, sample-size identification). Skip zero-weight ASF cells in
  Poisson survey path.
- P2: Add regression tests for implicit PSU contract, fweight rejection,
  and zero-weight Poisson cell handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 7, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 31c74fa24c43d12d5970e2502aa8353eaa9e982b


Overall Assessment

⚠️ Needs changes — the prior re-review blockers on implicit-PSU handling, pweight enforcement, and Poisson positive-weight handling look addressed, but the new Wooldridge survey OLS path still has unmitigated P1 failures on valid weighted ETWFE edge cases.

Executive Summary

  • Previous P1s from the last review appear resolved: Wooldridge now preserves implicit PSUs unless cluster= is explicit, rejects non-pweight survey designs, and adds positive-weight handling plus a zero-weight-cell regression on the Poisson path.
  • P1: the new survey OLS branch recomputes TSL vcov on the full demeaned design even when solve_ols() has already dropped non-estimable columns, so rank-deficient ETWFE cases that work without survey support can now fail under survey_design.
  • P1: the same survey OLS branch passes zero-weight rows straight into weighted within_transform(), whose denominator logic has no zero-sum unit/time guard; valid subpopulation designs with an all-zero-weight unit or period can produce NaNs before estimation.
  • P2/P3: solve_poisson(weights=...) still lacks solve_logit()-style weight validation and direct weighted-solver tests, and the Methodology Registry/roadmap still omit the now-enforced pweight-only Wooldridge survey contract.

Methodology

No unmitigated P0/P1 paper mismatch stood out in the new nonlinear survey construction itself. The Binder-style X_tilde linearization is now explicitly documented in the registry, so under the review rules I treat it as a documented implementation choice rather than a methodology defect.

  • Severity: P3. Impact: the methodology source of truth is still incomplete on supported weight semantics. The code and estimator-selection docs now enforce/document Wooldridge survey support as pweight-only, but the Wooldridge entry in the registry still just says “Survey design support,” and the roadmap’s shipped note also omits that restriction. See docs/methodology/REGISTRY.md:L1177-L1183, docs/choosing_estimator.rst:L685-L699, and docs/survey-roadmap.md:L209-L215. Concrete fix: add a registry/roadmap note that Wooldridge survey support is pweight-only because the composed survey/QMLE weighting changes fweight/aweight semantics.

Code Quality

  • Severity: P1. Impact: in the new OLS survey branch, solve_ols() can drop rank-deficient columns, but compute_survey_vcov() is still called on the full demeaned X. That means common ETWFE rank-deficient cases now fail with singular survey bread instead of skipping non-estimable cells, regressing behavior the estimator already supports without survey weights. See diff_diff/wooldridge.py:L659-L675, the existing correct kept-column survey-vcov pattern in diff_diff/estimators.py:L1316-L1327, and Wooldridge’s own existing rank-deficient test case in tests/test_wooldridge.py:L700-L726. Concrete fix: mirror the nan_mask/kept_cols reduction pattern already used elsewhere: compute survey vcov on identified columns only, then expand back with NaN rows/cols.

  • Severity: P1. Impact: the same OLS survey path feeds weighted demeaning with zero-weight observations, but within_transform() divides by per-unit/per-time weight sums with no zero-sum guard. Because SurveyDesign explicitly allows partially zero weights, a valid subpopulation that zero-weights an entire unit or period can hit 0/0 during demeaning and fail before estimation. See diff_diff/wooldridge.py:L649-L653, diff_diff/utils.py:L1849-L1854, and diff_diff/survey.py:L166-L175. Concrete fix: before weighted within-transformation, either drop zero-weight rows and zero-pad the resulting score contributions back to the full sample for TSL, or detect zero-sum unit/time groups and raise a targeted error instead of letting NaNs propagate.

  • Severity: P2. Impact: solve_poisson(weights=...) is described as mirroring the weighted logit pattern, but it still lacks the basic weight validation that solve_logit() has (shape, NaN/Inf, non-negativity, positive total weight). The current Wooldridge call site is safe because SurveyDesign.resolve() validates weights upstream, but the new weighted solver API itself can still fail obscurely or accept invalid inputs. Compare diff_diff/linalg.py:L2368-L2487 with diff_diff/linalg.py:L1195-L1207. Concrete fix: copy the solve_logit() validation block into solve_poisson() and add direct weighted-solver tests.

Performance

  • No findings.

Maintainability

  • No additional findings beyond the solver-parity issue above.

Tech Debt

  • No new untracked tech-debt findings. The remaining Wooldridge QMLE-adjustment and aggregation-weight limitations are already tracked in TODO.md:L72-L75, so they are not blockers for this PR.

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests cover the Poisson zero-weight-cell path, but they do not pin either of the load-bearing OLS survey failures above. The suite now has a Poisson regression for zero-weight treated cells, but no survey OLS regression for rank-deficient ETWFE designs or zero-weight unit/period handling. See tests/test_wooldridge.py:L1511-L1523 and the existing unweighted rank-deficient Wooldridge case in tests/test_wooldridge.py:L700-L726. Concrete fix: add one survey OLS regression on the all-eventually-treated control_group="not_yet_treated" design and one survey OLS regression with an entire zero-weight unit or zero-weight period.

Static review only: I could not run pytest here because the sandbox environment is missing numpy.

Path to Approval

  1. In _fit_ols(), compute survey TSL vcov on the kept/identified columns only and expand it back with NaN rows/cols, matching the existing survey-OLS pattern already used elsewhere in the library.
  2. Make Wooldridge’s survey-weighted within-transform safe for zero-weight subpopulations before calling within_transform(): either drop zero-weight rows and zero-pad score contributions back to the full sample for TSL, or detect zero-sum unit/time groups and raise a targeted error. Add regressions for both the rank-deficient all-eventually-treated case and the zero-weight unit/period case.

- P1: Compute survey TSL vcov on kept columns only when solve_ols drops
  rank-deficient columns; expand back with NaN. Prevents singular bread
  matrix on all-eventually-treated ETWFE designs.
- P1: Guard against zero-weight unit/time groups before within_transform;
  raise targeted ValueError instead of letting NaN propagate.
- P2: Add weight validation (shape, NaN, Inf, non-negative, positive sum)
  to solve_poisson(weights=...) matching solve_logit pattern.
- P2: Add regression tests for rank-deficient survey OLS and zero-weight
  unit rejection.
- P3: Add pweight-only note to REGISTRY.md and survey-roadmap.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 7, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 8bcbaa79cf92c75bb3b99d438447e856d602aa5f


Overall Assessment

⚠️ Needs changes — the earlier re-review blockers look resolved, but two unmitigated P1 edge-case failures remain in the new survey_design support.

Executive Summary

  • Prior P1s from the last review appear addressed: Wooldridge now enforces pweight-only survey designs, preserves implicit PSUs unless cluster= is explicit, rejects bootstrap + survey, and adds positive-weight handling to the Poisson path.
  • P1: the new logit survey ATT loop does not handle treated cells whose survey weights sum to zero, so valid subpopulation designs can fail inside weighted averaging instead of cleanly skipping non-estimable cells.
  • P1: the OLS zero-weight guard indexes NumPy weights by pandas row labels, so any valid non-RangeIndex DataFrame with at least one zero survey weight can mis-index or crash before estimation.
  • No unmitigated methodology mismatch stood out: the nonlinear survey construction, pweight-only restriction, and bootstrap/replicate exclusions are now documented in the registry, so I am treating them as documented implementation choices rather than defects. (link.springer.com)
  • Tests added good coverage for Poisson zero-weight cells and OLS rank deficiency, but they still miss regressions for the two remaining failure modes.

Methodology

  • Severity: P3 informational. Impact: no unmitigated paper/registry mismatch found. The registry now explicitly documents the survey OLS path, the nonlinear X_tilde linearization, the pweight-only contract, and the replicate/bootstrap exclusions in docs/methodology/REGISTRY.md:L1179 and docs/methodology/REGISTRY.md:L1182; that is consistent with Wooldridge’s ASF/QMLE framing for nonlinear DiD and Binder’s design-based variance treatment for asymptotically normal estimators. Concrete fix: none. (link.springer.com)

Code Quality

  • Severity: P1. Impact: _fit_logit() still crashes on a treated cell whose survey weights sum to zero. The new weighted helpers call np.average(..., weights=survey_weights[cell_mask]) at diff_diff/wooldridge.py:L930 and diff_diff/wooldridge.py:L935, but unlike Poisson there is no zero-weight-cell skip in the ATT loop around diff_diff/wooldridge.py:L945. Poisson now handles exactly this case at diff_diff/wooldridge.py:L1185. On all-zero weights, numpy.average raises ZeroDivisionError, so a valid subpopulation design can fail in production. Concrete fix: mirror the Poisson np.sum(survey_weights[cell_mask]) == 0 guard in _fit_logit() before any ATT/gradient averaging, and omit such cells from gt_effects, gt_weights, and gt_grads. (numpy.org)
  • Severity: P1. Impact: the new OLS zero-weight guard assumes positional indexing but uses pandas row labels. _filter_sample() preserves the caller’s index at diff_diff/wooldridge.py:L428, and the guard then sums survey_weights[g.index] at diff_diff/wooldridge.py:L654. For any valid non-RangeIndex input with a zero survey weight, this can index the wrong rows or throw before estimation. Concrete fix: align weights to sample.index with a pandas object before grouping, or reset sample to a fresh RangeIndex before any NumPy positional indexing.

Performance

No findings.

Maintainability

No findings beyond the two code-quality blockers above.

Tech Debt

No new blocking tech-debt findings. The pre-existing Wooldridge aggregation-weight and QMLE-adjustment limitations are already tracked in TODO.md:L72 and TODO.md:L73, so they do not affect this assessment.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: the new suite covers Poisson zero-weight cells, OLS rank deficiency, and OLS zero-weight-unit rejection, but it does not pin either remaining blocker: there is no logit zero-weight-cell regression and no OLS regression using a non-RangeIndex DataFrame. See tests/test_wooldridge.py:L1511, tests/test_wooldridge.py:L1525, and tests/test_wooldridge.py:L1538. Concrete fix: add one logit survey test that zeroes an entire treated cohort’s weights, and one OLS survey test that sets a custom DataFrame index before exercising the zero-weight-unit/time guard.

Path to Approval

  1. In _fit_logit(), add the same zero-weight-cell handling already used in Poisson before calling the weighted _avg() / _avg_ax0() helpers, and skip non-estimable cells consistently in the stored ATT objects and aggregate inputs.
  2. In _fit_ols(), make the zero-weight unit/time guard index-safe by aligning survey_weights to sample.index or resetting the sample index before the guard, then add a regression test with a custom DataFrame index plus zero weights.

Static review only: I could not run pytest in this sandbox because the Python environment is missing both numpy and pandas.

- P1: Add zero-weight cell skip to logit ASF loop, mirroring Poisson
- P1: Reset sample index in _fit_ols for index-safe zero-weight guard
- P2: Add regression tests for logit zero-weight cell and non-RangeIndex
  OLS survey guard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 7, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 7df411535fccad97bb3fe971f21a2eb8569d722c


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Prior P1s from the last review appear resolved: OLS now resets to a positional index before the zero-weight guard in diff_diff/wooldridge.py:L628-L659, logit now skips all-zero-weight treated cells in diff_diff/wooldridge.py:L944-L997, and Poisson now accepts survey weights with positive-weight identification checks in diff_diff/linalg.py:L2393-L2504.
  • I did not find an unmitigated methodology mismatch: the new survey OLS and nonlinear QMLE paths match the documented registry note in docs/methodology/REGISTRY.md:L1179-L1184, and the cited Wooldridge/Binder sources are consistent with pooled nonlinear DiD plus design-based variance for regression-type estimators. (academic.oup.com)
  • Severity P1 [Newly identified]: the new OLS zero-weight guard uses groupby.apply(..., include_groups=False) in diff_diff/wooldridge.py:L654-L659, but the project still declares pandas>=1.3.0 in pyproject.toml:L46-L49. include_groups was added in pandas 2.2.0, so this branch will fail on supported older pandas versions. (pandas.pydata.org)
  • The new tests cover the prior blocker regressions, including Poisson/logit zero-weight treated cells and the non-RangeIndex OLS case, in tests/test_wooldridge.py:L1511-L1575.
  • Static review only: pytest is not installed and pandas is unavailable in this sandbox, so I could not execute the suite. I did verify the touched Python files parse successfully via ast.

Methodology

  • Severity: P3 informational. Impact: No unmitigated methodology defect found. The survey OLS path, the nonlinear X_tilde linearization path, and the pweight/replicate/bootstrap restrictions are documented in docs/methodology/REGISTRY.md:L1179-L1184, so I am treating those as documented implementation choices rather than defects. Concrete fix: none. (academic.oup.com)

Code Quality

  • Severity: P1 [Newly identified]. Impact: The OLS zero-weight guard in diff_diff/wooldridge.py:L654-L659 calls DataFrameGroupBy.apply(..., include_groups=False), but the package still supports pyproject.toml:L46-L49 pandas>=1.3.0. pandas documents include_groups as new in 2.2.0, and the 1.3 API does not have that parameter, so any supported older-pandas environment that hits this survey OLS edge case will raise TypeError before the intended ValueError. Concrete fix: replace that call with a version-compatible grouped sum such as pd.Series(survey_weights, index=sample.index).groupby(sample[grp_col]).sum() or sample.assign(_w=survey_weights).groupby(grp_col)['_w'].sum(). (pandas.pydata.org)

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3 informational. Impact: No new untracked Wooldridge tech debt stood out. The pre-existing Wooldridge items already tracked in TODO.md:L72-L74 remain mitigated and do not count against this PR. Concrete fix: none required for approval.

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: The new survey tests in tests/test_wooldridge.py:L1511-L1575 cover the last re-review blockers, but they do not exercise the newly changed survey-specific aggregate() inference path in diff_diff/wooldridge_results.py:L98-L100 or the summary survey block in diff_diff/wooldridge_results.py:L187-L190. A regression there would change survey p-values/CI presentation without these new tests failing. Concrete fix: add one survey-mode results test that calls aggregate("group") or aggregate("event") and asserts _df_survey-based inference plus the survey block in summary().

Path to Approval

  1. Replace the pandas-2.2-only groupby.apply(..., include_groups=False) call in WooldridgeDiD._fit_ols() with a pandas-1.3-compatible grouped-sum implementation.
  2. Add a regression test for that zero-weight survey OLS guard using the rewritten aggregation path, so minimum-supported-pandas compatibility cannot silently regress again.

- P1: Replace pandas-2.2-only groupby.apply(include_groups=False) with
  pandas-1.3-compatible pd.Series.groupby().sum() in OLS zero-weight guard
- P2: Add test for survey aggregate() with df_survey inference and
  summary() survey block display

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 7, 2026

/ai-review

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 7, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: faed8a3e7c8f14d6c7b3dc84bb4d6333647c650c


Overall Assessment

✅ Looks good

Executive Summary

Methodology

Code Quality

Performance

  • No findings.

Maintainability

Tech Debt

  • Severity: P3 informational (tracked in TODO.md). Impact: The new survey regression test in tests/test_wooldridge.py:L1466-L1478 explicitly preserves the existing count-based aggregation behavior. Wooldridge (2025) presents cohort-share-weighted aggregation as the natural weighted summary in Eqs. 7.2-7.4, but this limitation is already tracked for WooldridgeDiD in TODO.md:L72-L75, so it is informational only and not a blocker for this PR. Concrete fix: keep the existing TODO item; future work can add cohort-share or survey-weighted aggregation options. (link.springer.com)

Security

  • No findings.

Documentation/Tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant