docs: add retrospective paper reviews for TROP and Wooldridge ETWFE by igerber · Pull Request #443 · igerber/diff-diff

igerber · 2026-05-15T13:41:57Z

Summary

Adds two paper-review markdown files under `docs/methodology/papers/`, following the existing template. Both reviews are retrospective documentation for estimators already shipped in the library.

`athey-2025-review.md` (358 lines) — Athey, Imbens, Qu, Viviano (2025) "Triply Robust Panel Estimators" (arXiv:2508.21536). Backs `diff_diff/trop.py`.
`wooldridge-2023-review.md` (248 lines) — Wooldridge (2023) "Simple approaches to nonlinear difference-in-differences with panel data" (Econometrics Journal 26(3), doi:10.1093/ectj/utad016). Backs `diff_diff/wooldridge.py`.

Methodology references (required if estimator / math changes)

Method name(s): TROP, WooldridgeDiD (ETWFE)
Paper / source link(s): https://arxiv.org/abs/2508.21536 ; https://doi.org/10.1093/ectj/utad016
Any intentional deviations from the source (and why): none documented in this PR (it is documentation-only and adds reviews for already-shipped estimators; deviations would be tracked in REGISTRY.md alongside the implementation)

Validation

Tests added/updated: none (docs-only)
Backtest / simulation / notebook evidence (if applicable): n/a

Security / privacy

Confirm no secrets/PII in this PR: confirmed

Both reviews follow the existing template under docs/methodology/papers/ and back already-shipped estimators (diff_diff/trop.py, diff_diff/wooldridge.py).

- athey-2025-review.md: Athey, Imbens, Qu, Viviano (2025) "Triply Robust Panel Estimators" (arXiv:2508.21536)
- wooldridge-2023-review.md: Wooldridge (2023) "Simple approaches to nonlinear difference-in-differences with panel data" (Econometrics Journal 26(3), doi:10.1093/ectj/utad016)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T13:48:52Z

Overall Assessment

✅ Looks good

This is a docs-only PR, so there are no unmitigated P0/P1 findings. The main issues are P2/P3 documentation accuracy problems where the new paper-review files drift from the shipped registry/code.

Executive Summary

No estimator, weighting, variance, or inference code changed in this PR, so there is no blocker-level risk in changed executable paths.
The new review files contain several implementation-facing notes that no longer match the current library contract.
wooldridge-2023-review.md has one substantive source-interpretation error around the meaning of δ₂. (academic.oup.com)
athey-2025-review.md overstates current TROP support for treatment-pattern flexibility and uses outdated public method names/descriptions.
One Wooldridge aggregation difference is already documented in REGISTRY.md / TODO.md; that should be surfaced here as a deviation note, not treated as a defect.
athey-2025-review.md also commits a contributor-local absolute filesystem path.

Methodology

Severity: P2. Impact: docs/methodology/papers/wooldridge-2023-review.md:L49-L55 says that when G(z)=exp(z), δ₂ is the “log-odds ratio (logit) or log rate ratio (Poisson).” That conflates two different link-function interpretations. Wooldridge distinguishes the exponential mean case as a log difference / proportional effect, while the logistic mean gives a change in log-odds. Concrete fix: split this sentence by link function and mirror the paper’s wording. (academic.oup.com)
Severity: P2. Impact: docs/methodology/papers/athey-2025-review.md:L274-L280 is under “Implementation Notes” but says treatment switching on/off is supported. Current TROP requires an absorbing treatment indicator and rejects non-absorbing/event-style inputs; that contract is enforced in diff_diff/trop.py:L500-L527 and diff_diff/trop_global.py:L616-L630, and documented in docs/methodology/REGISTRY.md:L2002-L2015. Concrete fix: rewrite this as a paper-scope remark or explicitly say the shipped implementation requires absorbing treatment.
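A quick numerical check of the δ₂ distinction raised above. With an index a + δ₂, the exponential mean makes δ₂ a log difference of means (a proportional effect), while the logistic mean makes δ₂ a change in log-odds; the two readings are not interchangeable. The helper names below are hypothetical and are not part of diff_diff.

```python
import math


def exp_mean(z: float) -> float:
    """Exponential mean function G(z) = exp(z), as used with Poisson QMLE."""
    return math.exp(z)


def logistic_mean(z: float) -> float:
    """Logistic mean function G(z) = 1 / (1 + exp(-z)), as used with logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def log_ratio_of_means(G, a: float, delta: float) -> float:
    """log G(a + delta) - log G(a): the proportional (log-difference) effect."""
    return math.log(G(a + delta)) - math.log(G(a))


def change_in_log_odds(G, a: float, delta: float) -> float:
    """logit(G(a + delta)) - logit(G(a)): only meaningful when G maps into (0, 1)."""
    return logit(G(a + delta)) - logit(G(a))


if __name__ == "__main__":
    a, delta = 0.0, 0.5
    # Exponential mean: the log ratio of means recovers delta.
    print(log_ratio_of_means(exp_mean, a, delta))
    # Logistic mean: the change in log-odds recovers delta ...
    print(change_in_log_odds(logistic_mean, a, delta))
    # ... but the log ratio of means does not (it is well below delta here).
    print(log_ratio_of_means(logistic_mean, a, delta))
```

The last line is the point of the fix: reading a logit δ₂ as a log rate ratio would understate the effect on the mean scale in this example.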

Code Quality

Severity: P2. Impact: docs/methodology/papers/wooldridge-2023-review.md:L188-L205 has stale implementation notes: it lists control_group default as "never_treated" and says Poisson still needs a new solver. Current code defaults to "not_yet_treated" and already uses solve_poisson in the Poisson path (diff_diff/wooldridge.py:L302-L306, diff_diff/wooldridge.py:L1085-L1124), consistent with docs/methodology/REGISTRY.md:L1376-L1388. Concrete fix: update the tuning-parameter table and implementation notes to the shipped API, or label them as historical/pre-implementation notes.
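To make the default change concrete, the sketch below spells out which units count as comparisons for cohort g at period t under the two `control_group` settings named above. In regression-based ETWFE the comparison is implicit in which untreated observations enter the pooled regression, so this explicit set construction is only an illustration; the function name and the 0-means-never-treated encoding are assumptions, not the wooldridge.py implementation.

```python
def control_units(first_treat, g, t, control_group="not_yet_treated"):
    """Units usable as comparisons for cohort g at calendar period t.

    first_treat maps unit id -> first treated period; 0 marks never-treated
    units (an encoding assumed for this sketch only).
    """
    if control_group not in ("never_treated", "not_yet_treated"):
        raise ValueError(f"unknown control_group: {control_group!r}")
    controls = []
    for unit, first in first_treat.items():
        if first == g:
            continue  # the focal cohort never serves as its own control
        never = first == 0
        # Not yet treated at t: treatment starts strictly after t.
        not_yet = first > t
        if never or (control_group == "not_yet_treated" and not_yet):
            controls.append(unit)
    return sorted(controls)


if __name__ == "__main__":
    first_treat = {"a": 0, "b": 3, "c": 5, "d": 3}
    print(control_units(first_treat, g=3, t=3, control_group="never_treated"))
    print(control_units(first_treat, g=3, t=3))
```

With "not_yet_treated", unit "c" (first treated in period 5) joins the never-treated unit "a" as a comparison at t = 3, which is why the default matters for the documented contract.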

Performance

No findings in changed scope.

Maintainability

Severity: P2. Impact: docs/methodology/papers/athey-2025-review.md:L312-L318 refers to "twostep" and "joint" methods and describes the latter as homogeneous-effect WLS. The public API is method="local" / method="global", and the global path computes residual-based treated-cell effects averaged into ATT (diff_diff/trop.py:L64-L78, diff_diff/trop_global.py:L554-L585, docs/methodology/REGISTRY.md:L2139-L2146). Concrete fix: rename these to local / global and align the description with the current estimator contract.
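The "residual-based treated-cell effects averaged into ATT" description can be read as the following minimal sketch: given a fitted untreated counterfactual for every cell, each treated cell's effect is its residual, and the ATT is their mean. How the counterfactual `Y0_hat` is estimated (the weighted, regularized fit) is taken as given here, and the function name is hypothetical rather than the trop_global.py code.

```python
import numpy as np


def residual_att(Y, W, Y0_hat):
    """Per-cell effects tau_it = Y_it - Y0_hat_it on treated cells, and their mean.

    Y, W, Y0_hat: (N units x T periods) arrays; W is the 0/1 treatment indicator.
    Cells with W == 0 get NaN in the returned effect matrix.
    """
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=bool)
    Y0_hat = np.asarray(Y0_hat, dtype=float)
    if Y.shape != W.shape or Y.shape != Y0_hat.shape:
        raise ValueError("Y, W and Y0_hat must share the same (N, T) shape")
    if not W.any():
        raise ValueError("no treated cells")
    tau = np.where(W, Y - Y0_hat, np.nan)
    # Equal-weight average over treated cells only.
    return tau, float(np.nanmean(tau))


if __name__ == "__main__":
    Y = [[1, 2, 3], [1, 3, 5]]
    W = [[0, 0, 0], [0, 1, 1]]
    Y0_hat = [[1, 2, 3], [1, 2, 3]]
    tau, att = residual_att(Y, W, Y0_hat)
    print(tau)
    print(att)
```

Note that this is not a single homogeneous-effect coefficient: each treated cell carries its own effect before averaging, which is the distinction the reviewer asks the document to reflect.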

Tech Debt

Severity: P3. Impact: docs/methodology/papers/wooldridge-2023-review.md:L103-L108 summarizes the paper’s aggregation using cohort-share weights, which is correct for the paper, but current library behavior differs and that deviation is already documented in docs/methodology/REGISTRY.md:L1358-L1364 and tracked in TODO.md:L79-L81. Concrete fix: add a one-line “current implementation deviation” note pointing readers to the existing registry/TODO entry. (academic.oup.com)
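Why an aggregation-weight deviation is worth surfacing: the same (g, t) cell effects can yield quite different overall ATTs under different weights. The review does not spell out either scheme here, so the two definitions below are assumptions chosen only for illustration (weights proportional to treated observations per cell, versus equal weights within a cohort followed by cohort-size weights); REGISTRY.md holds the actual formulas for the paper and the library.

```python
from collections import defaultdict


def aggregate_by_cell_count(cells):
    """cells: iterable of (g, t, att_gt, n_treated_obs_gt).

    Assumed scheme A: weight each (g, t) effect by its treated-observation count.
    """
    cells = list(cells)
    total = sum(n for _, _, _, n in cells)
    return sum(att * n for _, _, att, n in cells) / total


def aggregate_by_cohort_share(cells, cohort_size):
    """Assumed scheme B: average effects equally within each cohort, then
    weight cohorts by their share of treated units (cohort_size maps g -> units)."""
    per_cohort = defaultdict(list)
    for g, _, att, _ in cells:
        per_cohort[g].append(att)
    total_units = sum(cohort_size[g] for g in per_cohort)
    return sum(
        cohort_size[g] / total_units * (sum(v) / len(v))
        for g, v in per_cohort.items()
    )


if __name__ == "__main__":
    # Early cohort (g=2) has three post periods with effect 1; late cohort (g=4)
    # has one post period with effect 4; both cohorts have 10 units.
    cells = [(2, 2, 1.0, 10), (2, 3, 1.0, 10), (2, 4, 1.0, 10), (4, 4, 4.0, 10)]
    print(aggregate_by_cell_count(cells))
    print(aggregate_by_cohort_share(cells, {2: 10, 4: 10}))
```

Here the per-cell scheme gives the early cohort three times the weight of the late cohort, while the cohort-share scheme weights them equally, so the aggregates differ (1.75 versus 2.5).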

Security

Severity: P3. Impact: docs/methodology/papers/athey-2025-review.md:L5 embeds an absolute local path (/Users/igerber/...). That is non-portable and leaks contributor-local environment details. Concrete fix: replace it with a repo-relative path such as papers/2508.21536v2.pdf or the arXiv URL.

Documentation/Tests

No separate testing finding. For a docs-only PR, the absence of new tests is acceptable; the issues above are documentation-accuracy fixes, not test gaps.

athey-2025-review.md:
- Replace contributor-local absolute PDF path with arXiv URL
- Note shipped TROP requires absorbing treatment (paper Eq 13 generalization is out of scope for the current implementation)
- Rename "twostep"/"joint" to "local"/"global" and correct the global-method description to residual-based treated-cell effects averaged into ATT

wooldridge-2023-review.md:
- Split delta_2 interpretation by link function (exponential = log diff / proportional effect; logistic = change in log-odds)
- Update control_group default to "not_yet_treated" (matches wooldridge.py:305)
- Update implementation note: solve_poisson exists at linalg.py:3431 and is used in the Poisson path
- Add aggregation deviation note linking to REGISTRY and TODO entries

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
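For readers unfamiliar with what a Poisson path needs from a solver: Poisson QMLE with the exponential mean maximizes a concave objective, so Newton-Raphson on the score X'(y - exp(Xβ)) converges quickly. The sketch below is a generic textbook version under that assumption; it is not `solve_poisson` from linalg.py, whose signature and safeguards are not reproduced here.

```python
import numpy as np


def poisson_qmle(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for Poisson QMLE with mean exp(X @ beta).

    X: (n, k) design matrix (include a constant column if wanted); y: (n,) counts
    or any nonnegative outcome. Returns the coefficient vector beta.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)            # gradient of the quasi-log-likelihood
        hessian = X.T @ (X * mu[:, None])  # negative Hessian: X' diag(mu) X
        step = np.linalg.solve(hessian, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = rng.poisson(np.exp(0.5 + 0.3 * x))
    print(poisson_qmle(X, y))
```

A production solver would typically add step-halving and a better starting value; this sketch omits both for brevity.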

igerber · 2026-05-15T14:03:22Z

/ai-review

github-actions · 2026-05-15T14:10:12Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 948ee9a169c37712f5af594180e5c173c45eb474

Overall Assessment

✅ Looks good

Executive Summary

This is still a docs-only PR, so there are no changed estimator, weighting, variance, or inference code paths that could introduce a P0/P1 executable defect.
Re-review result: the earlier substantive drift items are mostly addressed. The new reviews now fix the Wooldridge δ₂ wording, surface the Wooldridge aggregation-weight deviation as documented/tracked, state TROP’s shipped absorbing-state restriction, use current local/global TROP naming, and remove the prior contributor-local absolute path.
One unmitigated P2 remains in the TROP review: the simulation-design summary says “7 real datasets,” but both the file’s own list and the paper’s Section 3 / Table 1 description name six source datasets, which suggests simulation applications were counted as datasets.
One unmitigated P2 remains in the Wooldridge review: its “Implementation Notes” understate current covariate support relative to the shipped API.
One minor P3 remains: the TROP review still contains an internal “Agent 2” authoring artifact.

Methodology

Severity: P2. docs/methodology/papers/athey-2025-review.md:L323-L331. Impact: the review says the simulations are based on “7 real datasets,” but the same block only enumerates six items, and the paper’s Section 3 / Table 1 description likewise names six source datasets (CPS, PWT, Germany, Basque, Smoking, Boatlift). In a methodology review, that is a direct source-material mismatch. Concrete fix: change the sentence to “6 real datasets,” or explicitly say “7 simulation applications” and split CPS logwage vs CPS urate out as separate applications.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3. docs/methodology/papers/athey-2025-review.md:L351-L354. Impact: the committed review still refers to “Agent 2’s extraction...” when discussing Equation 13. That is an internal authoring artifact, not an auditable source-based note, and it weakens the document’s credibility. Concrete fix: rewrite this as a neutral source-based note (“Equation 13’s penalty term should be rechecked against the PDF”) or remove it once verified.

Tech Debt

No findings. The Wooldridge aggregation-weight difference is now correctly surfaced as a documented implementation deviation and is already tracked in docs/methodology/REGISTRY.md:L1357-L1364 and TODO.md:L79-L80.

Security

No findings. The prior contributor-local path leak is fixed; docs/methodology/papers/athey-2025-review.md:L5 now uses a public arXiv reference instead of a local absolute filesystem path.

Documentation/Tests

Severity: P2. docs/methodology/papers/wooldridge-2023-review.md:L176-L198. Impact: the “Implementation Notes” present covariates as generic, time-constant-only inputs, but the shipped estimator exposes exovar, xtvar, and xgvar, and the registry explicitly documents time-varying covariates plus cohort/time interaction handling at diff_diff/wooldridge.py:L387-L411 and docs/methodology/REGISTRY.md:L1369-L1374. That understates current implementation capability and can mislead future contributors using this review as a library-contract reference. Concrete fix: distinguish paper notation from the shipped API, or add one sentence that time-varying covariates are supported via xtvar and cohort-interacted covariates via xgvar.
No separate test-gap finding. For a docs-only PR, the absence of new tests is acceptable.
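As background for the covariate discussion: Wooldridge's ETWFE approach centers covariates at cohort-specific means before interacting them with the treatment indicators, so that the treatment coefficients keep their ATT interpretation. The sketch below shows only that within-cohort centering step, assuming this is the kind of demeaning the `demean_covariates` default refers to; the helper name is hypothetical and the actual handling of `exovar` / `xtvar` / `xgvar` lives in wooldridge.py.

```python
import numpy as np


def demean_within_cohort(x, cohort):
    """Return x minus its cohort-specific mean.

    x: covariate values (one per unit or per observation); cohort: matching
    cohort labels (e.g. first-treated period, with any label for never-treated).
    """
    x = np.asarray(x, dtype=float)
    cohort = np.asarray(cohort)
    out = np.empty_like(x)
    for g in np.unique(cohort):
        mask = cohort == g
        out[mask] = x[mask] - x[mask].mean()
    return out


if __name__ == "__main__":
    print(demean_within_cohort([1.0, 3.0, 10.0, 20.0], [2, 2, 0, 0]))
```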

athey-2025-review.md:
- Correct "7 real datasets" to "6 real datasets / 7 simulation applications" (CPS is used for both logwage and urate outcomes; paper Table 1 / Section 3 names 6 source datasets)
- Rewrite Equation 13 nuclear-norm gap note as a neutral source-based check (remove authoring artifact)

wooldridge-2023-review.md:
- Surface shipped covariate API (exovar / xtvar / xgvar incl. time-varying via xtvar with demean_covariates default) in Data Structure Requirements and Tuning Parameters table; cross-link to wooldridge.py:394-411 and REGISTRY.md "Covariates"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T14:29:59Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b527f78eac40ff95f0a52710b1c012b357e29621

Overall Assessment

✅ Looks good

Executive Summary

Re-review result: the prior open issues called out in the last AI pass appear resolved. The TROP review now fixes the dataset/application wording at docs/methodology/papers/athey-2025-review.md:L325-L331, and the Wooldridge review now documents shipped covariate support at docs/methodology/papers/wooldridge-2023-review.md:L176-L179, L198-L198.
P2: the new Wooldridge review now contradicts itself on aggregation-weight sourcing. It cites “Eqs. 7.2-7.4” as if they were in the reviewed paper at docs/methodology/papers/wooldridge-2023-review.md:L105-L111, then later says the 2023 paper gives no explicit aggregation equations at docs/methodology/papers/wooldridge-2023-review.md:L236-L240.
P2: the Wooldridge implementation notes still blur paper notation and shipped API. They imply a required W_it input and a broader bootstrap surface than WooldridgeDiD actually exposes in code.
P3 informational: the Wooldridge aggregation-weight deviation itself is now properly documented and tracked in docs/methodology/REGISTRY.md:L1357-L1364 and TODO.md:L79-L80, so that deviation is not a blocker.

Methodology

P2 docs/methodology/papers/wooldridge-2023-review.md:L105-L111, docs/methodology/papers/wooldridge-2023-review.md:L236-L240, docs/methodology/REGISTRY.md:L1363-L1364. Impact: the file currently says the reviewed 2023 paper’s aggregation weights come from “Eqs. 7.2-7.4,” but later says the 2023 paper only describes the weights conceptually and gives no explicit equations. That weakens source traceability for the deviation note and makes it unclear whether the doc is citing Wooldridge (2023) or the later Wooldridge ETWFE source already used in the registry. Concrete fix: rewrite L111 to attribute the formal cohort-share equations to the later Wooldridge ETWFE source already referenced in REGISTRY.md, or else remove the equation-number claim and say the 2023 paper gives only the conceptual cohort-share weighting.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 docs/methodology/papers/wooldridge-2023-review.md:L111, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none. The cell-weight vs cohort-share aggregation difference is now correctly surfaced as a documented deviation and tracked technical debt, so it should not hold up approval. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

P2 docs/methodology/papers/wooldridge-2023-review.md:L99-L110, docs/methodology/papers/wooldridge-2023-review.md:L176-L198, diff_diff/wooldridge.py:L165-L189, diff_diff/wooldridge.py:L394-L411, diff_diff/wooldridge.py:L432-L445, docs/methodology/REGISTRY.md:L1410-L1418. Impact: the review mixes paper notation with the shipped library contract. In the paper, W_it is part of the model notation and panel bootstrap is a valid inference route; in the implementation, users pass cohort and the design matrix derives treatment cells internally, and n_bootstrap > 0 is only supported for method="ols" and rejected with survey_design. As written, the new review overstates the current API and inference surface for WooldridgeDiD. Concrete fix: in the Implementation Notes, explicitly separate “paper notation” from “shipped API”: say users provide cohort/first_treat, W_it is constructed internally from cohort and time, and bootstrap is currently OLS-only and unavailable with survey_design.
No separate test-gap finding. For a docs-only PR, the absence of new tests is acceptable.
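To illustrate why W_it need not be a user-supplied column: with no anticipation, the ETWFE treatment regressors are indicators for each treated (cohort g, period t) cell with t ≥ g, and these follow mechanically from cohort and time. The sketch below is one hypothetical way to derive them; it is not `_build_interaction_matrix`, whose exact column layout is not described here, and it assumes never-treated units are coded with cohort 0.

```python
import numpy as np


def treatment_cell_dummies(cohort, time, never_treated=0):
    """One 0/1 column per treated (g, t) cell, i.e. cohort g observed at t >= g.

    cohort: first-treated period per observation (never_treated marks controls).
    time: calendar period per observation. Returns (D, cells).
    """
    cohort = np.asarray(cohort)
    time = np.asarray(time)
    cells = sorted(
        {(g, t) for g, t in zip(cohort.tolist(), time.tolist())
         if g != never_treated and t >= g}
    )
    D = np.zeros((cohort.size, len(cells)))
    for j, (g, t) in enumerate(cells):
        D[:, j] = (cohort == g) & (time == t)
    return D, cells


if __name__ == "__main__":
    # Three units (never-treated, cohort 2, cohort 3) observed in periods 1-3.
    cohort = [0, 0, 0, 2, 2, 2, 3, 3, 3]
    time = [1, 2, 3, 1, 2, 3, 1, 2, 3]
    D, cells = treatment_cell_dummies(cohort, time)
    print(cells)
    print(D)
```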

wooldridge-2023-review.md:
- Aggregation note: stop attributing "Eqs. 7.2-7.4" to the 2023 paper (the 2023 paper describes aggregation only conceptually in Section 3.1; the formal cohort-share equations are from W2025 per REGISTRY.md)
- Implementation Notes: separate paper notation from shipped API. Users provide cohort/first_treat; W_it is constructed internally from cohort + time via _build_interaction_matrix (wooldridge.py:165-189), not passed as a column
- Standard errors: add shipped-API restriction note: n_bootstrap > 0 is OLS-only (wooldridge.py:432-437) and rejected with survey_design (wooldridge.py:441-444)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T14:38:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c92c2dcf439916c13b3437dc8fff70873ebeec03

Overall Assessment

✅ Looks good

Executive Summary

Re-review result: the prior Wooldridge documentation issues from the last AI pass appear resolved. The aggregation-weight deviation is now explicitly tied to REGISTRY.md/TODO.md, and the review now separates paper notation from the shipped cohort/bootstrap API surface.
P2 docs/methodology/papers/wooldridge-2023-review.md:L121-L129: the Section 7 extension bullets still need stronger caveats around exit and nonbinary-treatment generalizations. (academic.oup.com)
P2 docs/methodology/papers/athey-2025-review.md:L275-L281: the TROP review still blurs the paper’s balanced-panel setup with the shipped library’s unbalanced-panel support.
P3 informational: the Wooldridge aggregation-weight deviation is now properly documented and tracked, so it is not blocking.

Methodology

P2 docs/methodology/papers/wooldridge-2023-review.md:L121-L129. Impact: the new “Edge cases” bullets compress Section 7 too aggressively. In the published article, Section 7 is explicitly an extensions section; the exit case adds an extra restriction that future shocks to untreated potential outcomes cannot drive exit, and the multiple-treatment-level path is described as only relatively straightforward, not completely general, with the precise estimand left for future work. Concrete fix: rewrite the Treatment exit and Multiple treatment levels bullets as qualified extension notes and carry over those caveats. (academic.oup.com)

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none. The cell-weight vs cohort-share aggregation difference is now correctly documented as a deviation and tracked technical debt. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

P2 docs/methodology/papers/athey-2025-review.md:L275-L281, docs/methodology/papers/athey-2025-review.md:L359-L359, docs/methodology/REGISTRY.md:L2113-L2137, diff_diff/trop.py:L500-L557. Impact: the TROP review’s Implementation Notes open with a balanced-panel requirement even though the same review later says the shipped implementation supports unbalanced panels, and the registry/code already document that extension. That leaves the paper-vs-library contract ambiguous for readers using this review as implementation guidance. Concrete fix: relabel the opening bullet as a paper assumption (Paper assumption: balanced panel) or add an adjacent shipped-implementation note that diff_diff/trop.py accepts unbalanced panels with structural gaps.
No separate test-gap finding. For a docs-only PR, no test changes are required.
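The balanced-versus-unbalanced distinction above is easy to state precisely: a panel is balanced when every unit appears exactly once in every period. The hypothetical helper below checks that condition from unit and period columns; it says nothing about how diff_diff/trop.py handles the structural gaps it accepts.

```python
def is_balanced(units, periods):
    """True if every unit is observed exactly once in every observed period."""
    pairs = list(zip(units, periods))
    if len(set(pairs)) != len(pairs):
        raise ValueError("duplicate (unit, period) rows")
    # With no duplicates, pairs is a subset of units x periods, so equal size
    # means the full grid is present.
    return len(pairs) == len(set(units)) * len(set(periods))


if __name__ == "__main__":
    print(is_balanced([1, 1, 2, 2], [1, 2, 1, 2]))
    print(is_balanced([1, 1, 2], [1, 2, 1]))  # unit 2 missing in period 2
```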

athey-2025-review.md:
- Relabel balanced-panel bullet as "Paper assumption" and add an adjacent shipped-implementation note that diff_diff/trop.py supports unbalanced panels with structural gaps (matches the later "Unbalanced panels" entry under Gaps and Uncertainties and the corresponding REGISTRY section)

wooldridge-2023-review.md:
- Treatment exit (Section 7.2) bullet: mark as extension and carry over the paper's additional restriction that future shocks to untreated potential outcomes cannot drive exit
- Multiple treatment levels (Section 7.4) bullet: mark as extension; note the paper describes it as relatively straightforward but not fully general, leaving the precise multi-level estimand to future work

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T14:49:26Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6dbddf7d55fe61fdeced432145b9d5ba66c8d90f

Overall Assessment
✅ Looks good

Executive Summary

Methods reviewed: TROP and WooldridgeDiD (ETWFE).
Re-review result: the prior Wooldridge Section 7 caveat issue and TROP balanced-panel wording issue appear resolved in the new paper reviews. docs/methodology/papers/wooldridge-2023-review.md:L121-L129, docs/methodology/papers/athey-2025-review.md:L274-L282. (academic.oup.com)
P2 remaining: the new TROP review is still internally inconsistent on weight normalization and still carries an Eq. 13 “should be rechecked” placeholder while presenting itself as registry-copy-ready. docs/methodology/papers/athey-2025-review.md:L10-L13, docs/methodology/papers/athey-2025-review.md:L255-L268, docs/methodology/papers/athey-2025-review.md:L352-L355, docs/methodology/REGISTRY.md:L2052-L2059, docs/methodology/REGISTRY.md:L2126-L2129. (ar5iv.org)
The Wooldridge aggregation-weight deviation remains properly documented and tracked, so it is informational only. docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80.
No P0/P1 issues found; this docs-only PR does not introduce code, inference, or security regressions.

Methodology

P2 docs/methodology/papers/athey-2025-review.md:L10-L13, docs/methodology/papers/athey-2025-review.md:L255-L268, docs/methodology/papers/athey-2025-review.md:L352-L355, docs/methodology/REGISTRY.md:L2052-L2059, docs/methodology/REGISTRY.md:L2126-L2129. Impact: the TROP review is framed as ready to copy into the methodology registry, but it still treats 1^T ω = 1^T θ = 1 as a checklist requirement while later saying normalization is unclear, and it leaves Eq. 13’s penalty as “should be rechecked.” The source paper presents exponential weight construction in Section 2 and a separate sum-to-one condition in the theory section, so this should be documented as an unresolved source ambiguity rather than a settled implementation requirement. Concrete fix: move weight normalization out of the checklist into an explicit ambiguity/note, verify Eq. 13 once against the source, and only then keep the registry-copy-ready framing. (ar5iv.org)
No other methodology findings. The Wooldridge Section 7 caveats now match the paper’s exit/endogeneity warning and its qualified framing for multi-level treatment extensions, and the TROP notes now cleanly separate the paper setup from the shipped unbalanced-panel extension. (academic.oup.com)
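To see why the weight-normalization question is substantive rather than cosmetic, the sketch below builds exponential time and unit weights and compares the unnormalized and sum-to-one versions. The functional form exp(-λ · distance) is an assumption based on the "exponential weight construction" mentioned above, and the names are hypothetical. Normalization leaves relative weights unchanged but rescales the weighted loss; if the objective is a weighted squared-error loss plus λ_nn‖L‖_*, scaling the loss by a constant c is equivalent to replacing λ_nn with λ_nn / c, so the two readings can pick different fits on the same λ grid.

```python
import numpy as np


def time_weights(T, t, lam_time, normalize=False):
    """theta_s = exp(-lam_time * |t - s|) for s = 0..T-1 (assumed form)."""
    s = np.arange(T)
    theta = np.exp(-lam_time * np.abs(t - s))
    return theta / theta.sum() if normalize else theta


def unit_weights(distances, lam_unit, normalize=False):
    """omega_j = exp(-lam_unit * dist_j) over candidate control units (assumed form)."""
    omega = np.exp(-lam_unit * np.asarray(distances, dtype=float))
    return omega / omega.sum() if normalize else omega


if __name__ == "__main__":
    theta = time_weights(T=5, t=2, lam_time=0.5)
    print(theta.sum())  # greater than 1: unnormalized weights inflate the loss
    print(time_weights(T=5, t=2, lam_time=0.5, normalize=True).sum())
    print(unit_weights([0.0, 1.0, 2.0], lam_unit=1.0).sum())
```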

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings beyond the TROP documentation inconsistency above.

Tech Debt

P3 docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none; the Wooldridge aggregation-weight deviation is explicitly documented and tracked. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

No additional findings. This is a docs-only PR, so no test changes are required.

athey-2025-review.md:
- Reframe the Methodology Registry Entry intro from a ready-to-promote "copy into REGISTRY" framing to a working-draft framing that explicitly defers promotion until two source-ambiguity items (weight normalization, Eq. 13 penalty form) are resolved against the source
- Pull the weight-normalization line out of the Requirements Checklist (it was framed as a settled requirement); restate it as an open source ambiguity cross-referencing Gap #5, with the current shipped implementation pinned to the Equation 2 (unnormalized) interpretation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T15:00:09Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 796decd1e7dc1dd039b55625410777c1932de447

Overall Assessment

✅ Looks good

Executive Summary

Re-review outcome: no unmitigated P0/P1 findings. This PR is docs-only and does not change estimator code, inference paths, or defaults.
The earlier Wooldridge Section 7 caveat issue appears resolved: the new review now matches the paper’s treatment-exit endogeneity warning and its qualified framing for multi-level treatments. docs/methodology/papers/wooldridge-2023-review.md:L121-L129 (academic.oup.com)
The prior TROP balanced-panel wording issue appears resolved: the draft now clearly distinguishes the paper’s balanced-panel setup from the library’s unbalanced-panel extension. docs/methodology/papers/athey-2025-review.md:L274-L282
P3: the TROP review still treats Equation 13’s penalty form as an unresolved source ambiguity even though the indexed paper text already shows the same unsquared nuclear-norm penalty used elsewhere in the draft. docs/methodology/papers/athey-2025-review.md:L10-L12, docs/methodology/papers/athey-2025-review.md:L352-L355 (researchgate.net)
P3: the Wooldridge review’s PDF reviewed field points to papers/wooldridge.pdf, which is not present in the repo. docs/methodology/papers/wooldridge-2023-review.md:L5-L5
The Wooldridge aggregation-weight difference remains properly documented in REGISTRY.md and tracked in TODO.md, so it is informational only. docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80 (academic.oup.com)

Methodology

No P0/P1 methodology defects found in this docs-only PR.
P3 docs/methodology/papers/athey-2025-review.md:L10-L12, docs/methodology/papers/athey-2025-review.md:L352-L355. Impact: the TROP review still blocks later registry promotion on an issue the source already appears to settle, so maintainers may carry unnecessary uncertainty forward. Concrete fix: remove Equation 13 from the “open source-ambiguity” list and state directly that the multiple-treated objective uses the same unsquared nuclear-norm penalty as Equation 2; keep the weight-normalization tension as the only unresolved source question. (researchgate.net)
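For context on the "unsquared nuclear-norm penalty" wording: the proximal operator of τ‖L‖_* (the sum of singular values, not squared) is singular value soft-thresholding, which is the building block behind the proximal-gradient / FISTA solvers mentioned in this thread. The sketch below is the generic operator only, with hypothetical names; it is not the TROP solver, and the surrounding FISTA loop and weighting are omitted.

```python
import numpy as np


def nuclear_norm(L):
    """||L||_*: the sum of singular values (unsquared)."""
    return float(np.linalg.svd(L, compute_uv=False).sum())


def prox_nuclear(A, tau):
    """argmin_L 0.5 * ||L - A||_F^2 + tau * ||L||_*.

    Solved by soft-thresholding the singular values of A by tau.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # U * s_shrunk scales each column of U, i.e. U @ diag(s_shrunk).
    return (U * s_shrunk) @ Vt


if __name__ == "__main__":
    A = np.diag([3.0, 1.0])
    print(nuclear_norm(A))
    print(prox_nuclear(A, 1.5))  # small singular value is zeroed: rank drops to 1
```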

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80. Impact: none for approval; the current aggregation-weight deviation is already explicitly documented and tracked. Concrete fix: none required in this PR. (academic.oup.com)

Security

No findings.

Documentation/Tests

P3 docs/methodology/papers/wooldridge-2023-review.md:L5-L5. Impact: the provenance trail for the review is not reproducible from the repo because the cited reviewed PDF does not exist at that path. Concrete fix: replace the field with the official article/PDF location or vendor the reviewed PDF into the referenced path.
No test changes are needed for this docs-only PR.

athey-2025-review.md:
- Mark Equation 13 nuclear-norm penalty as resolved (paper text confirms the same unsquared form as Equation 2)
- Tighten draft-framing intro: weight normalization is the only remaining open source-ambiguity item

wooldridge-2023-review.md:
- Replace nonexistent local PDF path with the Econometrics Journal DOI URL so the provenance trail is reproducible from the repo

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T15:06:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b14b7d01ac598c80a1f5961d387c93692c590ffb

Overall Assessment

✅ Looks good

Executive Summary

Re-review outcome: no unmitigated P0/P1 findings. The diff is docs-only and does not change estimator code, inference paths, or defaults.
The Wooldridge review now matches the paper’s Section 7 caveats on treatment exit and on the limited/qualified nature of the multi-level-treatment extension. docs/methodology/papers/wooldridge-2023-review.md:L125-L128
The TROP review addresses the earlier Equation 13 penalty issue and now clearly separates the paper’s balanced-panel setup from the library’s unbalanced-panel extension. docs/methodology/papers/athey-2025-review.md:L275-L282, docs/methodology/papers/athey-2025-review.md:L352-L354
The prior Wooldridge provenance issue is fixed: PDF reviewed now points to the official article URL instead of a nonexistent repo PDF. docs/methodology/papers/wooldridge-2023-review.md:L4-L5
One minor P3 remains: the TROP review says it reviewed arXiv v2 but links to the generic arXiv abstract page rather than a version-pinned artifact, and the current arXiv entry now shows v3. docs/methodology/papers/athey-2025-review.md:L4-L5
The Wooldridge aggregation-weight deviation remains correctly documented in REGISTRY.md and tracked in TODO.md, so it is informational only. docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80

Methodology

Severity: none. Impact: I did not find any undocumented mismatch to the cited methods in the added reviews, and the previously noted Wooldridge Section 7 caveat issue appears resolved. Concrete fix: none. docs/methodology/papers/wooldridge-2023-review.md:L125-L128

Code Quality

Severity: none. Impact: docs-only change; no production code paths are modified. Concrete fix: none.

Performance

Severity: none. Impact: docs-only change; no runtime or memory impact. Concrete fix: none.

Maintainability

Severity: none. Impact: the new reviews follow the existing methodology-review structure and cross-reference implementation/registry context clearly. Concrete fix: none.

Tech Debt

Severity: P3. Impact: none for approval; the Wooldridge aggregation-weight difference is already explicitly documented and tracked, so this PR does not introduce new untracked debt. Concrete fix: none required in this PR. docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, TODO.md:L79-L80

Security

Severity: none. Impact: no findings; docs-only content, no executable changes, no secret exposure. Concrete fix: none.

Documentation/Tests

Severity: P3. Impact: docs/methodology/papers/athey-2025-review.md:L4-L5 does not pin the exact reviewed TROP artifact; it labels the review as v2 but points at the generic arXiv record rather than a version-specific v2 abstract/PDF, which weakens provenance because the current arXiv entry now resolves to v3. Concrete fix: replace the field with a version-pinned v2 arXiv abstract or PDF link.

athey-2025-review.md:
- Replace generic arXiv abstract URL with a version-pinned v2 link so the reviewed artifact resolves to v2 (the current arXiv record now resolves to v3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T15:10:15Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0fb0d51994fca942bbc2c1b6bd2259acbd439421

Overall Assessment

✅ Looks good

Executive Summary

Re-review result: no unmitigated P0/P1 findings. The diff is docs-only and does not change estimator code, inference paths, or defaults.
The prior TROP provenance issue is resolved: the review now pins the reviewed artifact to arXiv v2 in docs/methodology/papers/athey-2025-review.md:L4-L5.
The Wooldridge review correctly treats the aggregation-weight mismatch as a documented/tracked implementation deviation rather than an undocumented methodology error in docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, and TODO.md:L79-L80.
The new reviews continue to separate paper scope from shipped-library scope, including TROP’s absorbing-state restriction in docs/methodology/papers/athey-2025-review.md:L275-L282 and the matching code gate in diff_diff/trop.py:L505-L527.
No new documentation issue rose to P1+: the added reviews are consistent with the current registry/code contracts for TROP and WooldridgeDiD.
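The absorbing-state restriction referenced above means that once a unit is treated it stays treated. A minimal way to express that condition on a balanced (units x periods) 0/1 matrix is to require that no first difference along time is negative. This is an illustrative check with a hypothetical name, not the validation in diff_diff/trop.py, and it does not handle panels with gaps.

```python
import numpy as np


def is_absorbing(D):
    """D: (N units x T periods) 0/1 treatment matrix, columns in time order.

    Returns True if no unit ever switches from treated (1) back to untreated (0).
    """
    D = np.asarray(D)
    if not np.isin(D, (0, 1)).all():
        raise ValueError("treatment must be binary 0/1")
    # A 1 -> 0 switch appears as a negative first difference along time.
    return bool((np.diff(D.astype(int), axis=1) >= 0).all())


if __name__ == "__main__":
    print(is_absorbing([[0, 0, 1, 1], [0, 0, 0, 0]]))  # staggered adoption
    print(is_absorbing([[0, 1, 0, 1]]))                # on/off switching
```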

Methodology

Severity: none. Impact: No undocumented methodology mismatch introduced. The TROP review keeps the weight-normalization discrepancy explicitly open instead of silently promoting it to a requirement, and the Wooldridge review labels the shipped aggregation-weight difference as a deviation/tracked item rather than claiming paper-level parity. Concrete fix: none. See docs/methodology/papers/athey-2025-review.md:L255-L268, docs/methodology/papers/athey-2025-review.md:L352-L360, and docs/methodology/papers/wooldridge-2023-review.md:L106-L112.

Code Quality

Severity: none. Impact: The new review files follow the existing methodology-review structure and cross-reference the relevant implementation gates, including Wooldridge bootstrap restrictions in diff_diff/wooldridge.py:L432-L444 and TROP’s absorbing-state validation in diff_diff/trop.py:L505-L527. Concrete fix: none.

Performance

Severity: none. Impact: Docs-only change; no runtime, memory, or algorithmic effect. Concrete fix: none.

Maintainability

Severity: none. Impact: The reviews clearly distinguish paper assumptions from shipped-library extensions/restrictions, which should reduce ambiguity if these drafts are later promoted into REGISTRY.md. Concrete fix: none. See docs/methodology/papers/athey-2025-review.md:L272-L320 and docs/methodology/papers/wooldridge-2023-review.md:L175-L215.

Tech Debt

Severity: P3 (informational, mitigated). Impact: The Wooldridge aggregation-weight divergence remains tracked technical debt, not a blocker. This PR documents it consistently with the existing registry/TODO trail and does not introduce new untracked debt. Concrete fix: none in this PR. See docs/methodology/papers/wooldridge-2023-review.md:L106-L112, docs/methodology/REGISTRY.md:L1357-L1364, and TODO.md:L79-L80.

Security

Severity: none. Impact: No executable changes, no new data-handling surface, and no secret exposure in the added docs. Concrete fix: none.

Documentation/Tests

Severity: none. Impact: The previous documentation provenance concern is fixed by the version-pinned TROP source link, and the Wooldridge review now points to the official article URL in docs/methodology/papers/wooldridge-2023-review.md:L4-L5. No tests are required for this docs-only PR. Concrete fix: none.

Closes the Athey, Imbens, Qu & Viviano (2025) Triply Robust Panel Estimators (arXiv:2508.21536) primary-source review on the methodology tracker. PR-A (paper review on file at docs/methodology/papers/athey-2025-review.md) was previously merged as #443; this PR is the F.L.I.P. consolidation: a new tests/test_methodology_trop.py with a paper-equation-numbered Verified Components walk-through (10 classes, 36 tests covering Eq. 2 nuclear-norm prox / FISTA / weighted-prox, Eq. 3 unit + time weights, Eqs. 4-5 + Algorithm 1 LOOCV with two-stage cycling, Corollary 1 three-condition unbiasedness, Theorem 5.1 MC-ranking realisation, Section 2.2 DID + MC reductions, Eq. 13 + Algorithm 2 per-(i, t) estimation, Algorithm 3 stratified pairs bootstrap, Section 3 / Eq. 6 factor-DGP recovery, plus a TestTROPDeviations class locking 11 documented library deviations). Migrated from tests/test_trop.py: TestMethodologyVerification (5 tests -> TestTROPEquation6FactorDGPRecovery), four paper-conformance tests + one weighted-solver convergence test from TestPaperConformanceFixes (distributed across the new equation-numbered classes), three prox / FISTA / weighted-objective tests from TestTROPNuclearNormSolver (-> TestTROPNuclearNormProx), plus a cycling-convergence test from TestCyclingSearch and the factor-DGP smoke test from TestTROPvsSDID. The TestPaperConformanceFixes and TestTROPvsSDID shells are deleted; TestTROPNuclearNormSolver retains its single defensive test_zero_weights_no_division_error. The METHODOLOGY_REVIEW.md TROP row is promoted In Progress -> Complete (paper method="local") with merge date 2026-05-24, with the full Verified Components / Test Coverage / Deviations / Outstanding Concerns / R Parity structure mirroring HAD (PR #473), ContinuousDiD (PR #476), DCDH (PR #481), and WooldridgeDiD (PR #486).
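In its textbook, unweighted form, the Eq. 2 nuclear-norm prox is singular-value soft-thresholding. Singular values below the threshold become zero, which is why rank selection can be implicit. The NumPy sketch below assumes a plain Frobenius loss. `prox_nuclear` is a hypothetical name, and the sketch does not reproduce the library's weighted-prox or FISTA code paths.

```python
import numpy as np


def prox_nuclear(Y: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||L||_* evaluated at Y.

    Solves argmin_L 0.5 * ||L - Y||_F^2 + lam * ||L||_* by shrinking each
    singular value of Y toward zero by lam (and clipping at zero).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    # U * s_shrunk scales column j of U by s_shrunk[j], i.e. U @ diag(s_shrunk).
    return (U * s_shrunk) @ Vt
```

For example, applying the prox to diag(3, 1) with lam = 1.5 gives diag(1.5, 0). The second direction is dropped entirely, so the output has rank 1.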
The methodology promotion is scoped to the paper-aligned method="local" path (paper Algorithm 2); method="global" is a library-side efficiency adaptation per REGISTRY and stays defensively covered in tests/test_trop.py::TestTROPGlobalMethod. Documented deviations: Gap #5 (unnormalised weights match Eq. 2, not the Section 5 sum-to-one normalisation), locked by a direct kernel-weight inspection test against TROP._compute_observation_weights; Gap #9 (control / pre-treatment cell drops supported beyond the paper's balanced-panel assumption); rank selection is implicit via nuclear-norm soft-thresholding (there is no discrete rank_selection constructor parameter, which corrects an earlier REGISTRY overclaim that listed cv / ic / elbow methods); and lambda_nn=inf is mapped to an internal 1e10 sentinel, with the original value stored on the results. Outstanding Concerns (deferred): the Equation 14 covariate extension (TROP.fit() has no `covariates` kwarg; non-support is locked by TestTROPDeviations::test_covariates_not_supported via inspect.signature to guard against a future **kwargs) and Theorem 8.1, both deferred until use cases motivate them. The paper claims the SC / SDID reductions under "specific (omega, theta) weight choices" that its text does not provide; a cross-language anchor is deferred until paper-author code clarifies the weight map. Direct numerical reconstruction of Eq. 10 is deferred because it requires exposing the internal per-(i, t) theta / omega weight vectors. R parity is deferred ("forthcoming" per the paper). Methodology sign-off scope: the paper-aligned identification ingredients (Eq. 2 prox + Eq. 3 weights + Eqs. 4-5 LOOCV + Algorithms 1-3 + Corollary 1 single-draw sanity checks + Eq. 6 simulation recovery + DID reduction + documented deviations) are directly locked. Theorem 5.1 is verified as a simulation sanity check (TROP RMSE < DID RMSE under LOOCV-tuned weights), NOT as a direct fixed-weight conditional-bias-bound lock.
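Locking non-support with `inspect.signature` catches a case that a call-based test misses. Calling `fit(covariates=...)` and expecting a `TypeError` would stop failing if a `**kwargs` catch-all were added later, because the catch-all would silently accept the argument. The sketch below shows such a check using only the standard library. `accepts_kwarg` and the two `fit_*` stand-ins are hypothetical. This is not the actual body of `TestTROPDeviations::test_covariates_not_supported`.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """Return True if `func` can be called with keyword argument `name`.

    True when `name` is an explicit keyword-capable parameter, or when the
    signature has a **kwargs catch-all that would swallow `name` unnoticed.
    """
    for p in inspect.signature(func).parameters.values():
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        if p.name == name and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical stand-ins for an estimator's fit() method.
def fit_without_covariates(data, outcome, treatment, unit, time):
    ...


def fit_with_catch_all(data, outcome, treatment, **kwargs):
    ...
```

A test of the form `assert not accepts_kwarg(fit, "covariates")` passes today and starts failing once either an explicit `covariates` parameter or a `**kwargs` catch-all is added.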
The Matrix Completion reduction is verified as code-path activation (effective_rank > 0 and beats the DID baseline), NOT as equivalence against an independent MC reference. Plain (non-accelerated) prox-gradient objective monotonicity is locked; the shipped accelerated FISTA outer loop does NOT guarantee per-step monotonicity (Nesterov momentum gives an O(1/k^2) rate but not monotonicity) and is not directly tested. The REGISTRY.md ## TROP section gains a Verified Components expansion: 10 ticked requirements plus four **Note:** / **Note (paper resolution):** / **Note (deferral):** annotations consolidating the deviation surface. No source-code changes to diff_diff/trop*.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
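A small unweighted matrix-completion sketch shows why the plain prox-gradient loop is monotone. It assumes the objective 0.5 * ||M * (Y - L)||_F^2 + lam * ||L||_*, where M is a 0/1 observation mask, and uses a fixed step size of 1. This is a textbook proximal-gradient (ISTA) loop, not the library's weighted solver, and all function names are hypothetical.

```python
import numpy as np


def prox_nuclear(Y, lam):
    # Singular-value soft-thresholding: prox of lam * ||.||_* at Y.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt


def objective(L, Y, mask, lam):
    # Masked squared loss on observed cells plus nuclear-norm penalty.
    resid = mask * (Y - L)
    return 0.5 * np.sum(resid**2) + lam * np.linalg.svd(L, compute_uv=False).sum()


def prox_gradient(Y, mask, lam, n_iter=200):
    """Plain (non-accelerated) proximal gradient with step size 1.

    The gradient of the masked loss, mask * (L - Y), is 1-Lipschitz because
    mask entries are 0 or 1, so step size 1 keeps the objective non-increasing.
    Returns the final iterate and the objective value at every iterate.
    """
    L = np.zeros_like(Y)
    history = [objective(L, Y, mask, lam)]
    for _ in range(n_iter):
        grad = mask * (L - Y)
        L = prox_nuclear(L - grad, lam)
        history.append(objective(L, Y, mask, lam))
    return L, np.array(history)
```

Each step minimizes a quadratic upper bound on the smooth term, so the recorded objective can only decrease, up to floating-point noise. Adding Nesterov momentum (FISTA) improves the worst-case rate to O(1/k^2) but gives up this per-step guarantee.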

igerber added the ready-for-ci Triggers CI test workflows label May 15, 2026

igerber merged commit 439c762 into main May 15, 2026
11 of 12 checks passed

igerber deleted the docs/paper-reviews-trop-etwfe branch May 15, 2026 16:17

igerber mentioned this pull request May 25, 2026

trop: methodology-review tracker promotion + test_methodology_trop.py #491

Merged

docs: add retrospective paper reviews for TROP and Wooldridge ETWFE #443

igerber merged 8 commits into main from docs/paper-reviews-trop-etwfe
