Commit 780d502

PR #457 R3 polish: assert R→Python U-bucket fold-back on always-treated
R3 verdict was "Looks good" with one P3 informational item.

The per-component parity test skips the `always_treated_remapped` fixture (R and Python decompose the U bucket differently by convention), and the REGISTRY note documents that aggregating R's `Later vs Always Treated` + `Treated vs Untreated` rows by treated cohort should match Python's single `treated_vs_never` component for that cohort. The reviewer flagged that this documented structural claim was not directly asserted in tests, so a cohort-level regression in the fold-back could slip through under overall TWFE parity. Per memory `feedback_test_coverage_gap_treat_as_actionable`, the "test exists but doesn't directly exercise the documented surface" P3 is actionable.

Added `test_always_treated_remapped_fold_back_matches_r` to `TestBaconParityR`: for each treated cohort in the remap fixture, aggregate R's `Later vs Always Treated` + `Treated vs Untreated` rows into a combined weight and a weight-averaged estimate, then assert that both match Python's `treated_vs_never` component for that cohort at atol=1e-6. The test currently passes, confirming that the documented structural fold-back holds to within that tolerance.

Tests: 34/34 pass in test_methodology_bacon.py (was 33; +1 new regression).
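The fold-back described in this message can be sketched on toy numbers. This is a minimal illustration, not the test itself: the helper name `fold_back_r_rows` is hypothetical, the row keys (`type`, `treated_group`, `weight`, `estimate`) mirror those read from the fixture's `r_components` in the diff, and the weights and estimates are made up.

```python
from collections import defaultdict


def fold_back_r_rows(r_components):
    """Hypothetical helper: aggregate R's U-bucket rows per treated cohort.

    Returns {cohort: (combined_weight, weight_averaged_estimate)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # cohort -> [sum_w, sum_w_e]
    for row in r_components:
        ctype = row.get("type", "")
        # Same row filter as the test: untreated controls, or
        # later-treated cohorts compared against the always-treated cohort.
        is_u_bucket = "Untreated" in ctype or (
            "Always Treated" in ctype and "Later" in ctype
        )
        if not is_u_bucket:
            continue
        k = float(row["treated_group"])
        w = float(row["weight"])
        sums[k][0] += w
        sums[k][1] += w * float(row["estimate"])
    return {k: (sw, swe / sw) for k, (sw, swe) in sums.items()}


# Made-up rows for illustration only; real values come from the golden fixture.
toy_rows = [
    {"type": "Later vs Always Treated", "treated_group": 3, "weight": 0.10, "estimate": 2.0},
    {"type": "Treated vs Untreated", "treated_group": 3, "weight": 0.30, "estimate": 4.0},
    {"type": "Earlier vs Later Treated", "treated_group": 3, "weight": 0.20, "estimate": 1.0},
]
folded = fold_back_r_rows(toy_rows)
```

Here cohort 3 folds to a combined weight of about 0.4 and a weight-averaged estimate of about 3.5, while the timing-comparison row is ignored. In the real test these two numbers are what get compared against Python's single `treated_vs_never` component for the same cohort.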
1 parent 86facdd commit 780d502

1 file changed

Lines changed: 68 additions & 0 deletions

tests/test_methodology_bacon.py

Lines changed: 68 additions & 0 deletions
@@ -465,6 +465,74 @@ def _classify_r_type(c: dict, fixture_name: str) -> str:
                 f"{fixture_name} {k}: weight Python={py_weights[k]} " f"vs R={r_weights[k]}"
             )
 
+    def test_always_treated_remapped_fold_back_matches_r(self, golden) -> None:
+        """Pin the documented R→Python fold-back for the always-treated U bucket.
+
+        The per-component test above skips ``always_treated_remapped`` because
+        R and Python decompose the U bucket differently, but the documented
+        REGISTRY claim is that **aggregating** R's `Later vs Always Treated`
+        + `Treated vs Untreated` rows by treated cohort matches Python's
+        single `treated_vs_never` cell for that cohort. Assert that fold-back
+        directly so a cohort-level regression can't slip through under
+        overall TWFE parity.
+
+        For each treated cohort k:
+        - R: combined weight w_R = w(k vs always-treated) + w(k vs untreated)
+          and weight-averaged estimate e_R = Σ w_i * e_i / w_R
+        - Python: single treated_vs_never component (w_Py, e_Py)
+        - Assert |w_Py - w_R| < 1e-6 AND |e_Py - e_R| < 1e-6.
+        """
+        if "always_treated_remapped" not in golden:
+            pytest.skip("always_treated_remapped fixture not in goldens")
+        fix = golden["always_treated_remapped"]
+        panel = pd.DataFrame(fix["panel"])
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", category=UserWarning)
+            results = bacon_decompose(
+                panel,
+                outcome="y",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                weights="exact",
+            )
+        # Build Python's treated_vs_never lookup: cohort -> (weight, estimate)
+        py_tvn = {
+            float(c.treated_group): (c.weight, c.estimate)
+            for c in results.comparisons
+            if c.comparison_type == "treated_vs_never"
+        }
+        # Aggregate R's two U-bucket types per treated cohort.
+        # R uses ctrl=99999 for untreated and ctrl=1 (the always-treated cohort)
+        # for the `Later vs Always Treated` rows.
+        r_agg: dict = {}
+        for c in fix["r_components"]:
+            ctype = c.get("type", "")
+            if "Untreated" in ctype or ("Always Treated" in ctype and "Later" in ctype):
+                k = float(c["treated_group"])
+                w = float(c["weight"])
+                e = float(c["estimate"])
+                if k not in r_agg:
+                    r_agg[k] = [0.0, 0.0]  # [sum_w, sum_w_e]
+                r_agg[k][0] += w
+                r_agg[k][1] += w * e
+        # Cohorts must match
+        assert set(py_tvn.keys()) == set(r_agg.keys()), (
+            f"always_treated_remapped: treated_vs_never cohorts differ. "
+            f"Python: {sorted(py_tvn)}, R-aggregated: {sorted(r_agg)}"
+        )
+        for k, (py_w, py_e) in py_tvn.items():
+            r_w, r_we = r_agg[k]
+            r_e = r_we / r_w
+            assert abs(py_w - r_w) < 1e-6, (
+                f"always_treated_remapped cohort={k}: combined weight "
+                f"Python={py_w:.10f} vs R-aggregated={r_w:.10f}"
+            )
+            assert abs(py_e - r_e) < 1e-6, (
+                f"always_treated_remapped cohort={k}: weight-averaged estimate "
+                f"Python={py_e:.10f} vs R-aggregated={r_e:.10f}"
+            )
+
 
 # ---------------------------------------------------------------------------
 # 3. Always-treated warn+remap
