PR #457 R10 polish: case-insensitive R-row selector + refresh golden meta

igerber · igerber · commit a86498ee744c · 2026-05-16T16:01:08.000-04:00
Fresh R10 verdict was "Looks good" with 2 P3 informational items:

1. P3 (Maintainability): the always-treated fold-back test selected R rows via case-sensitive literal substrings ("Untreated", "Always Treated", "Later"), while the neighboring _classify_r_type classifier uses case-insensitive semantic matching. Made the selector consistent: it now matches case-insensitively on the "untreated" / "never" / "always" tokens, so the fold-back survives bacondecomp label variation across versions.

2. P3 (Documentation/Tests): the committed golden JSON's meta.description still advertised full per-component (treated, control, type) tuple parity as the contract, but PR #457 intentionally replaces that for the always_treated_remapped U-bucket rows with aggregate + fold-back parity. Updated meta.description to describe the actual three-tier contract (aggregate / direct per-component on non-remap + 6 timing-vs-timing rows / cohort fold-back for U bucket) with a pointer to the REGISTRY Notes that document the convention divergence.

Tests: 34/34 still pass.
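For reviewers, a minimal standalone sketch of the selector from item 1. The helper names `is_u_bucket_row` and `fold_u_bucket` are hypothetical and do not exist in the test module; the per-cohort fold shown here (summed weight plus weight-averaged estimate) is an assumption about how the U-bucket rows are combined, not a copy of the test body.

```python
from collections import defaultdict


def is_u_bucket_row(r_type):
    """Hypothetical helper: True if an R `type` label belongs to the U bucket.

    Matches on lower-cased tokens, so "Treated vs Untreated" and
    "later vs always treated" are both selected. A missing label is treated
    as an empty string.
    """
    tlow = (r_type or "").lower()
    is_untreated = "untreated" in tlow or "never" in tlow
    is_always_treated_compare = "always" in tlow
    return is_untreated or is_always_treated_compare


def fold_u_bucket(components):
    """Hypothetical helper: fold U-bucket rows into one cell per treated cohort.

    Assumption: the fold sums weights and takes the weight-averaged estimate.
    Returns {treated_group: (total_weight, weighted_mean_estimate)}.
    """
    agg = defaultdict(lambda: [0.0, 0.0])  # [sum of w, sum of w * estimate]
    for c in components:
        if not is_u_bucket_row(c.get("type")):
            continue
        k = float(c["treated_group"])
        w = float(c["weight"])
        agg[k][0] += w
        agg[k][1] += w * float(c["estimate"])
    # Guard against a zero total weight rather than dividing by zero.
    return {
        k: (w_sum, we_sum / w_sum if w_sum else float("nan"))
        for k, (w_sum, we_sum) in agg.items()
    }
```

Note that keying on "always" alone is broader than the old `"Always Treated" in ctype and "Later" in ctype` check: any label containing that token is selected, which is what makes the selector tolerant of relabeling.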
diff --git a/benchmarks/data/r_bacondecomp_golden.json b/benchmarks/data/r_bacondecomp_golden.json
@@ -3,7 +3,7 @@
     "generated_at": "2026-05-16",
     "bacondecomp_version": "0.1.1",
     "r_version": "R version 4.5.2 (2025-10-31)",
-    "description": "Goodman-Bacon (2021) decomposition parity goldens for diff-diff BaconDecomposition. Parity target: atol=1e-6 on per-component (treated, control, type) tuples plus the TWFE coefficient."
+    "description": "Goodman-Bacon (2021) decomposition parity goldens for diff-diff BaconDecomposition. Parity target at atol=1e-6: (1) aggregate TWFE coefficient + weights-sum across all 3 fixtures; (2) direct per-component (treated, control, type) parity on the 2 non-remap fixtures AND on the 6 timing-vs-timing rows of always_treated_remapped; (3) cohort-level fold-back parity for the U bucket on always_treated_remapped (Python's paper-footnote-11 remap folds R's separate Later-vs-Always-Treated + Treated-vs-Untreated rows into a single treated_vs_never cell per cohort; aggregate is invariant per Theorem 1, breakdown differs by convention). See REGISTRY Note (R parity convention divergence on always-treated) + Deviation (first-period boundary extension)."
   },
   "uniform_3groups_with_never_treated": {
     "panel": {
diff --git a/tests/test_methodology_bacon.py b/tests/test_methodology_bacon.py
@@ -519,11 +519,16 @@ def test_always_treated_remapped_fold_back_matches_r(self, golden) -> None:
         }
         # Aggregate R's two U-bucket types per treated cohort.
         # R uses ctrl=99999 for untreated and ctrl=1 (the always-treated cohort)
-        # for the `Later vs Always Treated` rows.
+        # for the `Later vs Always Treated` rows. Match on case-insensitive
+        # semantic tokens so the selector survives `bacondecomp` label
+        # variation across versions (same convention as the neighboring
+        # ``_classify_r_type`` helper used by the per-component test).
         r_agg: dict = {}
         for c in fix["r_components"]:
-            ctype = c.get("type", "")
-            if "Untreated" in ctype or ("Always Treated" in ctype and "Later" in ctype):
+            tlow = (c.get("type") or "").lower()
+            is_untreated = "untreated" in tlow or "never" in tlow
+            is_always_treated_compare = "always" in tlow
+            if is_untreated or is_always_treated_compare:
                 k = float(c["treated_group"])
                 w = float(c["weight"])
                 e = float(c["estimate"])