Update Tutorial 02: Add pre-trends section and fix DGP usage

igerber · claude · igerber · commit 789b128b67c8 · 2026-01-22T12:30:31.000-05:00
- Add Section 8 demonstrating base_period parameter for pre-treatment effects
- Show varying vs universal base period comparison
- Include interpretation guidance for parallel trends diagnostics
- Fix DGP usage: remove redundant cohort column, use first_treat directly
- Update all CS/SA/Bacon fit() calls from first_treat='cohort' to first_treat='first_treat'
- Renumber sections 8-13 to 9-14
- Update TOC and summary with new pre-trends content

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
diff --git a/docs/tutorials/02_staggered_did.ipynb b/docs/tutorials/02_staggered_did.ipynb
@@ -20,8 +20,13 @@
     "5. Aggregating effects (simple, group, event-study)\n",
     "6. Bootstrap inference for valid standard errors\n",
     "7. Visualization\n",
-    "8. **Sun-Abraham interaction-weighted estimator**\n",
-    "9. **Comparing CS and SA as a robustness check**"
+    "8. **Pre-treatment effects and parallel trends testing**\n",
+    "9. Different control group options\n",
+    "10. Handling anticipation effects\n",
+    "11. Adding covariates\n",
+    "12. Comparing with MultiPeriodDiD\n",
+    "13. Sun-Abraham interaction-weighted estimator\n",
+    "14. Comparing CS and SA as a robustness check"
    ]
   },
   {
@@ -78,8 +83,7 @@
     "    seed=42\n",
     ")\n",
     "\n",
-    "# Add a 'cohort' column that matches the old format (first_treat is already there)\n",
-    "df['cohort'] = df['first_treat']\n",
+    "# The DGP returns 'first_treat' column: 0 = never-treated, >0 = first treatment period\n",
     "\n",
     "print(f\"Dataset: {len(df)} observations, {df['unit'].nunique()} units, {df['period'].nunique()} periods\")\n",
     "df.head(10)"
@@ -92,9 +96,9 @@
    "outputs": [],
    "source": [
     "# Examine treatment timing\n",
-    "cohort_summary = df.groupby('unit').agg({'cohort': 'first', 'treated': 'sum'}).reset_index()\n",
+    "cohort_summary = df.groupby('unit').agg({'first_treat': 'first', 'treated': 'sum'}).reset_index()\n",
     "print(\"Treatment cohorts:\")\n",
-    "print(cohort_summary.groupby('cohort').size())\n",
+    "print(cohort_summary.groupby('first_treat').size())\n",
     "\n",
     "print(\"\\nTreatment adoption over time:\")\n",
     "print(df.groupby('period')['treated'].mean().round(3))"
@@ -165,7 +169,7 @@
     "    outcome='outcome',\n",
     "    unit='unit',\n",
     "    time='period',\n",
-    "    first_treat='cohort'  # Same as 'cohort' column - 0 means never-treated\n",
+    "    first_treat='first_treat'  # 0 means never-treated\n",
     ")\n",
     "\n",
     "# View the decomposition summary\n",
@@ -227,7 +231,7 @@
     "    outcome=\"outcome\",\n",
     "    unit=\"unit\",\n",
     "    time=\"period\",\n",
-    "    first_treat=\"cohort\",  # Column with first treatment period (0 = never treated)\n",
+    "    first_treat=\"first_treat\",  # Column with first treatment period (0 = never treated)\n",
     "    aggregate=\"all\"  # Compute all aggregations (simple, event_study, group)\n",
     ")\n",
     "\n",
@@ -352,7 +356,33 @@
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "# Callaway-Sant'Anna with bootstrap inference\ncs_boot = CallawaySantAnna(\n    control_group=\"never_treated\",\n    n_bootstrap=499,              # Number of bootstrap iterations\n    bootstrap_weights='rademacher',  # or 'mammen', 'webb'\n    seed=42                        # For reproducibility\n)\n\nresults_boot = cs_boot.fit(\n    df,\n    outcome=\"outcome\",\n    unit=\"unit\",\n    time=\"period\",\n    first_treat=\"cohort\",         # Column with first treatment period\n    aggregate=\"event_study\"       # Compute event study aggregation\n)\n\n# Access bootstrap results\nprint(\"Bootstrap Inference Results:\")\nprint(\"=\" * 60)\nprint(f\"\\nOverall ATT: {results_boot.overall_att:.4f}\")\nprint(f\"Bootstrap SE: {results_boot.bootstrap_results.overall_att_se:.4f}\")\nprint(f\"Bootstrap 95% CI: [{results_boot.bootstrap_results.overall_att_ci[0]:.4f}, \"\n      f\"{results_boot.bootstrap_results.overall_att_ci[1]:.4f}]\")\nprint(f\"Bootstrap p-value: {results_boot.bootstrap_results.overall_att_p_value:.4f}\")"
+   "source": [
+    "# Callaway-Sant'Anna with bootstrap inference\n",
+    "cs_boot = CallawaySantAnna(\n",
+    "    control_group=\"never_treated\",\n",
+    "    n_bootstrap=499,              # Number of bootstrap iterations\n",
+    "    bootstrap_weights='rademacher',  # or 'mammen', 'webb'\n",
+    "    seed=42                        # For reproducibility\n",
+    ")\n",
+    "\n",
+    "results_boot = cs_boot.fit(\n",
+    "    df,\n",
+    "    outcome=\"outcome\",\n",
+    "    unit=\"unit\",\n",
+    "    time=\"period\",\n",
+    "    first_treat=\"first_treat\",    # Column with first treatment period\n",
+    "    aggregate=\"event_study\"       # Compute event study aggregation\n",
+    ")\n",
+    "\n",
+    "# Access bootstrap results\n",
+    "print(\"Bootstrap Inference Results:\")\n",
+    "print(\"=\" * 60)\n",
+    "print(f\"\\nOverall ATT: {results_boot.overall_att:.4f}\")\n",
+    "print(f\"Bootstrap SE: {results_boot.bootstrap_results.overall_att_se:.4f}\")\n",
+    "print(f\"Bootstrap 95% CI: [{results_boot.bootstrap_results.overall_att_ci[0]:.4f}, \"\n",
+    "      f\"{results_boot.bootstrap_results.overall_att_ci[1]:.4f}]\")\n",
+    "print(f\"Bootstrap p-value: {results_boot.bootstrap_results.overall_att_p_value:.4f}\")"
+   ]
   },
   {
    "cell_type": "code",
@@ -431,7 +461,148 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 8. Different Control Group Options\n",
+    "## 8. Pre-Treatment Effects and Parallel Trends Testing\n",
+    "\n",
+    "The Callaway-Sant'Anna estimator can compute **pre-treatment effects** ATT(g,t) for periods before treatment. These should be near zero if parallel trends holds.\n",
+    "\n",
+    "The `base_period` parameter controls how the reference period is selected:\n",
+    "- `\"varying\"` (default): For pre-treatment periods, compares t to t-1 (consecutive comparisons)\n",
+    "- `\"universal\"`: Always compares to g-1 (the period just before treatment)\n",
+    "\n",
+    "Both produce identical post-treatment effects; they differ only for pre-treatment diagnostics."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# CallawaySantAnna with explicit base_period for pre-treatment effects\n",
+    "cs_pretrends = CallawaySantAnna(\n",
+    "    control_group=\"never_treated\",\n",
+    "    base_period=\"varying\"  # Default: consecutive comparisons for pre-periods\n",
+    ")\n",
+    "\n",
+    "results_pretrends = cs_pretrends.fit(\n",
+    "    df,\n",
+    "    outcome=\"outcome\",\n",
+    "    unit=\"unit\",\n",
+    "    time=\"period\",\n",
+    "    first_treat=\"first_treat\",\n",
+    "    aggregate=\"event_study\"\n",
+    ")\n",
+    "\n",
+    "# The base_period is recorded in results\n",
+    "print(f\"Base period method: {results_pretrends.base_period}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Examine pre-treatment effects (event time < 0)\n",
+    "print(\"Pre-Treatment Effects (Parallel Trends Diagnostic):\")\n",
+    "print(\"=\" * 65)\n",
+    "print(f\"{'Event Time':>12} {'ATT':>10} {'SE':>10} {'95% CI':>25} {'Test'}\")\n",
+    "print(\"-\" * 65)\n",
+    "\n",
+    "pre_period_effects = []\n",
+    "for event_time in sorted(results_pretrends.event_study_effects.keys()):\n",
+    "    if event_time < 0:\n",
+    "        effects = results_pretrends.event_study_effects[event_time]\n",
+    "        ci = effects['conf_int']\n",
+    "        includes_zero = ci[0] <= 0 <= ci[1]\n",
+    "        marker = \"Pass\" if includes_zero else \"Fail\"\n",
+    "        pre_period_effects.append(effects['effect'])\n",
+    "        print(f\"{event_time:>12} {effects['effect']:>10.4f} {effects['se']:>10.4f} \"\n",
+    "              f\"[{ci[0]:>8.4f}, {ci[1]:>8.4f}] {marker}\")\n",
+    "\n",
+    "if pre_period_effects:\n",
+    "    print(\"\\n-> All pre-treatment effects should be close to zero\")\n",
+    "    print(f\"   Mean pre-treatment effect: {np.mean(pre_period_effects):.4f}\")\n",
+    "else:\n",
+    "    print(\"No pre-treatment effects computed (insufficient pre-periods)\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Comparing Base Period Methods\n",
+    "\n",
+    "Let's compare the two base period methods to understand their difference:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Compare varying vs universal base period\n",
+    "cs_universal = CallawaySantAnna(\n",
+    "    control_group=\"never_treated\",\n",
+    "    base_period=\"universal\"  # Always use g-1 as base\n",
+    ")\n",
+    "\n",
+    "results_universal = cs_universal.fit(\n",
+    "    df,\n",
+    "    outcome=\"outcome\",\n",
+    "    unit=\"unit\",\n",
+    "    time=\"period\",\n",
+    "    first_treat=\"first_treat\",\n",
+    "    aggregate=\"event_study\"\n",
+    ")\n",
+    "\n",
+    "print(\"Pre-Treatment Effects: Varying vs Universal Base Period\")\n",
+    "print(\"=\" * 70)\n",
+    "print(f\"{'Event Time':>12} {'Varying':>12} {'Universal':>12} {'Difference':>12}\")\n",
+    "print(\"-\" * 70)\n",
+    "\n",
+    "for event_time in sorted(results_pretrends.event_study_effects.keys()):\n",
+    "    if event_time < 0:\n",
+    "        varying_eff = results_pretrends.event_study_effects[event_time]['effect']\n",
+    "        universal_eff = results_universal.event_study_effects.get(event_time, {}).get('effect', np.nan)\n",
+    "        diff = varying_eff - universal_eff if not np.isnan(universal_eff) else np.nan\n",
+    "        print(f\"{event_time:>12} {varying_eff:>12.4f} {universal_eff:>12.4f} {diff:>12.4f}\")\n",
+    "\n",
+    "print(\"\\nNote: 'Varying' uses consecutive period comparisons (t vs t-1)\")\n",
+    "print(\"      'Universal' compares all periods to g-1\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Interpreting Pre-Treatment Effects\n",
+    "\n",
+    "**What we're testing:**\n",
+    "- Pre-treatment ATT(g,t) should be approximately zero if parallel trends holds\n",
+    "- Significant non-zero pre-treatment effects suggest potential parallel trends violations\n",
+    "\n",
+    "**Key insights:**\n",
+    "- Visual check: in the event study plot, pre-period coefficients should sit close to zero\n",
+    "- Formal check: a 95% CI that includes zero is consistent with parallel trends\n",
+    "- **Important caveat**: A \"passing\" test doesn't prove parallel trends—the test may lack power\n",
+    "\n",
+    "**When concerned about pre-trends:**\n",
+    "- Add covariates for precision (Section 11)\n",
+    "- Use `control_group=\"not_yet_treated\"` for more data (Section 9)\n",
+    "- Apply Honest DiD sensitivity analysis to bound effects under violations (Tutorial 05)\n",
+    "- Assess pre-trends test power using Tutorial 07\n",
+    "\n",
+    "- For comprehensive parallel trends testing, see **Tutorial 04**\n",
+    "- For pre-trends power analysis (Roth 2022), see **Tutorial 07**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 9. Different Control Group Options\n",
     "\n",
     "The CS estimator supports different control group specifications:\n",
     "- `\"never_treated\"`: Only use units that are never treated\n",
@@ -454,7 +625,7 @@
     "    outcome=\"outcome\",\n",
     "    unit=\"unit\",\n",
     "    time=\"period\",\n",
-    "    first_treat=\"cohort\"\n",
+    "    first_treat=\"first_treat\"\n",
     ")\n",
     "\n",
     "# Compare using overall_att/overall_se attributes\n",
@@ -469,7 +640,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 9. Handling Anticipation Effects\n",
+    "## 10. Handling Anticipation Effects\n",
     "\n",
     "If units start changing behavior before official treatment (anticipation), you can specify the anticipation period."
    ]
@@ -491,7 +662,7 @@
     "    outcome=\"outcome\",\n",
     "    unit=\"unit\",\n",
     "    time=\"period\",\n",
-    "    first_treat=\"cohort\"\n",
+    "    first_treat=\"first_treat\"\n",
     ")\n",
     "\n",
     "print(f\"With anticipation=1: ATT = {results_antic.overall_att:.4f}\")"
@@ -501,7 +672,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 10. Adding Covariates\n",
+    "## 11. Adding Covariates\n",
     "\n",
     "You can include covariates to improve precision through outcome regression or propensity score methods."
    ]
@@ -526,7 +697,7 @@
     "    outcome=\"outcome\",\n",
     "    unit=\"unit\",\n",
     "    time=\"period\",\n",
-    "    first_treat=\"cohort\",\n",
+    "    first_treat=\"first_treat\",\n",
     "    covariates=[\"size\", \"age\"]\n",
     ")\n",
     "\n",
@@ -537,7 +708,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 11. Comparing with MultiPeriodDiD\n",
+    "## 12. Comparing with MultiPeriodDiD\n",
     "\n",
     "For comparison, here's how you would use `MultiPeriodDiD` which estimates period-specific effects. \n",
     "\n",
@@ -597,7 +768,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 12. Sun-Abraham Interaction-Weighted Estimator\n",
+    "## 13. Sun-Abraham Interaction-Weighted Estimator\n",
     "\n",
     "The Sun-Abraham (2021) estimator provides an alternative approach to staggered DiD. While Callaway-Sant'Anna aggregates 2x2 DiD comparisons, Sun-Abraham uses an **interaction-weighted regression** approach:\n",
     "\n",
@@ -630,7 +801,7 @@
     "    outcome=\"outcome\",\n",
     "    unit=\"unit\",\n",
     "    time=\"period\",\n",
-    "    first_treat=\"cohort\"  # Column with first treatment period (0 = never treated)\n",
+    "    first_treat=\"first_treat\"  # Column with first treatment period (0 = never treated)\n",
     ")\n",
     "\n",
     "# View summary\n",
@@ -664,7 +835,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 13. Comparing CS and SA as a Robustness Check\n",
+    "## 14. Comparing CS and SA as a Robustness Check\n",
     "\n",
     "Running both estimators provides a useful robustness check. When they agree, results are more credible."
    ]
@@ -716,7 +887,7 @@
     "   - Only using valid comparison groups\n",
     "   - Properly aggregating effects\n",
     "4. **Sun-Abraham** provides an alternative approach using:\n",
-    "   - Interaction-weighted regression with cohort × relative-time indicators\n",
+    "   - Interaction-weighted regression with cohort x relative-time indicators\n",
     "   - Different weighting scheme than CS\n",
     "   - More efficient under homogeneous effects\n",
     "5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n",
@@ -728,7 +899,12 @@
     "   - Use `n_bootstrap` parameter to enable multiplier bootstrap\n",
     "   - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n",
     "   - Bootstrap results include SEs, CIs, and p-values for all aggregations\n",
-    "8. **Control group choices** affect efficiency and assumptions:\n",
+    "8. **Pre-treatment effects** provide parallel trends diagnostics:\n",
+    "   - Use `base_period=\"varying\"` for consecutive period comparisons\n",
+    "   - Pre-treatment ATT(g,t) should be near zero\n",
+    "   - A 95% CI that includes zero is consistent with parallel trends\n",
+    "   - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n",
+    "9. **Control group choices** affect efficiency and assumptions:\n",
     "   - `\"never_treated\"`: Stronger parallel trends assumption\n",
     "   - `\"not_yet_treated\"`: Weaker assumption, uses more data\n",
     "\n",
@@ -746,4 +922,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
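
Note on the new Section 8: the difference between `base_period="varying"` and `base_period="universal"` can be sketched outside the library with plain 2x2 comparisons of group means. The block below is a minimal illustration, not the `CallawaySantAnna` implementation. The helpers `did_2x2` and `event_time_effects`, the toy panel, and the choice to skip the reference period g-1 under the universal base are all assumptions made for this example. The treated group is given a differential pre-trend of 0.5 per period plus a treatment effect of 2.0 from period 3 onward.

```python
import numpy as np


def did_2x2(y_treated, y_control, t, base):
    """Change in the treated mean minus change in the control mean, from `base` to `t`."""
    treated_change = y_treated[:, t].mean() - y_treated[:, base].mean()
    control_change = y_control[:, t].mean() - y_control[:, base].mean()
    return treated_change - control_change


def event_time_effects(y_treated, y_control, g, base_period="varying"):
    """Return {event_time: effect} for one cohort first treated at period index g.

    Hypothetical helper: arrays have shape (n_units, n_periods).
    """
    effects = {}
    for t in range(y_treated.shape[1]):
        if base_period == "universal" or t >= g:
            base = g - 1  # period just before treatment
        else:
            base = t - 1  # consecutive comparison for pre-treatment periods
        if base < 0 or base == t:
            continue  # no earlier period exists, or t is the reference period itself
        effects[t - g] = did_2x2(y_treated, y_control, t, base)
    return effects


# Toy panel (assumed data): 6 periods, one cohort treated from period index 3.
periods = np.arange(6)
g = 3
y_control = np.array([[0.0], [1.0], [2.0]]) + periods  # common trend of 1 per period
y_treated = (
    np.array([[5.0], [6.0]])
    + 1.5 * periods            # common trend plus a 0.5 per-period pre-trend violation
    + 2.0 * (periods >= g)     # treatment effect after adoption
)

varying = event_time_effects(y_treated, y_control, g, "varying")
universal = event_time_effects(y_treated, y_control, g, "universal")

for e in sorted(set(varying) | set(universal)):
    print(e, varying.get(e, np.nan), universal.get(e, np.nan))
```

In this toy panel the varying base reports the per-period drift (0.5 at event times -2 and -1), while the universal base reports the drift accumulated relative to g-1 (-1.0 at event time -3 and -0.5 at event time -2). Post-treatment effects match under both settings, which is the behaviour the Section 8 text describes. The two methods also cover different pre-treatment event times here, which is why a side-by-side comparison needs to tolerate missing keys.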