Commit 9f3368a
two-stage-did: count vcov n_clusters via np.unique like the variance (codex P2)
n_clusters was computed with Series.nunique(), which drops NaN, while the GMM
sandwich groups by np.unique(cluster_ids), which keeps a single NaN group. With a
non-survey cluster= column containing missing IDs, the reported G would undercount
the clusters actually used for the SE. Count clusters the same way the variance
does, via np.unique(df[cluster_var]). This also consolidates the two non-survey
branches and still excludes units dropped as always-treated, because it counts on
df rather than data. Adds a NaN-cluster regression test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 64119a0 commit 9f3368a
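The mismatch described in the message can be reproduced in isolation. This is a minimal sketch, not the library's code: the `cluster_ids` series is made-up data, and the single-NaN-group behaviour of `np.unique` is assumed to be that of NumPy 1.21 and later.

```python
import numpy as np
import pandas as pd

# Hypothetical cluster IDs with two missing values (illustrative data only).
cluster_ids = pd.Series([1.0, 1.0, 2.0, np.nan, 3.0, np.nan])

# Series.nunique() drops NaN by default, so the missing IDs are not counted.
n_nunique = cluster_ids.nunique()

# np.unique keeps NaN as a value; on NumPy >= 1.21 all NaNs collapse into one group.
n_unique = len(np.unique(cluster_ids.to_numpy()))

# Passing dropna=False makes pandas count the NaN group as well.
n_nunique_keep_nan = cluster_ids.nunique(dropna=False)

print(n_nunique, n_unique, n_nunique_keep_nan)
```

Counting with the same function that forms the groups in the variance keeps the reported cluster count and the standard error consistent, whichever NaN convention applies.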
2 files changed
Lines changed: 41 additions & 17 deletions
File 1 (name not captured), old lines 2067-2093 to new lines 2067-2091: 15 additions, 17 deletions
  old 2070-2081 replaced by new 2070-2082
  old 2085-2087 removed
  old 2089-2090 replaced by new 2087-2088
File 2 (name not captured), after line 2254: 26 additions (new lines 2255-2280)
[Diff line contents were not captured.]