regression_monkey

Author: zhao_xun@sjtu.edu.cn · License: MIT

Dedicated to Haoping Yu, the one I love most.

A standalone specification curve analysis tool for econometric robustness checks. It enumerates all valid combinations of control variables, estimates OLS with N-way absorbed fixed effects, computes heteroskedasticity-robust, one-way clustered, or CGM two-way clustered standard errors, and exports publication-ready figures and a significance summary table.

The workflow and output design draw on Stata's spec_curve command and extend it with batch configuration, parallel execution, and two export formats.

Output modes

PNG mode: a static, publication-quality figure exported as a raster image.

HTML mode: a self-contained interactive webpage with two view modes, compact and detail. Charts with more than 1024 specifications open in compact mode by default; smaller charts disable compact and open in detail. When multiple combinations are present, a top selector bar switches between Y, X, and fixed-effect spec. Specifications can be sorted by coefficient, observation count, or signed significance.

detail is the full interactive view. Hover highlights a specification across all panels, click pins the current specification, and the right details panel lists included controls in input order with per-control statistics. COEF confidence bands are drawn as per-specification slices for a finer PNG-like edge texture; STARS uses COEF-style round points stacked from bottom to top.

When switching Y, X, or fixed-effect spec in the top selector bar, the other selectors keep their current value when that value is still available; fixed-effect specs are preserved by matching the displayed spec label across Y/X combinations. In-chart SORT, MODE, and Controls color selections also persist across top-selector switches when the target chart supports the selected mode, without flashing the default toolbar state during iframe reload.

compact is the dense overview view for large specification sets. It hides the right details panel, disables central-chart hover/click selection, caps the column width at the 8192-specification baseline, and allows horizontal scrolling when the compact plot is wider than the viewport. After rendering, the compact chart is cached as one bitmap and scrolling redraws only the visible slice; sorting, filter chips, guide chips, CI chips, and viewport resizing rebuild that cache. STARS uses segmented CONTROL-like color bars whose heights encode significance level, CONTROL and OBS use thin square bars, and compact COEF points scale to the compact baseline column width.

Both PNG and HTML title metadata show the analysis engine (stata or r) used for the exported figure.

For external engines, elapsed-time metadata includes the engine handoff/setup step that belongs to the run, the model-estimation time, and the export rendering time reported in progress output.

Interactive HTML uses the persisted coefficient-axis scale when re-sorting specifications, so CI bands keep the same visual length implied by the saved ci90/ci95/ci99 values.

The HTML significance legend uses the same colors as the coefficient points; n.s. is black.

HTML TEST GUIDES include all, no, and best. In DETAIL, guide lines cannot be hidden; clicking a guide chip jumps to the next matching specification and cycles when multiple matches exist. In COMPACT, guide chips keep the old show/hide behavior. best is a yellow helper line recomputed for the current filtered view: when the currently visible 3-star significant specifications contain no full-test specification, it marks the visible 3-star specification with the largest number of included controls_test variables, breaking ties by smaller p-value and then original order. The HTML caches these dynamic best lookups in page memory by filter state; the cache is not persisted and is cleared when the page closes.
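The ranking rule behind best can be sketched in Python as follows. This is only an illustration of the tie-breaking order described above, not the page's own code: the dictionary keys star, p_value, and controls are hypothetical names, and the precondition (no full-test specification among the visible 3-star specifications) is left to the caller.

```python
def pick_best(visible_specs, test_controls):
    """Return the index of the 'best' specification in visible_specs, or None.

    visible_specs: list of dicts with hypothetical keys
        'star' (signed significance level), 'p_value', 'controls' (list of names).
    test_controls: flat list of controls_test variable names.
    """
    test_set = set(test_controls)

    # Only 3-star specifications (either sign) are candidates.
    candidates = [
        (index, spec)
        for index, spec in enumerate(visible_specs)
        if abs(spec["star"]) == 3
    ]
    if not candidates:
        return None

    def rank(item):
        index, spec = item
        n_test = len(test_set & set(spec["controls"]))
        # More included test controls first, then smaller p-value,
        # then original order.
        return (-n_test, spec["p_value"], index)

    best_index, _ = min(candidates, key=rank)
    return best_index
```

Because the result depends only on the visible specifications, it can be cached per filter state, which matches the in-memory cache described above.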

In DETAIL, the right-side control-coefficient badge text is white for significant controls except +1 and -1, which use black text. Significant badge backgrounds use the same 1/2/3 red-blue color levels as STARS. 0+ and 0- use a light gray background with deep red/deep blue text.

The right-side detail panel always shows content in DETAIL mode, even before any specification is hovered or pinned. Its width is computed from the configured control-variable names and updates with the viewport so variable names display fully without hover tooltips. It always shows all controls_test variables in the original input order. Variables not included in the current specification keep their name and a blank badge box, while coefficient and p-value cells stay empty.

The right-side detail panel header includes a COPY button between the specification index and significance badge. It copies the current specification's included control variable names, space-separated, in the same order as the sidebar list.

In the right-side detail panel, clicking a control-variable row cycles its filter through significant included controls, non-significant included controls, not-controlled specifications, and off. Multiple selected rows are combined with AND semantics, so the main chart removes non-matching specifications and re-packs the remaining columns so the kept specifications are contiguous. Filter state is shown on the control rows themselves, with a small legend above the control coefficients for sig, no-sig, and no-control; CLEAR uses the same button style as COPY in the panel header.
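A minimal Python sketch of the AND-combined row filters is shown below. It is an illustration only: each specification is represented as a hypothetical mapping from included control name to that control's p-value, and the 0.10 cut-off for "significant" is an assumption taken from the lowest star level.

```python
SIG_LEVEL = 0.10  # assumption: a control counts as significant when p < 0.10


def matches(spec_controls, filters):
    """Check one specification against every active row filter (AND semantics).

    spec_controls: dict mapping included control name -> p-value.
    filters: dict mapping control name -> 'sig' | 'no-sig' | 'no-control' | 'off'.
    """
    for name, state in filters.items():
        if state == "off":
            continue
        included = name in spec_controls
        if state == "no-control":
            ok = not included
        elif state in ("sig", "no-sig"):
            ok = included and (
                (spec_controls[name] < SIG_LEVEL) == (state == "sig")
            )
        else:
            raise ValueError(f"unknown filter state: {state!r}")
        if not ok:
            return False
    return True


def apply_filters(specs, filters):
    """Keep matching specifications in their original order (re-packed columns)."""
    return [spec for spec in specs if matches(spec, filters)]
```

Returning a new list in the original order corresponds to re-packing the remaining columns so the kept specifications are contiguous.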

The HTML Controls color switch has two modes, gray (the default) and sig. gray draws the control matrix with continuous gray run-length shading; it shows only the regular varying matrix controls and does not reserve blank rows for controls_must. sig colors each included control cell by that control's own coefficient significance, using the same red/blue signed levels as STARS and the right sidebar, with a darker neutral gray for insignificant or missing stats. In sig, all controls_must variables are also drawn in the control matrix, ordered before the regular varying matrix controls, and their row labels show a small leading dot to distinguish them from test controls.

Requirements

Dependencies are declared in pyproject.toml and managed by uv.

uv sync

Core dependencies: numpy, pandas, matplotlib, pyreadstat.

Quick start

The recommended workflow is config-file driven. Copy config/config.example.toml, fill in your variable names and file path, then run:

uv run regression-monkey

Or point to the config explicitly:

uv run regression-monkey config/config.toml

CLI flags override any TOML value:

uv run regression-monkey config/config.toml --dpi 600 --n-jobs 0

Export format is controlled by --export-format:

# PNG only (default)
uv run regression-monkey config/config.toml --export-format png

# Interactive HTML only
uv run regression-monkey config/config.toml --export-format html

# Both
uv run regression-monkey config/config.toml --export-format both
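
The precedence rule (CLI flags override TOML values) can be sketched as follows. This is not the package's actual parser: merge_settings is a hypothetical helper, and only the three flags shown in this section are modelled.

```python
import argparse


def merge_settings(toml_values, argv):
    """Overlay CLI flags on top of values loaded from the TOML config.

    toml_values: dict of settings read from the config file.
    argv: list of command-line tokens, e.g. ["--dpi", "600"].
    """
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from "flag given as 0".
    parser.add_argument("--dpi", type=int, default=None)
    parser.add_argument("--n-jobs", type=int, default=None)
    parser.add_argument(
        "--export-format", choices=["png", "html", "both"], default=None
    )
    args = parser.parse_args(argv)

    merged = dict(toml_values)
    for key, value in vars(args).items():
        if value is not None:
            merged[key] = value
    return merged
```

Testing against None rather than truthiness matters here, because --n-jobs 0 is a meaningful value that must still override the file.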

Configuration file

data = "path/to/data.dta"
y = ["MPATT"]
x = ["ln_info", "ln_quant", "ln_qual"]
controls_test = ["SOE", "Big4", ["ListAge1", "FirmAge1"]]
controls_must = ["Lev", "Size", ["ROA", "ROE"]]
moderators = []  # R only; each z must already be a control variable
mediators = []   # R only; each m must be an existing numeric column and must not be a control variable

output = "outputs"
export_format = "png"   # png | html | both
dpi = 300
fig_width = 14
n_jobs = 0              # 0 = auto-parallel (up to 9 cores)
engine = "r"            # r | stata

Firm_FE   = "code"
Ind_FE    = "ind"
Time_FE   = "year"
Region_FE = "pref"

no_absorb_vce_robust                 = false
no_absorb_vce_cluster_firm           = false
absorb_firm_time_vce_cluster_firm    = true
absorb_firm_indtime_vce_cluster_firm = true
absorb_ind_time_vce_cluster_firm     = true

A complete template is at config/config.example.toml.

The refactored package keeps compatibility aliases at package root, so older imports such as from regression_monkey import py, stata, html still resolve to the new engine/ and plot/ modules.

Input data

Supported formats: .dta, .csv, .parquet, .pq.

Multiple y and x values are accepted; all y × x combinations are processed in one run.
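
As a rough sketch of how one run fans out, the snippet below lists every Y × X × spec job for a config dictionary. It assumes, for illustration only, that enabled catalog specs are exactly the boolean keys starting with absorb_ or no_absorb_ that are set to true; enumerate_jobs is a hypothetical helper, not part of the package.

```python
from itertools import product


def enumerate_jobs(config):
    """Return (y, x, spec_key) tuples for all enabled combinations."""
    enabled_specs = [
        key
        for key, value in config.items()
        if value is True and key.startswith(("absorb_", "no_absorb_"))
    ]
    return list(product(config["y"], config["x"], enabled_specs))


config = {
    "y": ["MPATT"],
    "x": ["ln_info", "ln_quant", "ln_qual"],
    "no_absorb_vce_robust": False,
    "absorb_firm_time_vce_cluster_firm": True,
    "absorb_firm_indtime_vce_cluster_firm": True,
    "absorb_ind_time_vce_cluster_firm": True,
}
jobs = enumerate_jobs(config)
print(len(jobs))  # 1 y × 3 x × 3 enabled specs = 9
```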

Control variable structure

Both controls_test and controls_must support a mixed structure in TOML or the Python API:

| Syntax | Meaning |
| --- | --- |
| `"var"` | single variable |
| `"var1 var2"` | two flat variables, same as listing them separately |
| `["A", "B"]` | mutually exclusive alternative group |

Semantics differ:

controls_test: each plain entry is optional (included or not), which multiplies the total spec count by 2. An alternative group contributes at most one of its members, which multiplies the count by (group_size + 1).
controls_must: each plain entry is always included and leaves the spec count unchanged. An alternative group contributes exactly one of its members, which multiplies the count by group_size.

A variable that appears in both lists causes an immediate error.
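
The semantics above can be sketched with itertools.product. This is a minimal illustration under the rules stated in this section, not the package's own enumerator; the function names are hypothetical.

```python
from itertools import product


def to_slots(entries):
    """Normalize a mixed control list into slots (each slot is a list of names)."""
    slots = []
    for entry in entries:
        if isinstance(entry, str):
            # "var1 var2" is shorthand for two separate plain variables.
            slots.extend([name] for name in entry.split())
        else:
            # A nested list is a mutually exclusive alternative group.
            slots.append(list(entry))
    return slots


def enumerate_control_sets(controls_test, controls_must):
    """Return every valid list of control variables."""
    test_slots = to_slots(controls_test)
    must_slots = to_slots(controls_must)

    test_names = {name for slot in test_slots for name in slot}
    must_names = {name for slot in must_slots for name in slot}
    overlap = test_names & must_names
    if overlap:
        raise ValueError(f"variables in both lists: {sorted(overlap)}")

    # Test slots may be skipped (None); must slots always pick one member.
    test_options = [[None] + slot for slot in test_slots]
    return [
        [name for name in choice if name is not None]
        for choice in product(*must_slots, *test_options)
    ]


sets = enumerate_control_sets(
    ["SOE", "Big4", ["ListAge1", "FirmAge1"]],
    ["Lev", "Size", ["ROA", "ROE"]],
)
print(len(sets))  # (2 * 2 * 3) * (1 * 1 * 2) = 24
```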

Predefined fixed-effect specs

The auto mode selects from a catalog of predefined no-FE / FE + VCE combinations. Terminal output shows the Stata-style noabsorb vce(...) or absorb(...) vce(...) form followed by an explicit standard-error label such as Std.Err.=Robust or Std.Err.=cluster(firm); internal keys use underscores.

| TOML key | Stata equivalent |
| --- | --- |
| `no_absorb_vce_cluster_firm` | `noabsorb vce(cluster firm)` |
| `no_absorb_vce_robust` | `noabsorb vce(robust)` |
| `absorb_firm_time_vce_cluster_firm` | `absorb(firm year) vce(cluster firm)` |
| `absorb_firm_indtime_vce_cluster_firm` | `absorb(firm i.ind#i.year) vce(cluster firm)` |
| `absorb_firm_regiontime_vce_cluster_firm` | `absorb(firm i.region#i.year) vce(cluster firm)` |
| `absorb_firm_indtime_regiontime_vce_cluster_firm` | `absorb(firm i.ind#i.year i.region#i.year) vce(cluster firm)` |
| `absorb_firm_time_vce_cluster_region` | `absorb(firm year) vce(cluster region)` |
| `absorb_firm_time_vce_cluster_ind` | `absorb(firm year) vce(cluster ind)` |
| `absorb_firm_time_vce_cluster_firm_time` | `absorb(firm year) vce(cluster firm year)` |
| `absorb_ind_region_time_vce_cluster_ind` | `absorb(ind region year) vce(cluster ind)` |
| `absorb_ind_time_vce_cluster_firm` | `absorb(ind year) vce(cluster firm)` |
| `absorb_firm_time_vce_robust` | `absorb(firm year) vce(robust)` |
(and robust variants of the above)
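
The naming convention in the table is regular enough to decode mechanically. The sketch below maps a TOML key to its Stata-style form; it is inferred from the rows above only, the package itself selects from a predefined catalog, and stata_form and the two lookup tables are hypothetical names.

```python
# Token -> Stata term, as read off the table above (assumed mapping).
ABSORB_TERMS = {
    "firm": "firm",
    "ind": "ind",
    "region": "region",
    "time": "year",
    "indtime": "i.ind#i.year",
    "regiontime": "i.region#i.year",
}
CLUSTER_TERMS = {"firm": "firm", "ind": "ind", "region": "region", "time": "year"}


def stata_form(key):
    """Translate e.g. 'absorb_firm_time_vce_robust' into Stata-style options."""
    fe_part, separator, vce_part = key.partition("_vce_")
    if not separator:
        raise ValueError(f"missing '_vce_' in key: {key!r}")

    if fe_part == "no_absorb":
        absorb = "noabsorb"
    elif fe_part.startswith("absorb_"):
        terms = [ABSORB_TERMS[token] for token in fe_part.split("_")[1:]]
        absorb = f"absorb({' '.join(terms)})"
    else:
        raise ValueError(f"unrecognized fixed-effect part: {fe_part!r}")

    if vce_part == "robust":
        vce = "vce(robust)"
    elif vce_part.startswith("cluster_"):
        clusters = [CLUSTER_TERMS[token] for token in vce_part.split("_")[1:]]
        vce = f"vce(cluster {' '.join(clusters)})"
    else:
        raise ValueError(f"unrecognized VCE part: {vce_part!r}")

    return f"{absorb} {vce}"
```

Unknown tokens raise a KeyError, which is a deliberate choice in this sketch: a key outside the documented vocabulary should fail loudly rather than produce a plausible-looking label.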

The user-facing estimation engines are R/fixest and Stata/reghdfe. The old Python estimation engine and manual --fe mode are no longer supported by the main CLI.

Output structure

Each run creates a timestamped subdirectory:

outputs/20260414_174122/
  config_snapshot.toml
  sig.csv
  ab_firm_time_cl_firm/
    MPATT_ln_info_ab_firm_time_cl_firm.png
  ab_firm_indtime_cl_firm/
    MPATT_ln_info_ab_firm_indtime_cl_firm.png
  interactive.html          # only in html / both mode

sig.csv lists all significant specifications across the entire run, sorted by p_value ascending. Fields: Star, coef, p_value, t_value, obs, Y, X, Controls, FE, cluster, Specs, and grouping columns when applicable.

Star values: +3/+2/+1 for positive coefficients significant at the 1%/5%/10% level (99%/95%/90% confidence); -3/-2/-1 for negative coefficients at the same levels.
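
The Star encoding can be sketched as a small function. The strict p < threshold comparisons and the return value 0 for non-significant results are assumptions made for this illustration; sig.csv itself only lists significant specifications.

```python
def star(coef, p_value):
    """Signed significance level: ±3 (p < 0.01), ±2 (p < 0.05), ±1 (p < 0.10)."""
    if p_value < 0.01:
        level = 3
    elif p_value < 0.05:
        level = 2
    elif p_value < 0.10:
        level = 1
    else:
        return 0  # assumption: 0 stands for "not significant"
    return level if coef > 0 else -level
```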

Stata engine

Switch to reghdfe by setting engine = "stata" in TOML or passing --engine stata:

uv run regression-monkey config/config.toml --engine stata

The Stata engine also supports subgroup heterogeneity analysis via grouping_variable_by_ind_time, grouping_variable_by_time, or grouping_variable_by_none. For each grouping variable z, one extended figure is produced that shows the main coefficient curve, the b_z=0/1 subgroup curves, and the c.x#c.z interaction coefficient, all in the same PNG. The legacy key grouping_variable remains as an alias for grouping_variable_by_ind_time.

R engine

Switch to the R/fixest implementation by setting engine = "r" in TOML or passing --engine r:

uv run regression-monkey config/config.toml --engine r

The R engine lives in src/regression_monkey/engine/r.py. It writes a narrow handoff dataset containing only the variables needed by the enabled catalog specs, generates one temporary .R script per Y × X × spec, runs it with Rscript, then reads the standard *_results.csv / *_plot_meta.json contract used by the PNG and HTML exporters.

R mode accepts moderators = ["z1", "z2"] and mediators = ["m1", "m2"] in TOML, or --moderators z1 z2 --mediators m1 m2 on the CLI. Each moderator z must already appear in controls_must or controls_test; otherwise the run fails before estimation. Each mediator m must be an existing numeric column in the data and must not appear in controls_must or controls_test; any overlap raises an error before estimation.

The main COEF curve remains the original regression without moderator or mediator terms. For specifications that include moderator z, R additionally estimates an augmented regression with z and x*z, stores the three statistics x*z, z, and x in the standard *_results.csv, and HTML renders them below COEF in a MODERATORS panel. Significant moderator rows are exported per variable as sig_moderator_<z>.csv.

For every specification, R also estimates one reference regression per mediator m that replaces y with m. The x coefficient statistics are stored in *_results.csv, HTML renders them in a MEDIATORS panel below MODERATORS, and significant rows are exported as sig_mediator_<m>.csv. These checks do not change the number of enumerated control specifications.
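
The pre-estimation checks on moderators and mediators can be sketched as follows. This is an illustration of the stated rules, not the engine's code: validate_terms and the numeric_columns parameter (the set of numeric column names in the data) are hypothetical.

```python
def flatten_controls(entries):
    """Flatten the mixed control structure into a set of variable names."""
    names = set()
    for entry in entries:
        if isinstance(entry, str):
            names.update(entry.split())  # "var1 var2" holds two variables
        else:
            names.update(entry)  # alternative group
    return names


def validate_terms(moderators, mediators, controls_must, controls_test,
                   numeric_columns):
    """Raise ValueError before estimation if any rule is violated."""
    controls = flatten_controls(controls_must) | flatten_controls(controls_test)
    errors = []
    for z in moderators:
        if z not in controls:
            errors.append(f"moderator {z!r} is not a control variable")
    for m in mediators:
        if m in controls:
            errors.append(f"mediator {m!r} must not be a control variable")
        elif m not in numeric_columns:
            errors.append(f"mediator {m!r} is not a numeric column in the data")
    if errors:
        raise ValueError("; ".join(errors))
```

Collecting all problems before raising means a single failed run reports every misconfigured variable at once.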

R mode is accuracy-first. Each specification keeps the same effective sample it would have under per-spec fixest::feols; specifications with identical effective samples share a cached fixest::demean() call, and the absorbed matrix is then reused with lm.fit(). Robust and one-way clustered SE are computed on that absorbed matrix with fixest-matched absorbed-FE degree-of-freedom accounting. Multi-way clustered specs fall back to exact per-spec feols because fixest may apply non-positive-definite VCOV repairs that should not be approximated. Large-sample cache keys are stored as ordinary string values, not R environment variable names, so runs with tens of thousands of rows do not hit R's variable-name length limit.
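
The idea of sharing work between specifications with identical effective samples can be illustrated in Python. The real engine does this in R around fixest::demean(); the sketch below only shows the grouping step, treats None as a missing value, and uses a SHA-1 digest as the string cache key, all of which are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def effective_rows(rows, variables):
    """Indices of rows with no missing value in any of the given variables."""
    return tuple(
        index
        for index, row in enumerate(rows)
        if all(row.get(name) is not None for name in variables)
    )


def sample_key(row_indices):
    """Fixed-length string key, independent of how many rows the sample has."""
    joined = ",".join(map(str, row_indices))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def group_by_sample(rows, specs):
    """Map each sample key to the ids of the specs that share that sample.

    rows: list of dicts (one per observation).
    specs: list of variable-name lists (y, x, controls, FE identifiers).
    """
    groups = defaultdict(list)
    for spec_id, variables in enumerate(specs):
        groups[sample_key(effective_rows(rows, variables))].append(spec_id)
    return dict(groups)
```

Each group would then need only one demeaning pass, while every specification still keeps exactly the sample it would have on its own.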

The effective-sample cache optimization is documented in assets/R引擎样本缓存优化.md.

Configure the R environment

R mode requires a working Rscript and the R package fixest.

# macOS with Homebrew, if R is not installed yet
brew install r

# Install the required R package and optional speed-up packages
Rscript -e 'install.packages(c("fixest", "data.table", "arrow"), repos = "https://cloud.r-project.org")'

# Verify that Regression Monkey can find Rscript and fixest
Rscript -e 'cat(R.version.string, "\n"); stopifnot(requireNamespace("fixest", quietly = TRUE))'

arrow is optional but recommended: when available, the Python side writes the R handoff data as Feather; otherwise it falls back to CSV. data.table is also optional and speeds up CSV reads when Feather is unavailable.

If Rscript is not on PATH, point Regression Monkey at the exact executable:

uv run regression-monkey config/config.toml --engine r --rscript-path /path/to/Rscript

or set it in TOML:

engine = "r"
rscript_path = "/path/to/Rscript"

R mode currently supports catalog auto specs only, so enable at least one spec flag such as no_absorb_vce_robust = true or absorb_firm_time_vce_cluster_firm = true. Subgroup grouping_variable_* plots remain Stata-only. moderators and mediators are R-only. n_jobs = 0 uses 8 R worker processes by default; --n-jobs N or n_jobs = N sets the worker count explicitly.

Redrawing from saved results

Plot from an existing results file without re-running regressions:

# PNG
uv run regression-monkey-plot \
    --results outputs/<timestamp>/foo_results.csv \
    --meta    outputs/<timestamp>/foo_plot_meta.json \
    --output  outputs/<timestamp>/foo_replot.png

# HTML
uv run regression-monkey-html \
    --results outputs/<timestamp>/foo_results.csv \
    --meta    outputs/<timestamp>/foo_plot_meta.json \
    --output  outputs/<timestamp>/foo_interactive.html

Pass --keep-temp to the main entry point to retain *_results.csv and *_plot_meta.json after a run.

Performance

Total spec count = product of per-slot factors:

| Slot type | Factor |
| --- | --- |
| Plain `controls_test` variable | ×2 |
| `controls_test` alternative group of size g | ×(g+1) |
| Plain `controls_must` variable | ×1 |
| `controls_must` alternative group of size g | ×g |

For example, 12 plain controls_test variables → 2^12 = 4,096 regressions per Y × X × spec combination.
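
The product of per-slot factors can be computed directly, without enumerating any specification. The helper below is a sketch based on the factor table in this section; spec_count is a hypothetical name.

```python
from math import prod


def slot_factors(entries, optional):
    """Yield the multiplicative factor contributed by each slot."""
    for entry in entries:
        if isinstance(entry, str):
            # Each flat variable: ×2 if optional (test), ×1 if mandatory (must).
            for _ in entry.split():
                yield 2 if optional else 1
        else:
            # Alternative group of size g: ×(g+1) if optional, ×g if mandatory.
            group_size = len(entry)
            yield group_size + 1 if optional else group_size


def spec_count(controls_test, controls_must):
    """Number of control-set specifications per Y × X × spec combination."""
    return prod(slot_factors(controls_test, True)) * prod(
        slot_factors(controls_must, False)
    )


print(spec_count([f"c{i}" for i in range(12)], []))  # 2 ** 12 = 4096
```

Running this on a draft config before launching is a cheap way to see whether adding one more optional control doubles an already large run.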

R mode reuses cached demeaned samples for robust and one-way clustered specs, and falls back to exact per-spec fixest::feols for multi-way clustering. Stata mode delegates estimation to reghdfe; plotting remains handled by the shared PNG/HTML exporters.
