ReplicateAI

Autonomous replication of empirical economics papers using Deep Agents and a Modal sandbox.

See ROADMAP.md for planned work and DESIGN.md for architecture.

Setup

cd replicate_ai
uv sync
uv sync --group gui      # optional: browser GUI (--gui)
cp .env.example .env     # add ANTHROPIC_API_KEY (see below)
uv run modal token new   # one-time Modal auth (or set MODAL_TOKEN_* in .env)

Environment variables

| Variable | When required | Purpose |
| --- | --- | --- |
| `LLM_PROVIDER` | No (default `anthropic`) | `anthropic`, `cloudflare-kimi`, `cloudflare-glm`, `gemini`, or `groq` |
| `ANTHROPIC_API_KEY` | `LLM_PROVIDER=anthropic` | Claude API for the agent and auditor |
| `CF_ACCOUNT_ID`, `CF_AI_API_TOKEN` | Cloudflare providers | Workers AI (Kimi K2.6 / GLM-4.7-Flash) |
| `GOOGLE_API_KEY` / `GEMINI_API_KEY` | `LLM_PROVIDER=gemini` | Google Gemini API key (Developer API mode) |
| `GOOGLE_GENAI_USE_VERTEXAI`, `GOOGLE_CLOUD_PROJECT` | Vertex AI mode (`GOOGLE_GENAI_USE_VERTEXAI=true`) | Use Gemini via Vertex AI |
| `GEMINI_THINKING_LEVEL` | `LLM_PROVIDER=gemini` | Thinking level: `minimal` \| `low` \| `medium` \| `high` (default `medium`) |
| `GROQ_API_KEY` | `LLM_PROVIDER=groq` | Groq API key |
| `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET` | No* | Modal sandbox; *not needed if you ran `modal token new` |
| `REPLICATE_AI_SANDBOX_TIMEOUT_SECONDS` | No | Modal sandbox timeout (seconds; default 1800 / 30 min) |
| `LANGSMITH_*` | No | Optional tracing in LangSmith |
| `TQDM_DISABLE` | No | Disable tqdm progress bars (stability on some macOS setups) |
| `HF_HUB_DISABLE_PROGRESS_BARS` | No | Disable Hugging Face progress bars (often fixes GUI startup crashes) |

See .env.example for a full template.
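
The requirements in the table above can be checked before a run. The following is a minimal sketch, not the project's actual validation code: `missing_env` and `sandbox_timeout_seconds` are hypothetical helpers, and the Gemini entry covers Developer API mode only (Vertex AI mode is not modelled).

```python
import os

# Required variables per provider, transcribed from the table above.
# Each inner tuple lists alternatives: any one of them satisfies the requirement.
REQUIRED_ENV = {
    "anthropic": [("ANTHROPIC_API_KEY",)],
    "cloudflare-kimi": [("CF_ACCOUNT_ID",), ("CF_AI_API_TOKEN",)],
    "cloudflare-glm": [("CF_ACCOUNT_ID",), ("CF_AI_API_TOKEN",)],
    "gemini": [("GOOGLE_API_KEY", "GEMINI_API_KEY")],  # Developer API mode only
    "groq": [("GROQ_API_KEY",)],
}

DEFAULT_SANDBOX_TIMEOUT_SECONDS = 1800  # 30 min, the documented default


def missing_env(env=None):
    """Return unmet requirements for the selected provider (hypothetical helper)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "anthropic")
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")
    return [
        " or ".join(alternatives)
        for alternatives in REQUIRED_ENV[provider]
        if not any(env.get(name) for name in alternatives)
    ]


def sandbox_timeout_seconds(env=None):
    """Read the sandbox timeout, falling back to the documented default."""
    env = os.environ if env is None else env
    raw = env.get("REPLICATE_AI_SANDBOX_TIMEOUT_SECONDS")
    return int(raw) if raw else DEFAULT_SANDBOX_TIMEOUT_SECONDS
```

Calling `missing_env()` with no arguments inspects the current process environment; passing a plain dict makes the check easy to test.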

LLM providers (testing vs demo)

# Cheap harness / wiring tests (Cloudflare GLM-4.7-Flash)
LLM_PROVIDER=cloudflare-glm uv run replicate-ai ../examples/card_krueger

# Fuller dry runs (Cloudflare Kimi K2.6)
LLM_PROVIDER=cloudflare-kimi uv run replicate-ai ../examples/card_krueger

# Canonical demo (Anthropic Sonnet)
LLM_PROVIDER=anthropic uv run replicate-ai ../examples/card_krueger

# Google Gemini (Google AI / Gemini)
LLM_PROVIDER=gemini uv run replicate-ai ../examples/card_krueger

# Groq (low-latency inference)
LLM_PROVIDER=groq uv run replicate-ai ../examples/card_krueger

# Or pass --provider on the CLI (overrides LLM_PROVIDER for that run)
uv run replicate-ai ../examples/card_krueger --provider glm
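
The precedence described in the last comment (CLI flag over `LLM_PROVIDER`, with `anthropic` as the default) can be sketched with argparse. This is an illustration under assumptions, not the code in main.py: `resolve_provider` is a hypothetical name, and the sketch returns the flag value verbatim rather than expanding short names such as `glm`.

```python
import argparse


def resolve_provider(argv, env):
    """Pick the provider: CLI flag first, then LLM_PROVIDER, then the default."""
    parser = argparse.ArgumentParser(prog="replicate-ai")
    parser.add_argument("example_dir", nargs="?")
    parser.add_argument("-p", "--provider", default=None)
    args = parser.parse_args(argv)
    # An explicit flag overrides the environment for this run only.
    return args.provider or env.get("LLM_PROVIDER") or "anthropic"
```

For example, `resolve_provider(["../examples/card_krueger", "--provider", "glm"], {"LLM_PROVIDER": "groq"})` returns `"glm"`, while omitting the flag returns `"groq"`.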

PDF parsing runs on the host with Docling by default (layout-aware tables). The first run downloads Docling layout weights from Hugging Face. For legacy Camelot extraction, run uv run replicate-ai … --pdf-backend legacy and install Ghostscript (brew install ghostscript on macOS).

Scanned PDFs: Older journal issues (e.g. AER pre-2000) are often bitmap scans rather than text-layer PDFs. Docling can detect table regions, but cell content in paper_tables.json may be garbled or missing. In that case the agent falls back to searching paper_text.md and the target_spec_reference.json benchmark, so replication can still succeed (demonstrated on Imbens et al. 2001). Enabling OCR (REPLICATE_AI_PDF_OCR=true) may help on true image scans, but it is slower and is not guaranteed to clean up encoding artifacts in already-digitised PDFs.

Example packs

See ../examples/README.md for all curated paper + data bundles.

| Pack | Paper | Difficulty |
| --- | --- | --- |
| `card_krueger` | Card & Krueger (1994) | Demo |
| `dehejia_wahba` | Dehejia & Wahba (1999) / LaLonde NSW | Easy |
| `imbens_lottery` | Imbens, Rubin & Sacerdote (2001) | Easy–medium |
| `angrist_lavy` | Angrist & Lavy (1999) class size | Medium |
| `autor_dorn_hanson` | Autor, Dorn & Hanson (2013) China shock | Medium–hard |
| `acemoglu_johnson_robinson` | Acemoglu, Johnson & Robinson (2001) | Hard |

For each pack (except card_krueger, which ships data.csv already):

uv run --directory replicate_ai python ../examples/<pack>/data_population_script.py
# add paper.pdf (links in each pack README)
uv run replicate-ai ../examples/<pack>

target_spec_reference.json in each pack lists published headline coefficients; it is seeded to /workspace/target_spec_reference.json as a read-only hint. The agent still writes target_specification.json at runtime.

Run

The Card & Krueger example pack (../examples/card_krueger/) includes card_krueger.pdf, data.csv (with a planted demo bug), and njmin/ survey files. The CLI extracts the PDF on your machine (CPU-heavy Docling on first run), then uploads paper.pdf, data.csv, paper_text.md, and paper_tables.json into Modal /workspace. The sandbox only runs econometrics code.

To regenerate the CSV:

uv run python ../examples/card_krueger/data_population_script.py

To run the replication:

cd replicate_ai
uv run replicate-ai ../examples/card_krueger

On a TTY, that launches the dashboard TUI (see DESIGN_TUI.md): live run log, headline coefficient card, and final audit. Use --no-tui for the plain Rich/stdout CLI (CI-friendly).

When you pass an example_dir, the audit is saved on the host after a successful run:

Default: <example_dir>/replication_audit.md
Override: --audit-out /path/to/audit.md
TUI: auto-save on completion; press s to save again
Opt out: --no-save-audit
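
The save rules listed above amount to a small decision function. The sketch below is one reading of them, not the project's implementation: `resolve_audit_path` and its parameter names are hypothetical, and the behaviour when no example_dir is given is an assumption (the text only describes the case where one is passed).

```python
from pathlib import Path


def resolve_audit_path(example_dir, audit_out=None, no_save_audit=False):
    """Return where the audit should be written, or None if it is not saved."""
    if no_save_audit:
        return None  # --no-save-audit opts out entirely
    if audit_out is not None:
        return Path(audit_out)  # --audit-out overrides the default location
    if example_dir is None:
        return None  # assumption: nothing is saved without an example_dir
    return Path(example_dir) / "replication_audit.md"
```

With only an example directory, the result is `<example_dir>/replication_audit.md`; an explicit `audit_out` takes priority over that default.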

# UI shell with fake demo data (no Modal / LLM)
uv run replicate-ai --tui-demo

# Force TUI or disable it
uv run replicate-ai --tui ../examples/card_krueger
uv run replicate-ai --no-tui ../examples/card_krueger

# Browser GUI (requires: uv sync --group gui)
uv run replicate-ai --gui
uv run replicate-ai --gui -p anthropic ../examples/card_krueger
uv run replicate-ai --gui-demo

See DESIGN_GUI.md for launcher, uploads, and API details.

GUI run logs

The GUI persists a per-run host log file you can view in-browser via View full log:

<example_dir>/.replicate_ai/runs/<run_id>.log
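
The same log can be read outside the browser. This is a small sketch assuming only the path layout shown above; `run_log_path` and `tail` are hypothetical helpers, and the format of `run_id` is not specified here.

```python
from pathlib import Path


def run_log_path(example_dir, run_id):
    """Build <example_dir>/.replicate_ai/runs/<run_id>.log."""
    return Path(example_dir) / ".replicate_ai" / "runs" / f"{run_id}.log"


def tail(path, n=20):
    """Return the last n lines of a log file (empty list if n <= 0)."""
    if n <= 0:
        return []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return lines[-n:]
```

For instance, `tail(run_log_path("../examples/card_krueger", run_id))` returns the last 20 lines of that run's log, given a `run_id` taken from the GUI.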

replicate-ai is a console script declared in [project.scripts]; it is equivalent to uv run python -m replicate_ai.main.

Module layout

src/replicate_ai/
├── main.py            # CLI entry: argparse, .env loading, TUI vs CLI routing
├── agent.py           # thin wrapper around runner.run_replication
├── runner/            # orchestration + TUI event emission
├── tui/               # Textual dashboard (docs/DESIGN_TUI.md)
├── gui/               # Browser GUI (docs/DESIGN_GUI.md)
├── constants.py       # APP_NAME, default user message, sandbox timeout
├── models.py          # LLM provider selection
├── prompts.py         # loads system prompts from disk
├── sandbox_image.py   # Modal image built from [dependency-groups.sandbox]
├── system_prompts/    # ECONOMETRICIAN_PROMPT.md, AUDITOR.md
├── subagents/         # auditor sub-agent config
├── preflight.py       # host PDF extract + upload to Modal
└── tools/
    ├── pdf_core.py    # PDF dispatch (host)
    └── pdf_docling.py # Docling backend (host)

Sandbox Python deps come from [dependency-groups.sandbox] in pyproject.toml, installed into the Modal image via uv_pip_install (see sandbox_image.py).

Tests

uv run pytest -q
