Skip to content

utsabpanta/evergreen-migration-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🌲 evergreen

The migration agent that is never allowed to break your build.

Every. Single. Commit. Passes. Your. Tests.

Point it at a green repo, give it a goal — "migrate Flask → FastAPI", "Angular → React", "unittest → pytest" — and it walks your codebase there in dozens of tiny commits, running your real test suite after each one, refusing to advance while anything is red.

Python 3.11+ License: MIT tests runs on open weights


Most "AI migration" tools dump one giant diff in your lap and wish you luck. evergreen sells the opposite — a guarantee:

At every commit the agent creates, the full test suite passes.

No exceptions. Enforced in deterministic code the model cannot reach.

The result is a git history that reads like a careful senior engineer wrote it: atomic, reviewable, and git bisect-able from the first commit to the last.

Watch it think

The magic isn't the code generation — it's the closed loop. Here's a real run (a module rename driven by a local model via Ollama). Watch step 5: the model writes a broken edit, the suite goes red, the agent diagnoses it, repairs it, and only then commits:

baseline green: 14 tests in 0.23s

step 5/8 step-004 · Update calclib.stats to import from arithmetic (expand_contract, move)
  apply  llm edit (attempt 1)
  apply error: replace_in_file: 'import calclib.ops' not found in calclib/stats.py
  diagnose  repair: Update calclib.stats to import from arithmetic
  apply  builtin:apply_edits (attempt 2)
  green (full suite, 0.22s)
  commit a8752e7d (step-004)        ← committed ONLY after green

step 7/8 step-006 · Remove calclib.ops.py (expand_contract, remove)
  apply  llm edit (attempt 1)
  red (full suite, 0.25s) → tests/test_ops.py::TestOps::test_div
  diagnose  repair: Remove calclib.ops.py
  apply  builtin:apply_edits (attempt 2)
  red (full suite, 0.23s) → 3 failing
  rollback step-006: could not reach green within the repair budget   ← NO commit. clean.

done 6 green commits / 8 steps
migration incomplete: rolled back [step-006]. Your branch is untouched.

That's the whole pitch in one screen: the model makes mistakes, the invariant catches every one of them. A weaker model just means more repairs and the occasional honest "I couldn't finish this step" — never a broken commit, never a touched branch.

Quickstart

Install straight from GitHub (no PyPI release needed):

pip install "git+https://github.com/utsabpanta/evergreen-migration-agent"   # Python 3.11+

Or clone for development:

git clone https://github.com/utsabpanta/evergreen-migration-agent
cd evergreen-migration-agent
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Then point it at any OpenAI-compatible model endpoint and run it inside a target repo:

export LLM_BASE_URL=http://localhost:11434/v1   # e.g. Ollama
export LLM_MODEL=qwen3-coder:480b-cloud

cd ~/path/to/your/green/repo           # must be: a git repo, clean tree, passing tests
evergreen plan "migrate the test suite from unittest to pytest"   # preview the DAG — changes nothing
evergreen run  "migrate the test suite from unittest to pytest"   # do it, with a live trace

The four commands:

Command What it does
evergreen plan "<goal>" Show the step DAG + chosen strategies. Changes nothing.
evergreen run "<goal>" Run it: plan → apply → test → commit, never advancing on red.
evergreen status Done / next / blocked steps for the saved run.
evergreen resume Continue an interrupted run from the last green commit.

Flags worth knowing: --no-promote (leave the result on the evergreen/* branch instead of fast-forwarding yours), --interactive (approve each step), --plan-file plan.json (run a reviewed plan deterministically), --characterize (generate safety-net tests if coverage is missing), --redact (log only hashes of what's sent to a hosted model), --max-repairs / --max-replans (tune the autonomy budget).

Try it in 30 seconds (no model required)

The unittest→pytest migration ships as a deterministic recipe — it needs no LLM at all:

cp -r tests/fixtures/toy_unittest_pkg /tmp/demo && cd /tmp/demo
git init -b main && git add -A && git commit -m "initial"
evergreen run "migrate the test suite from unittest to pytest"
git log --oneline      # 5 commits — check out any one; pytest is green at every step

What it can migrate

evergreen is language-agnostic; "green" is defined by adapters that read your real test runner:

Ecosystem Runner Structured results via
Python pytest (also collects unittest suites) junit XML
Node.js node --test (built-in) TAP reporter
JS / TS jest --json reporter
JS / TS vitest --reporter=json
anything any command via [tool.evergreen] test_command exit code

For well-known migrations, playbooks inject battle-tested strategy into the planner: Angular→React, Next.js→TanStack, JS→TS, CommonJS→ESM, Flask→FastAPI, unittest→pytest — each with the right coexistence pattern and the oracle warning (e.g. "Angular TestBed unit tests die with the framework — you need behavior-level tests").

How it works

goal + repo ─► Planner (LLM) ─► DAG of atomic steps ─► Orchestrator (DETERMINISTIC)
                  ▲                                       │  for each step:
                  │ re-plan                               │    apply  → verify → commit
                  │                                       │    red?   → diagnose → repair → retry
            Diagnostician (LLM) ◄── red tests ────────────┤    stuck? → roll back, never commit
                                                          ▼
                                            one green commit per step ✅
  • The model proposes; your test suite disposes. The LLM plans, edits, and diagnoses. The commit decision lives in CommitManager, which demands the actual green full-suite result as proof and raises PrimeDirectiveViolation otherwise. The model has no path to a commit.
  • Crossing the valley. You can't atomically swap a framework and stay green, so the planner picks a coexistence strategy per concern — expand→migrate→contract, strangler fig, shim, branch-by-abstraction — and a validator rejects any plan that deletes before it migrates.
  • Sandboxed & reversible. All work happens in a git worktree on an evergreen/* branch. Snapshot before each step, git reset on failure, your branch untouched until you say so.
  • Trustworthy green. Zero tests → it refuses to migrate blindly and offers characterization tests. Flaky tests → quarantined so nondeterminism never defines "green." Slow suite → affected tests run first for fast feedback, but the full suite always gates the commit.
  • Resumable & auditable. Every commit is a safe resume point; each carries an Evergreen-Step: trailer; the whole run replays from a JSONL trace + plan.

The model: bring your own

One env-var interface, never a hard-coded vendor — vLLM, SGLang, Ollama, Z.AI, or any endpoint speaking POST /v1/chat/completions:

export LLM_BASE_URL=...     # required to enable LLM planning/editing/diagnosis
export LLM_API_KEY=...      # if your endpoint needs one
export LLM_MODEL=...        # e.g. glm-5.1, qwen3-coder, deepseek, …

Local-first: the repo, sandbox, test execution, and static analysis never leave your machine. Self-host the model (or run a recipe-only migration) and nothing leaves at all. When a hosted endpoint is used, every payload is logged to .git/evergreen/llm.jsonl--redact keeps only hashes. Smarter model = better plans and fewer rollbacks; it can never mean a broken commit.

Honest limitations

Trust is the product, so here's what it won't do:

  • Your test oracle must survive the migration. Tests coupled to the framework you're removing (Angular TestBed, mocked next/router) can't pin behavior across the swap. Use behavior-level tests (HTTP / DOM / E2E); evergreen detects the gap and warns.
  • Infrastructure migrations are mostly out of scope. "API Gateway → ALB + EC2" has no fast local test oracle for "the ALB routes correctly" — that's verified by deploying to the cloud. evergreen can rehost the application code (handler → server) under the invariant, but the IaC + traffic cutover is human-owned deploy work it can scaffold, not guarantee.
  • Dependency-changing migrations that require npm install / pip install of new packages need the Docker-isolated sandbox (a documented extension point, not yet built); the worktree baseline assumes deps are already present.

How it's proven

27 acceptance tests gate the spec's four build phases — including the headline proof: a real Flask→FastAPI migration where git bisect run pytest finds no red commit in the produced range (it pins only a deliberately injected bad commit), and the same loop runs end-to-end on a real JavaScript repo under node --test.

pip install -e ".[dev]" && pytest      # 27 passing

Project layout

evergreen/
  cli.py              # typer entrypoints: plan / run / resume / status
  orchestrator.py     # the deterministic loop; enforces the Prime Directive
  planner.py          # LLM-backed DAG planner, grounded by static analysis
  executor.py         # codemod-first, LLM-fallback step application
  verifier.py         # runner adapters → structured pass/fail (sole authority on "green")
  diagnostician.py    # LLM-backed repair on red
  commit.py           # CommitManager: refuses to commit anything not full-suite green
  sandbox.py          # git worktree isolation, snapshot/restore, promote
  suite_assessment.py # baseline check, flaky quarantine, test-impact analysis
  playbooks.py        # per-migration strategy guidance injected into the planner
  strategies/         # expand_contract · strangler_fig · shim · branch_by_abstraction
  recipes.py          # deterministic, LLM-free migrations (e.g. unittest→pytest)
  llm.py              # OpenAI-compatible client (any vendor)
  models.py           # MigrationPlan / Step / StepResult (pydantic)
tests/                # acceptance gates incl. the git-bisect proof; tiny real fixture repos

License

MIT.


The model proposes. Your test suite decides.

Built with Claude Code · runs on open weights

About

🌲 Autonomous code-migration agent where every single commit passes your full test suite — atomic, reviewable, git-bisectable. The model proposes; your tests decide.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages