bunny

"why are you wearing that stupid man suit?"

a dark factory for the solo developer. two AI agents, seven steps, one build at a time — and a knowledge graph that makes every build smarter than the last.

most AI coding tools scale by throwing more agents at the problem. bunny scales by learning. every build, every spike, every brainstorm feeds a persistent knowledge graph. the tenth build knows what the first nine learned. parallel agents are fast and dumb. serial + recursive is slow and wise.

the factory runs a single, observable pipeline — you can watch each step, intervene anywhere, re-run one phase. no orchestrator black box. no 50-agent swarm you can't debug. just two agents with adversarial incentives:

gemini writes tests to break things — reads the spec, finds gaps, generates a hostile test suite.
claude writes code to survive — never sees the test-generation prompt, can't weaken the tests.

code ships when claude beats gemini's tests. then both agents feed the knowledge graph.

specify → challenge → plan → tasks → narrow[1→2→3] → verify → ruminate
claude    gemini      claude  claude   gemini+claude    gemini   claude
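
as a rough illustration, the diagram above can be written down as data. the step names and agent assignments come from the diagram; the type and constant names are hypothetical and not taken from bunny's source.

```typescript
// hypothetical names; only the step order and agent assignments come from the diagram.
type AgentName = "claude" | "gemini";

interface PipelineStep {
  step: string;
  agents: AgentName[];
}

const PIPELINE: PipelineStep[] = [
  { step: "specify", agents: ["claude"] },
  { step: "challenge", agents: ["gemini"] },
  { step: "plan", agents: ["claude"] },
  { step: "tasks", agents: ["claude"] },
  { step: "narrow", agents: ["gemini", "claude"] }, // rounds 1, 2, 3
  { step: "verify", agents: ["gemini"] },
  { step: "ruminate", agents: ["claude"] },
];

// steps in which a given agent takes part, in pipeline order.
function stepsFor(agent: AgentName): string[] {
  return PIPELINE
    .filter((s) => s.agents.indexOf(agent) !== -1)
    .map((s) => s.step);
}

console.log(stepsFor("gemini")); // ["challenge", "narrow", "verify"]
```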

proof

each of these was built from a single sentence. zero human intervention:

| prompt | tests | result |
|---|---|---|
| "a library that parses semver ranges (^, ~, >=, \|\|, hyphen, x-ranges)" | 54 | clean |
| "RFC 6902 JSON Patch — add, remove, replace, move, copy, test with atomic rollback" | 37 | clean |
| "a cron expression parser that computes the next N scheduled times" | 12 | clean |

see demos/ for full pipeline output — specs, tests, source, knowledge graph state, and logs.

quick start

bunny requires both API keys — the adversarial model needs two agents:

export ANTHROPIC_API_KEY="sk-ant-..."   # claude: spec, plan, implement, ruminate
export GEMINI_API_KEY="..."             # gemini: challenge, test-gen, verify

install into any git repo:

cd my-project
curl -fsSL https://raw.githubusercontent.com/ahoward/bunny/main/install.sh | bash

this drops a single binary at ./bin/bny. run it directly, or copy/symlink it somewhere on your PATH:

# scaffold project state (guest mode — won't clobber your files)
./bin/bny init

# give it a task
./bin/bny build "add an authentication middleware"

or from source:

git clone https://github.com/ahoward/bunny.git && cd bunny
./dev/setup
./bin/bny --help

what `bny init` does

bny init detects your project type (bun, node, rust, go, python, ruby) and scaffolds:

bny/ directory — project state (roadmap, decisions, knowledge graph)
dev/ scripts — test, setup, health, pre/post-flight (customizable)
AGENTS.md — protocol file for AI agents working in your repo

it's a guest, not a landlord: it uses marker-delimited blocks so it never clobbers your existing files. it is idempotent on re-run, and `bny uninit --force` removes all traces.
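
a minimal sketch of how a marker-delimited block can stay idempotent, assuming plain-text markers. the marker strings and function names here are hypothetical; bunny's actual markers and implementation may differ.

```typescript
// hypothetical marker strings; bunny's real markers may differ.
const BEGIN_MARK = "# >>> bny >>>";
const END_MARK = "# <<< bny <<<";

// insert or replace the managed block, leaving everything else untouched.
function upsertBlock(fileText: string, body: string): string {
  const block = BEGIN_MARK + "\n" + body + "\n" + END_MARK;
  const start = fileText.indexOf(BEGIN_MARK);
  const end = fileText.indexOf(END_MARK);
  if (start !== -1 && end > start) {
    // a managed block already exists: swap it in place.
    return fileText.slice(0, start) + block + fileText.slice(end + END_MARK.length);
  }
  // no block yet: append it after the user's content.
  const endsWithNewline =
    fileText.length === 0 || fileText.charAt(fileText.length - 1) === "\n";
  return fileText + (endsWithNewline ? "" : "\n") + block + "\n";
}

// remove the managed block again, as an uninstall step might.
function removeBlock(fileText: string): string {
  const start = fileText.indexOf(BEGIN_MARK);
  const end = fileText.indexOf(END_MARK);
  if (start === -1 || end < start) return fileText;
  let tail = fileText.slice(end + END_MARK.length);
  if (tail.charAt(0) === "\n") tail = tail.slice(1);
  return fileText.slice(0, start) + tail;
}

const userFile = "node_modules/\n.env\n";
const once = upsertBlock(userFile, "bny/tmp/");
const twice = upsertBlock(once, "bny/tmp/"); // second run changes nothing
```

because the block is found by its markers rather than by position, re-running the scaffold rewrites only the managed region, and removal restores the file the user started with.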

what `bny build` does

bny build creates a feature branch and runs the full pipeline. all changes happen on that branch — your main branch is untouched until you merge.

the pipeline streams output to your terminal as it runs. when it finishes, you're on the feature branch with passing tests — review the code, run ./dev/test, and merge when satisfied.

if the pipeline can't pass all tests after 9 attempts (3 rounds × 3 retries), it stops and reports what failed. the branch is left in place so you can inspect or continue manually.

use --interactive to pause at checkpoints (after spec, before each narrowing round) for human review.

typical cost: $0.50–$2.00 per build depending on spec complexity and retry count. spikes are cheaper (~$0.25). knowledge graph operations (digest, storm, loop) are ~$0.05–$0.20 each.

the core loop

# 1. feed the knowledge graph
bny digest README.md docs/ https://example.com/api-docs

# 2. let it think
bny brane storm "what about real-time collaboration?"
bny brane loop "auth strategies"                     # autonomous: search → fetch → digest → repeat

# 3. bridge knowledge to execution
bny proposal "auth system"
bny proposal accept auth-system                      # → appends to bny/roadmap.md

# 4. build (2 agents, 7 steps, tested code)
bny build "add user auth"

# 5. or prototype without guardrails
bny spike "prototype oauth flow"

# 6. check what the graph learned (it compounds)
bny brane ask "what are the security risks?"

run bny --help for the full command reference.

input modes

most commands accept text input. the pattern is uniform across the CLI:

# inline text (positional args)
bny build "add user auth"
bny brane storm "real-time collab?"

# read from file
bny build --input MVP.md
bny specify --input requirements.txt

# read from stdin
cat spec.md | bny build -
echo "auth strategies" | bny brane storm -

| mode | syntax | when |
|---|---|---|
| inline | `"text here"` | quick one-liners |
| file | `--input <path>` | feeding docs, specs, PRDs |
| stdin | `-` | piping from other commands |

`bny digest` is the exception: its positional args are source paths (`bny digest README.md docs/`), not inline text. it still supports `--input` and `-` for explicit file/stdin input.
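
one way the three input modes could be told apart, as a sketch. the helper name, the reader interface, and the precedence (`--input` first, then `-`, then inline text) are assumptions for illustration; file and stdin access are injected so the example runs without touching the filesystem.

```typescript
// hypothetical helper; not bunny's actual argument parser.
interface InputReaders {
  readFile: (path: string) => string;
  readStdin: () => string;
}

function resolveInput(args: string[], io: InputReaders): string {
  const flagAt = args.indexOf("--input");
  if (flagAt !== -1) {
    // file mode: the argument after --input is the path.
    const path = args[flagAt + 1];
    if (path === undefined) throw new Error("--input needs a path");
    return io.readFile(path);
  }
  // stdin mode: a lone "-" means "read from the pipe".
  if (args.length === 1 && args[0] === "-") return io.readStdin();
  // inline mode: positional args are the text itself.
  return args.join(" ");
}

// stub readers standing in for real file and stdin access.
const fakeIo: InputReaders = {
  readFile: (path) => "contents of " + path,
  readStdin: () => "piped text",
};

console.log(resolveInput(["add user auth"], fakeIo));
console.log(resolveInput(["--input", "MVP.md"], fakeIo));
console.log(resolveInput(["-"], fakeIo));
```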

three pillars

knowledge graph (brane)

a persistent, self-organizing knowledge graph that accumulates understanding across every build, spike, and brainstorm. brane loop is autonomous research — it reflects on gaps, searches the web, fetches sources, absorbs what it finds, and repeats until it converges. the tenth build knows what the first nine learned.
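
the research loop described above can be sketched as a fixed-point iteration. the stage names follow the text; the function signatures and the convergence rule (stop when a full pass adds no new notes) are assumptions, and every stage is injected as a stub so nothing here touches the network.

```typescript
// hypothetical shapes; only the stage order comes from the description above.
interface ResearchSteps {
  findGaps: (notes: string[]) => string[]; // reflect: open questions
  search: (gap: string) => string[];       // search: candidate urls
  fetchSource: (url: string) => string;    // fetch: raw text
  digest: (text: string) => string[];      // absorb: extracted notes
}

function researchLoop(seed: string, steps: ResearchSteps, maxPasses: number): string[] {
  const notes: string[] = [seed];
  for (let pass = 0; pass < maxPasses; pass++) {
    let added = 0;
    for (const gap of steps.findGaps(notes)) {
      for (const url of steps.search(gap)) {
        for (const note of steps.digest(steps.fetchSource(url))) {
          if (notes.indexOf(note) === -1) {
            notes.push(note);
            added++;
          }
        }
      }
    }
    if (added === 0) break; // converged: a full pass learned nothing new
  }
  return notes;
}

// stubbed "web" with two overlapping sources.
const fakeWeb: { [url: string]: string } = {
  "a.example": "oauth; sessions",
  "b.example": "sessions; passkeys",
};
const stubSteps: ResearchSteps = {
  findGaps: (notes) => [notes[notes.length - 1]],
  search: () => ["a.example", "b.example"],
  fetchSource: (url) => fakeWeb[url],
  digest: (text) => text.split("; "),
};
const learned = researchLoop("auth strategies", stubSteps, 10);
```

the `maxPasses` cap is a safety net for the case where sources keep producing new notes and the loop never settles on its own.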

stored locally in bny/brane/ as markdown files. commit it to share context with your team, or gitignore it for personal use.

bny digest <file|dir|url>                            # ingest
bny brane ask "what are the security risks?"         # query (read-only)
bny brane storm "real-time collab?"                  # divergent brainstorming
bny brane loop "distributed systems"                 # autonomous research
bny brane enhance "security model"                   # convergent refinement
bny brane lens add security "attack vectors, auth"   # filtering perspectives
bny brane rebuild                                    # reprocess everything through current lenses
bny brane tldr                                       # instant outline

code graph (map)

structural codebase awareness via tree-sitter. functions, classes, imports, exports — parsed, not guessed.

bny map                                              # generate structural codebase map

dark factory (build)

the 7-step pipeline (with 3×3 narrowing), or one step at a time:

bny build "add user auth"                            # full pipeline
bny build specify "add user auth"                    # just one step
bny build challenge                                  # just one step
bny next                                             # pick next roadmap item, run pipeline
bny --effort full build                              # more retries per round

adversarial TDD

we don't let the same agent write tests and code. here's why.

traditional TDD assumes the test-writer wants to find bugs. true for disciplined humans; catastrophically false for AI. when one agent writes both tests and implementation, it writes tests to confirm its own approach — optimizing for green checks, not caught failures.

the fix: two agents with opposed incentives.

| step | agent | what |
|---|---|---|
| specify | claude | write spec with acceptance scenarios |
| challenge | gemini | harden spec: find gaps, edge cases, ambiguities |
| plan | claude | implementation plan |
| tasks | claude | implementation tasks (no test tasks) |
| narrow | gemini writes tests, claude implements | 3×3 narrowing (see below) |
| verify | gemini | post-implementation review: are the tests real? anything missed? |
| ruminate | claude | reflect on build, feed knowledge graph |

gemini is involved in challenge, narrow, and verify. gemini writes every test. claude writes every line of implementation. claude never sees the test-generation prompt. this is not a rule; it's architecture. the system makes the wrong thing hard.

3×3 narrowing

instead of generating all tests at once and hoping claude passes them, we narrow in 3 rounds — each more adversarial than the last:

| round | gemini writes | gemini sees | claude's job |
|---|---|---|---|
| 1. contracts | spec-as-code tests | spec + challenge | build the foundation |
| 2. properties | behavioral invariants | spec + claude's source code | don't break contracts |
| 3. boundaries+golden | edge cases + regression snapshots | spec + source + all tests | don't break anything |

each round: gemini generates tests (`bny build test-gen --round N`), then claude implements (`bny build implement --round N`) with up to 3 attempts per round. that caps a build at 9 test runs in total; a typical build needs about 4.

the key insight: in rounds 2 and 3, gemini sees claude's actual source code. it doesn't just test the spec; it targets where claude's implementation is weakest. this is adversarial review with teeth.
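
a sketch of the control flow behind the narrowing rounds, assuming three attempts per round as described above. the round labels and what gemini sees come from the table; the function signatures are hypothetical stand-ins for the two agents.

```typescript
// hypothetical types; the two callbacks stand in for gemini and claude.
interface NarrowRound {
  round: number;
  label: string;
  geminiSees: string[];
}

const ROUNDS: NarrowRound[] = [
  { round: 1, label: "contracts", geminiSees: ["spec", "challenge"] },
  { round: 2, label: "properties", geminiSees: ["spec", "source"] },
  { round: 3, label: "boundaries+golden", geminiSees: ["spec", "source", "tests"] },
];

const ATTEMPTS_PER_ROUND = 3;

interface NarrowResult {
  passed: boolean;
  testRuns: number;
  failedRound: number | null;
}

function narrow(
  generateTests: (r: NarrowRound) => void, // gemini's turn
  implementAndTest: (r: NarrowRound, attempt: number) => boolean // claude's turn; true means green
): NarrowResult {
  let testRuns = 0;
  for (const r of ROUNDS) {
    generateTests(r);
    let green = false;
    for (let attempt = 1; attempt <= ATTEMPTS_PER_ROUND && !green; attempt++) {
      testRuns++;
      green = implementAndTest(r, attempt);
    }
    // a round that never goes green stops the pipeline.
    if (!green) return { passed: false, testRuns: testRuns, failedRound: r.round };
  }
  return { passed: true, testRuns: testRuns, failedRound: null };
}

// simulated build: round 2 needs a second attempt, the others pass first time.
const outcome = narrow(
  () => {},
  (r, attempt) => r.round !== 2 || attempt >= 2
);
console.log(outcome.testRuns); // 4
```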

test layers

contract tests (round 1) — one test per acceptance scenario. the spec as code.
property tests (round 2) — invariants for all inputs (fast-check, hypothesis, proptest).
golden file tests (round 3) — known-good output as fixtures. diff on regression.
boundary tests (round 3) — edge cases from the challenge step. empty, max, malformed, unicode.
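
to make the layers concrete, here is a small sketch of a property check plus a few boundary cases against a toy function. everything in it is hypothetical and hand-rolled: `parseVersion` is not from bunny's demos, and the tiny random generator stands in for what a library such as fast-check would normally provide.

```typescript
// toy function under test (hypothetical).
function parseVersion(text: string): number[] | null {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

function formatVersion(v: number[]): string {
  return v.join(".");
}

// small deterministic generator so a failing run can be replayed from its seed.
function makeRng(seed: number): () => number {
  let state = seed % 2147483647;
  if (state <= 0) state += 2147483646;
  return () => {
    state = (state * 16807) % 2147483647;
    return state;
  };
}

// property: parsing a formatted version gives back the same numbers.
// returns a counterexample, or null if every run held.
function checkRoundTrip(runs: number, seed: number): number[] | null {
  const next = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const v = [next() % 1000, next() % 1000, next() % 1000];
    const back = parseVersion(formatVersion(v));
    if (back === null || back.join(".") !== v.join(".")) return v;
  }
  return null;
}

// boundary cases: empty, truncated, trailing space, non-ascii digits.
const malformed = ["", "1.2", "1.2.3 ", "١.٢.٣"];
const rejectedAll = malformed.every((text) => parseVersion(text) === null);

console.log(checkRoundTrip(200, 42) === null, rejectedAll);
```

a contract test would pin one known input to one known output; the property check instead asserts a rule over many generated inputs, which is why the two layers catch different bugs.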

language agnostic

bunny auto-detects your project type from config files (package.json, Cargo.toml, go.mod, etc.) and configures the right test framework and property testing library. zero config.

| project | test framework | property lib |
|---|---|---|
| bun | `bun:test` | `fast-check` |
| node | `jest` | `fast-check` |
| rust | `cargo test` | `proptest` |
| go | `testing` | `rapid` |
| python | `pytest` | `hypothesis` |
| ruby | `rspec` | `rantly` |
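
a sketch of how detection from config files could map onto the table above. the framework and library names come from the table; the rule order and the marker files for bun, python, and ruby (`bun.lock`, `pyproject.toml`, `Gemfile`) are assumptions, since the text only names `package.json`, `Cargo.toml`, and `go.mod`.

```typescript
// hypothetical detection rules; bunny's real logic may check other files.
interface Toolchain {
  project: string;
  testFramework: string;
  propertyLib: string;
}

interface DetectRule {
  marker: string;
  toolchain: Toolchain;
}

// order matters: a bun project also has package.json, so bun is checked first.
const RULES: DetectRule[] = [
  { marker: "bun.lock", toolchain: { project: "bun", testFramework: "bun:test", propertyLib: "fast-check" } }, // assumed marker
  { marker: "package.json", toolchain: { project: "node", testFramework: "jest", propertyLib: "fast-check" } },
  { marker: "Cargo.toml", toolchain: { project: "rust", testFramework: "cargo test", propertyLib: "proptest" } },
  { marker: "go.mod", toolchain: { project: "go", testFramework: "testing", propertyLib: "rapid" } },
  { marker: "pyproject.toml", toolchain: { project: "python", testFramework: "pytest", propertyLib: "hypothesis" } }, // assumed marker
  { marker: "Gemfile", toolchain: { project: "ruby", testFramework: "rspec", propertyLib: "rantly" } }, // assumed marker
];

// files: names found in the project root. returns null when nothing matches.
function detectToolchain(files: string[]): Toolchain | null {
  for (const rule of RULES) {
    if (files.indexOf(rule.marker) !== -1) return rule.toolchain;
  }
  return null;
}

const detected = detectToolchain(["README.md", "Cargo.toml", "src"]);
console.log(detected ? detected.propertyLib : "unknown"); // proptest
```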

environment

| Variable | Default | Purpose |
|---|---|---|
| `ANTHROPIC_API_KEY` | required | Claude (specify, plan, tasks, implement, ruminate) |
| `GEMINI_API_KEY` | required | Gemini (challenge, test-gen, verify) |
| `BNY_MODEL` | (none) | override LLM model for all subcommands |
| `BUNNY_LOG` | off | structured JSON logging to stderr |

stack

runtime: bun — fast typescript runtime
language: TypeScript (strict mode)
code awareness: tree-sitter via WASM
no frameworks. no ORMs. no build tools.

dive deeper

AGENTS.md — the strict protocol agents must follow
demos/ — complete projects generated from a single sentence
