Looper tests whether LoRA consolidation of agent experience produces more capable coding agents. Stack: Python 3.11 + MLX + Ollama + SWE-Bench-CL. After 8 experiments: LoRA didn't work, but framework engineering more than tripled the resolve rate (8% → 27%).
- LEARNINGS.md — Results from all 8 experiments
- docs/problem.md — The skills vs knowledge thesis
- docs/architecture.md — Framework components and data flow
- docs/experiments.md — All pre-registered experiments with results
- docs/future-work.md — Next directions
- docs/development_process.md — Workflow, agentic engineering
- docs/research_landscape.md — Literature survey
- readinglist.md — 100+ annotated papers
- Python 3.11 — Pydantic v2 for data models
- MLX — LoRA training on Apple Silicon (M4 32GB)
- Ollama — base model inference (~18 sec/request)
- SWE-Bench-CL — 273 tasks, 8 Python repos
- Agent protocol — XML tool tags: `<bash>`, `<read>`, `<write>`, `<edit>`, `<done>` (parsing sketch below)
- Testing — pytest; ruff for formatting/linting
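A minimal sketch of parsing the tool-tag protocol, using Pydantic v2 as noted above; `ToolCall`, `parse_tool_calls`, and the field names are hypothetical illustrations, not the framework's actual API:

```python
import re

from pydantic import BaseModel

# Hypothetical schema for one parsed tool invocation (illustrative, not
# the framework's actual data model).
class ToolCall(BaseModel):
    tool: str  # one of: bash, read, write, edit, done
    body: str  # raw text between the opening and closing tags

TOOL_TAGS = ("bash", "read", "write", "edit", "done")
_TAG_RE = re.compile(
    r"<(?P<tool>" + "|".join(TOOL_TAGS) + r")>(?P<body>.*?)</(?P=tool)>",
    re.DOTALL,
)

def parse_tool_calls(completion: str) -> list[ToolCall]:
    """Extract tool calls from a model completion, in order of appearance."""
    return [
        ToolCall(tool=m["tool"], body=m["body"].strip())
        for m in _TAG_RE.finditer(completion)
    ]
```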
- Minimal code. Don't add abstractions for hypothetical futures.
- Simple over clever. Readable beats elegant.
- Test-driven. Write tests alongside implementation.
- Incremental. Small changes, one thing at a time.
- `ruff format` and `ruff check` before every commit. Pre-commit hook enforces this (sketch after this list).
- Don't refactor existing code unless asked.
- Don't add features beyond what was requested.
- Pydantic v2 for all data models.
- No dead code, unused imports, or commented-out code.
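The hook itself isn't documented here; a minimal sketch of what it could look like as a Python script, assuming ruff is on PATH (the repo may use the pre-commit framework instead):

```python
#!/usr/bin/env python3
"""Hypothetical git hook: block commits that fail ruff (save as .git/hooks/pre-commit)."""
import subprocess
import sys

# `ruff format --check` verifies formatting without rewriting files;
# `ruff check` runs the linter.
for cmd in (["ruff", "format", "--check", "."], ["ruff", "check", "."]):
    if subprocess.run(cmd).returncode != 0:
        print(f"pre-commit: `{' '.join(cmd)}` failed; fix before committing.")
        sys.exit(1)
```

Make it executable with `chmod +x .git/hooks/pre-commit`.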
- `/spec <topic>` — Load and review relevant docs before working.
- `/research <topic>` — Launch research agents. Returns findings. Human decides.
- `/implement <task>` — Implement one scoped task with tests. Uses the start→review→resume loop.
- `/review` — Review changes against standards.
- `/test` — Run tests and report results.
- `/codify <learning>` — Document a discovery in the relevant doc.
When you fix a bug, hit a gotcha, or make a design decision: update the relevant doc, not just the code. See LEARNINGS.md for examples of well-documented findings.
Every time you delegate to a subagent (sketched in code after this list):
- Start: Give the subagent a scoped task with clear boundaries.
- Review: When it completes, review its output for scope, quality, minimal code.
- Resume: Send it back to simplify and reduce both code and tests, rerun tests, and do an end-to-end verification (input→output check).
- Accept: Only accept when the code is minimal and all checks pass.
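As plain control flow, the loop looks roughly like this; `run`, `review`, and `MAX_ROUNDS` are hypothetical stand-ins, since the actual delegation primitive isn't part of this file:

```python
from typing import Callable

MAX_ROUNDS = 3  # assumption: bound the resume loop instead of spinning forever

def delegate(
    run: Callable[[str], str],     # hypothetical: send a prompt to the subagent
    review: Callable[[str], str],  # hypothetical: return "" when output passes
    task: str,
) -> str:
    output = run(task)             # Start: one scoped task, clear boundaries
    for _ in range(MAX_ROUNDS):
        issues = review(output)    # Review: scope, quality, minimal code
        if not issues:
            return output          # Accept: minimal code, all checks pass
        # Resume: send it back with the concrete findings
        output = run(
            "Address these review findings, simplify, reduce code, "
            f"rerun tests, and verify end-to-end (input -> output):\n{issues}"
        )
    raise RuntimeError("subagent did not converge within the review budget")
```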
- One inference server at a time (Ollama + MLX can't share 32GB)
- LoRA fusion breaks quantized models — use dynamic adapter application
- 14B max for adapted inference on 32GB M4
- Subprocess training essential (MLX holds GPU memory); see the sketch after this list
- `caffeinate -dims` for overnight runs
- Ollama on SSD: `OLLAMA_MODELS=/Volumes/1TB_SSD/looper/ollama_models ollama serve`
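A sketch of how the training and adapter gotchas combine, assuming the `mlx_lm` CLI and Python API; the model id, data path, and adapter path are illustrative, not the project's actual values:

```python
import subprocess

from mlx_lm import generate, load

MODEL = "mlx-community/Qwen2.5-Coder-14B-Instruct-4bit"  # illustrative model id
ADAPTERS = "/Volumes/1TB_SSD/looper/adapters"            # illustrative adapter path

# Train in a subprocess: MLX holds GPU memory until the process exits,
# so in-process training would starve inference on a 32GB machine.
subprocess.run(
    ["python", "-m", "mlx_lm.lora",
     "--model", MODEL,
     "--train",
     "--data", "/Volumes/1TB_SSD/looper/datasets/traces",  # illustrative path
     "--adapter-path", ADAPTERS],
    check=True,
)

# Apply the adapter dynamically at load time; never fuse into a quantized
# base model (fusion breaks it, per the gotcha above).
model, tokenizer = load(MODEL, adapter_path=ADAPTERS)
print(generate(model, tokenizer, prompt="def quicksort(", max_tokens=64))
```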
- Python 3.11 venv at `.venv`
- SSD storage: `/Volumes/1TB_SSD/looper/` (models, datasets, results, workspaces)
- v1 code preserved on the `v1` branch for reference
- v1 complete (8 experiments, 222+ tests). Main branch: documentation only; starting v2 fresh.
- Keep this file under 100 lines.