Multi-agent code review CLI with long-chain reasoning. Five specialist agents (security, performance, style, architecture, test) review code in parallel; a coordinator agent synthesizes their reasoning chains into one prioritized action list. Works with any OpenAI-compatible endpoint — MiMo, OpenAI, DeepSeek, Moonshot, vLLM, Ollama.
Single-LLM code review has a recurring failure mode: the model spreads its attention thin and produces shallow, generic feedback. Asking one model to "find security AND performance AND style AND architecture AND test issues" tends to produce six bullet points of mush.
agent-code-reviewer flips the model. Each concern gets its own specialist agent with its own role prompt and its own reasoning chain. The findings come back as structured JSON, and a coordinator agent consumes the full reasoning traces from all five specialists to:
- Deduplicate findings that overlap across concerns
- Suppress likely false positives (with justification)
- Resolve disagreements between specialists explicitly
- Rank the surviving issues by
(severity × confidence) ÷ fix-effort
The result is a short, actionable list — not a wall of "could-be-an-issue" noise.
┌──────────────────────────────────────────────────────────────────┐
│ your source file │
└─────────────────────────────────┬────────────────────────────────┘
│
┌────────┬────────┬───────┼────────┬────────┐
▼ ▼ ▼ ▼ ▼ ▼
┌────────┐┌────────┐┌────────┐┌──────────┐┌─────────────┐
│Security││ Perf ││ Style ││Architectr││TestCoverage │ ← parallel, each with
│ Agent ││ Agent ││ Agent ││ Agent ││ Agent │ a 2-stage prompt:
└───┬────┘└───┬────┘└───┬────┘└────┬─────┘└──────┬──────┘ 1) reasoning chain
│ │ │ │ │ 2) JSON findings
└────────┴────────┴───────────┴─────────────┘
│
▼
┌─────────────────────┐
│ Coordinator │ ← reads all 5 reasoning chains,
│ (long-chain reason)│ synthesizes, ranks, suppresses
└─────────┬───────────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
terminal markdown json
(rich) (PR-ready) (CI-ready)
Two-stage prompting per agent is the key lever. Stage 1 asks the agent to reason out loud (chain-of-thought) without committing to a JSON schema. Stage 2 hands the agent its own reasoning back and asks it to extract structured findings. The reasoning trace also survives into the coordinator's context — so the coordinator can see why a specialist flagged something, not just what it flagged.
git clone https://github.com/m74567437-maker/agent-code-reviewer.git
cd agent-code-reviewer
pip install -e .Or with pip install -e ".[dev]" to get test dependencies.
Copy .env.example to .env and fill in:
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1 # or any compatible endpoint
LLM_MODEL=gpt-4o-miniWorks out-of-the-box with any provider that ships /v1/chat/completions:
| Provider | LLM_BASE_URL |
Sample LLM_MODEL |
|---|---|---|
| OpenAI | https://api.openai.com/v1 |
gpt-4o-mini |
| Xiaomi MiMo | (per the MiMo open platform docs) | mimo-7b |
| DeepSeek | https://api.deepseek.com/v1 |
deepseek-chat |
| Moonshot | https://api.moonshot.cn/v1 |
moonshot-v1-8k |
| Ollama | http://localhost:11434/v1 |
qwen2.5-coder |
| vLLM | http://localhost:8000/v1 |
(whatever you served) |
# Review a file, print to terminal
agent-review review examples/sample_buggy.py
# Markdown report, ready to paste into a PR
agent-review review src/foo.py -f md > review.md
# JSON for CI integration
agent-review review src/foo.py -f json -o review.json
# Only specific agents
agent-review review src/foo.py --enable security --enable performance
# Stdin
cat src/foo.py | agent-review review --stdin -l python
# Fail the build on high or critical findings
agent-review review src/foo.py --fail-on high$ agent-review agents─────────────────────────── examples/sample_buggy.py (python) ───────────────────────────
Overall severity: HIGH • Total findings: 7 • Tokens: 4318 • Duration: 6234ms
╭─ Executive Summary ─────────────────────────────────────────────────────╮
│ Two critical issues block deployment: a SQL injection on line 14 and a │
│ hardcoded API key on line 7. Performance is fine for the current scale │
│ but the N+1 pattern on line 22 will bite at >1k records. │
╰─────────────────────────────────────────────────────────────────────────╯
Prioritized Actions
┏━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ # ┃ Severity ┃ Title ┃ Sources ┃
┡━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1 │ CRITICAL │ SQL injection via raw f-str │ security │
│ 2 │ CRITICAL │ Hardcoded API key │ security │
│ 3 │ HIGH │ N+1 DB query in loop │ performance, architecture│
│ 4 │ MEDIUM │ Missing edge cases in tests │ test_coverage │
└───┴───────────┴──────────────────────────────┴─────────────────────────┘
See docs/ARCHITECTURE.md for the design walkthrough — agent contracts, the two-stage prompting protocol, why the coordinator gets the full reasoning trace, and how to add your own agent.
Subclass BaseAgent, implement system_prompt() and reasoning_prompt(), register it:
# my_agent.py
from agent_reviewer.agents.base import BaseAgent, ReviewContext
class DocsAgent(BaseAgent):
name = "docs"
category = "documentation"
def system_prompt(self) -> str:
return "You are a docs reviewer. Flag missing or misleading docstrings."
def reasoning_prompt(self, ctx: ReviewContext) -> str:
return f"Review docs in this {ctx.language} file...\n```\n{ctx.source}\n```"Then:
from agent_reviewer.agents import AGENT_REGISTRY
AGENT_REGISTRY["docs"] = DocsAgent- Diff-aware mode (
--diff HEAD~1) so reviews focus on what actually changed - GitHub Action so the report shows up as a PR comment
- Repo-level review across multiple files with cross-file reasoning
- Auto-fix mode — feed the suggested fixes back into a Patch Agent
- Caching layer keyed on
(file content hash, agent, model)to skip re-reviews
MIT. See LICENSE.