Skip to content

m74567437-maker/agent-code-reviewer

Repository files navigation

agent-code-reviewer

Multi-agent code review CLI with long-chain reasoning. Five specialist agents (security, performance, style, architecture, test) review code in parallel; a coordinator agent synthesizes their reasoning chains into one prioritized action list. Works with any OpenAI-compatible endpoint — MiMo, OpenAI, DeepSeek, Moonshot, vLLM, Ollama.

Python License


Why does this exist?

Single-LLM code review has a recurring failure mode: the model spreads its attention thin and produces shallow, generic feedback. Asking one model to "find security AND performance AND style AND architecture AND test issues" tends to produce six bullet points of mush.

agent-code-reviewer flips the model. Each concern gets its own specialist agent with its own role prompt and its own reasoning chain. The findings come back as structured JSON, and a coordinator agent consumes the full reasoning traces from all five specialists to:

  1. Deduplicate findings that overlap across concerns
  2. Suppress likely false positives (with justification)
  3. Resolve disagreements between specialists explicitly
  4. Rank the surviving issues by (severity × confidence) ÷ fix-effort

The result is a short, actionable list — not a wall of "could-be-an-issue" noise.

How it works

┌──────────────────────────────────────────────────────────────────┐
│                         your source file                          │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
        ┌────────┬────────┬───────┼────────┬────────┐
        ▼        ▼        ▼       ▼        ▼        ▼
   ┌────────┐┌────────┐┌────────┐┌──────────┐┌─────────────┐
   │Security││  Perf  ││ Style  ││Architectr││TestCoverage │   ← parallel, each with
   │ Agent  ││ Agent  ││ Agent  ││  Agent   ││   Agent     │     a 2-stage prompt:
   └───┬────┘└───┬────┘└───┬────┘└────┬─────┘└──────┬──────┘     1) reasoning chain
       │        │        │           │             │             2) JSON findings
       └────────┴────────┴───────────┴─────────────┘
                            │
                            ▼
                  ┌─────────────────────┐
                  │    Coordinator      │   ← reads all 5 reasoning chains,
                  │  (long-chain reason)│     synthesizes, ranks, suppresses
                  └─────────┬───────────┘
                            │
                ┌───────────┼───────────┐
                ▼           ▼           ▼
            terminal      markdown     json
            (rich)        (PR-ready)   (CI-ready)

Two-stage prompting per agent is the key lever. Stage 1 asks the agent to reason out loud (chain-of-thought) without committing to a JSON schema. Stage 2 hands the agent its own reasoning back and asks it to extract structured findings. The reasoning trace also survives into the coordinator's context — so the coordinator can see why a specialist flagged something, not just what it flagged.

Install

git clone https://github.com/m74567437-maker/agent-code-reviewer.git
cd agent-code-reviewer
pip install -e .

Or with pip install -e ".[dev]" to get test dependencies.

Configure

Copy .env.example to .env and fill in:

LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1   # or any compatible endpoint
LLM_MODEL=gpt-4o-mini

Works out-of-the-box with any provider that ships /v1/chat/completions:

Provider LLM_BASE_URL Sample LLM_MODEL
OpenAI https://api.openai.com/v1 gpt-4o-mini
Xiaomi MiMo (per the MiMo open platform docs) mimo-7b
DeepSeek https://api.deepseek.com/v1 deepseek-chat
Moonshot https://api.moonshot.cn/v1 moonshot-v1-8k
Ollama http://localhost:11434/v1 qwen2.5-coder
vLLM http://localhost:8000/v1 (whatever you served)

Use

# Review a file, print to terminal
agent-review review examples/sample_buggy.py

# Markdown report, ready to paste into a PR
agent-review review src/foo.py -f md > review.md

# JSON for CI integration
agent-review review src/foo.py -f json -o review.json

# Only specific agents
agent-review review src/foo.py --enable security --enable performance

# Stdin
cat src/foo.py | agent-review review --stdin -l python

# Fail the build on high or critical findings
agent-review review src/foo.py --fail-on high

List available agents

$ agent-review agents

Output (terminal)

─────────────────────────── examples/sample_buggy.py (python) ───────────────────────────

Overall severity: HIGH  •  Total findings: 7  •  Tokens: 4318  •  Duration: 6234ms

╭─ Executive Summary ─────────────────────────────────────────────────────╮
│ Two critical issues block deployment: a SQL injection on line 14 and a │
│ hardcoded API key on line 7. Performance is fine for the current scale │
│ but the N+1 pattern on line 22 will bite at >1k records.               │
╰─────────────────────────────────────────────────────────────────────────╯

                          Prioritized Actions
┏━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ # ┃ Severity  ┃ Title                        ┃ Sources                 ┃
┡━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1 │ CRITICAL  │ SQL injection via raw f-str  │ security                │
│ 2 │ CRITICAL  │ Hardcoded API key            │ security                │
│ 3 │ HIGH      │ N+1 DB query in loop         │ performance, architecture│
│ 4 │ MEDIUM    │ Missing edge cases in tests  │ test_coverage           │
└───┴───────────┴──────────────────────────────┴─────────────────────────┘

Architecture

See docs/ARCHITECTURE.md for the design walkthrough — agent contracts, the two-stage prompting protocol, why the coordinator gets the full reasoning trace, and how to add your own agent.

Add a custom agent

Subclass BaseAgent, implement system_prompt() and reasoning_prompt(), register it:

# my_agent.py
from agent_reviewer.agents.base import BaseAgent, ReviewContext

class DocsAgent(BaseAgent):
    name = "docs"
    category = "documentation"

    def system_prompt(self) -> str:
        return "You are a docs reviewer. Flag missing or misleading docstrings."

    def reasoning_prompt(self, ctx: ReviewContext) -> str:
        return f"Review docs in this {ctx.language} file...\n```\n{ctx.source}\n```"

Then:

from agent_reviewer.agents import AGENT_REGISTRY
AGENT_REGISTRY["docs"] = DocsAgent

Roadmap

  • Diff-aware mode (--diff HEAD~1) so reviews focus on what actually changed
  • GitHub Action so the report shows up as a PR comment
  • Repo-level review across multiple files with cross-file reasoning
  • Auto-fix mode — feed the suggested fixes back into a Patch Agent
  • Caching layer keyed on (file content hash, agent, model) to skip re-reviews

License

MIT. See LICENSE.

About

Multi-agent code review CLI with long-chain reasoning. Five specialist agents (security, performance, style, architecture, test) review in parallel; a coordinator synthesizes findings into one prioritized action list. Works with any OpenAI-compatible endpoint.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages