Cross-Talk

A methodology for structured cross-agent review — using multiple AI models to catch what no single model catches alone.

A single AI agent analyzing a problem produces reasonable output. Two agents from different providers analyzing the same problem, then reviewing each other's work, produce output that is more thorough and more balanced, and that catches blind spots neither finds alone.

Cross-Talk is the protocol for making that happen reliably.

The $2 Senior Engineer

In a real project, Claude proposed a minimum-velocity filter that would have silently deleted real musical notes in quiet (pianissimo) passages. Codex caught this as a violation of the project's constitutional guarantees.

Cost of the review: ~$2 in tokens
Cost of the bug it prevented: user trust, potentially unrecoverable

That's the value proposition. Senior-engineer-quality review at API prices.

How It Works

Phase 1: DIVERGE        Same task -> Agent A + Agent B independently (can't see each other)
Phase 2: CROSS-REVIEW   Each reviews the other's output
Phase 3: CONVERGE       Resolve disagreements into one final artifact
Phase 4: IMPLEMENT      Split work, cross-review the code too

Different models have different training data, different reasoning patterns, different blind spots. Agreement builds confidence. Disagreement marks exactly where the interesting engineering decisions live. Both outcomes are valuable. Neither is possible with a single agent.
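
The first three phases can be sketched as a small driver. This is an illustration only, not code shipped with Cross-Talk: `Agent`, `CrossTalkResult`, `run_cross_talk`, and the review prompt wording are hypothetical names and choices, and a real agent would call a model API where this sketch takes a plain function. Phase 4 (implementation) is left out because it depends on the project.

```python
from dataclasses import dataclass
from typing import Callable

# An agent is modeled as a function from a prompt to a text artifact.
# Hypothetical stand-in for a real model call.
Agent = Callable[[str], str]


@dataclass
class CrossTalkResult:
    drafts: dict[str, str]   # Phase 1 outputs, keyed by agent label
    reviews: dict[str, str]  # Phase 2 outputs, keyed by reviewer label
    final: str               # Phase 3 converged artifact


def run_cross_talk(
    task: str,
    agent_a: Agent,
    agent_b: Agent,
    converge: Callable[[dict[str, str], dict[str, str]], str],
) -> CrossTalkResult:
    # Phase 1: DIVERGE. Each agent receives only the task, never the other's draft.
    drafts = {"A": agent_a(task), "B": agent_b(task)}

    # Phase 2: CROSS-REVIEW. Each agent reviews the other agent's draft.
    reviews = {
        "A": agent_a(f"Review this analysis of '{task}':\n{drafts['B']}"),
        "B": agent_b(f"Review this analysis of '{task}':\n{drafts['A']}"),
    }

    # Phase 3: CONVERGE. The caller decides how disagreements are resolved.
    final = converge(drafts, reviews)
    return CrossTalkResult(drafts, reviews, final)
```

Passing `converge` in as a function keeps the resolution step explicit: it can be a human decision, a third model pass, or a structured diff, but it is never hidden inside either agent.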

Quick Start

No framework required. No code to install. You need:

Two AI agents from different model families (e.g., Claude + Codex, Claude + GPT, Claude + Gemini)
A shared context folder with project docs both agents can read
A specific question narrow enough for a concrete artifact (not "make it better" but "classify these 6 gaps as safe-fix vs. editorial-choice")
An artifact contract defining output format upfront so results are comparable

Then follow the step-by-step protocol in docs/METHODOLOGY.md.
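
One way to make the artifact contract concrete is to list the required sections up front and check each agent's output against that list before comparing anything. The sketch below assumes artifacts are text files with one heading per line; the section names and the `missing_sections` helper are hypothetical, not taken from the methodology docs.

```python
# Hypothetical contract: headings that every artifact must contain.
REQUIRED_SECTIONS = ("Summary", "Classification", "Risks", "Open Questions")


def missing_sections(artifact_text: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return the required headings that do not appear as a line in the artifact."""
    # Normalize each line: drop surrounding whitespace and any leading '#' marks.
    headings = {line.strip().lstrip("#").strip() for line in artifact_text.splitlines()}
    return [name for name in required if name not in headings]


draft = "# Summary\nSix gaps reviewed.\n## Classification\ngap-1: safe-fix\n"
print(missing_sections(draft))  # ['Risks', 'Open Questions']
```

An artifact that fails the check goes back to its author before cross-review starts, so both reviewers are comparing documents with the same shape.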

When to Use It

Ask: "Would I want a second opinion from a senior engineer before shipping this?"

Use Cross-Talk	Don't use Cross-Talk
Architecture decisions	Typo fixes
Safety/trust-sensitive changes	Log line additions
Ambiguous requirements	Style/formatting changes
Risky PRs before merge	Mechanical refactors
Compliance/regulatory review	Well-understood bug fixes

What It Costs

Approach	Cost	Quality
Single agent, one pass	$0.30-0.50	Good
Single agent, self-review	$0.60-1.00	Better
Cross-Talk standard	$1.50-2.50	Best among low-cost AI-only options
Human senior engineer review	$200-400	Potentially best, context-dependent

Full cost breakdown and optimization strategies are in Cost & Token Economics (docs/COST_AND_TOKENS.md).
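
The relative costs in the table can be checked with a few lines of arithmetic. Using the midpoint of each range is an assumption made here for illustration; the table itself only gives ranges. Amounts are in US cents so the division stays exact.

```python
# Cost ranges from the table above, in US cents.
COSTS_CENTS = {
    "single_pass": (30, 50),
    "self_review": (60, 100),
    "cross_talk": (150, 250),
    "human_review": (20_000, 40_000),
}


def midpoint(cost_range: tuple[int, int]) -> float:
    low, high = cost_range
    return (low + high) / 2


def multiplier(numerator: str, denominator: str) -> float:
    """Ratio of midpoint costs between two approaches."""
    return midpoint(COSTS_CENTS[numerator]) / midpoint(COSTS_CENTS[denominator])


print(multiplier("cross_talk", "single_pass"))   # 5.0
print(multiplier("human_review", "cross_talk"))  # 150.0
```

At the midpoints, Cross-Talk costs about five times a single pass and roughly 1/150th of a human senior-engineer review.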

The Five Principles

Diverge Before You Converge -- Get independent analysis before combining. Never let Agent B see Agent A's output during generation.
Frame Tasks, Not Agents -- Task clarity matters more than model choice. A well-framed task given to any capable model beats a vague task given to "the best" model.
Review the Disagreements -- Agreements validate. Disagreements illuminate. The places where agents diverge are where the most interesting decisions live.
Artifacts Over Chat -- Every analysis produces a durable document, not conversation. Documents can be reviewed, diffed, archived, and handed to future agents.
Know When Not to Use It -- Cross-Talk costs 3-5x a single-agent pass. Use it for judgment calls, not mechanical work.
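
"Review the Disagreements" becomes mechanical once both agents emit the same structure. A minimal sketch, assuming each agent returns a mapping from item to label; the gap names, labels, and the `split_by_agreement` helper are hypothetical.

```python
def split_by_agreement(
    a: dict[str, str], b: dict[str, str]
) -> tuple[dict[str, str], dict[str, tuple[str, str]]]:
    """Partition items that both agents classified into agreements and disagreements.

    Items classified by only one agent are ignored here; a fuller version
    would report them separately.
    """
    agreed: dict[str, str] = {}
    disputed: dict[str, tuple[str, str]] = {}
    for item in sorted(a.keys() & b.keys()):
        if a[item] == b[item]:
            agreed[item] = a[item]
        else:
            disputed[item] = (a[item], b[item])
    return agreed, disputed


claude = {"gap-1": "safe-fix", "gap-2": "editorial-choice", "gap-3": "safe-fix"}
codex = {"gap-1": "safe-fix", "gap-2": "safe-fix", "gap-3": "safe-fix"}

agreed, disputed = split_by_agreement(claude, codex)
print(disputed)  # {'gap-2': ('editorial-choice', 'safe-fix')}
```

The `disputed` entries are the agenda for the converge phase. The function deliberately reports both positions instead of picking or blending them.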

Patterns That Work

Shared Contract -- Both agents get the same project principles. Disagreement stays productive because it's grounded in shared reference material.
Narrow Question, Structured Output -- Specific questions produce comparable artifacts. Vague questions produce incomparable ones.
Classification Before Implementation -- Separate "clearly bugs" from "judgment calls" from "feature requests" before writing any code.
File-Based Handoff -- Agents communicate through markdown files in the repo, not chat. Chat is ephemeral. Files survive session boundaries.
The "Changed My Mind" Flag -- Agents explicitly flag where they changed position after review. These are the highest-value outputs.

Full patterns and anti-patterns are in Patterns (docs/PATTERNS.md).
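
File-based handoff and the "Changed My Mind" flag can be combined in one small writer. This is a sketch under assumptions: the directory layout, the filename scheme, and the `write_handoff` helper are hypothetical, and the methodology docs may prescribe a different template.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def write_handoff(
    root: Path,
    agent: str,
    phase: str,
    body: str,
    changed_my_mind: Optional[list[str]] = None,
) -> Path:
    """Write one agent's artifact as a markdown file and return its path."""
    root.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: <date>-<phase>-<agent>.md
    path = root / f"{date.today().isoformat()}-{phase}-{agent}.md"

    lines = [f"# {phase}: {agent}", "", body.strip(), ""]
    if changed_my_mind:
        # Positions the agent reversed after reading the other agent's review.
        lines += ["## Changed My Mind", ""]
        lines += [f"- {item}" for item in changed_my_mind]
        lines.append("")

    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Because the result is an ordinary file in the repo, the next session (or the next agent) can read it without any chat history, and it can be reviewed in a diff like any other change.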

Anti-Patterns to Avoid

Same model twice -- Same model = same blind spots. You need actual model diversity.
Letting agents peek -- If Agent B sees Agent A's work during generation, you get anchoring bias instead of independence.
Converging by averaging -- "Agent A says 40, Agent B says 60, let's use 50" is avoidance, not convergence. Understand why they disagree.
Using it for everything -- Not every change needs a $2 review. The second-opinion threshold exists for a reason.
Review in chat -- Chat is session-scoped. When the implementer starts a new session, the findings are gone.

Case Studies

Mozartino: Transcription Pipeline Review

Full 4-phase protocol. Claude and Codex independently analyzed 6 transcription quality gaps in a music notation app. Cross-review caught a velocity filter that violated the project's constitution. Two Codex-originated ideas (transformation diagnostics, anti-overcleaning guard) were adopted into the final plan.

Mozartino: PR 2 Implementation Review

Lightweight 2-pass variation. Claude implemented, Codex reviewed. The review caught an octave-leap false positive that the implementer's own tests missed (constitutional severity). It also showed that chat-based review doesn't work, which led to the file-based handoff protocol.

How It Compares

	Cross-Talk	AutoGen / CrewAI	Academic LLM Debate	adversarial-review
Different providers required	Yes	Optional	Usually no	Yes
Artifact-based (files, not chat)	Yes	No	No	Yes
Prescriptive protocol	Yes	Build your own	Theoretical	Partial
Failure modes documented	Yes	No	No	No
Cost economics included	Yes	No	No	No
Real case studies with costs	Yes	No	Benchmarks only	No
No code/framework required	Yes	Code required	Code required	Code required
When NOT to use it	Yes	No	No	No

Cross-Talk is the methodology layer that sits above orchestration frameworks. AutoGen gives you plumbing. CrewAI gives you roles. Cross-Talk tells you what to do, why, and when not to.

Agent Profiles

Phase	Often a good fit	Why
Architecture & tradeoffs	Claude	Principle-based reasoning
Implementation	Codex	Token-efficient, precise execution
Adversarial review	Claude	Catches conceptual issues
Test generation	Codex	Pattern-following, mechanical
Safety/constitutional review	Claude	"Should we?" reasoning
Parallel coding	Codex	Worktree isolation model

These are starting heuristics, not rules. Task framing matters more than model choice. Full profiles are in Agent Profiles (docs/AGENT_PROFILES.md).

Can I Automate It?

Yes. The protocol is manual by default (two terminals, shared folder), but can be automated with:

LangGraph -- Define agents as nodes, review loops as edges
Claude Agent SDK (Anthropic) / OpenAI Agents SDK -- Custom multi-agent pipelines
MCP -- Shared tool definitions across agent providers
Claude Code sub-agents -- Fan-out implementation after convergence

See Tooling Landscape (docs/TOOLING_LANDSCAPE.md) for the full ecosystem map.

Project Structure

docs/
  MANIFESTO.md              Core principles -- why cross-agent review works
  METHODOLOGY.md            Step-by-step protocol with artifact templates
  PATTERNS.md               Patterns that work + anti-patterns to avoid
  AGENT_PROFILES.md         Agent strengths and task assignment guide
  COST_AND_TOKENS.md        Token economics and optimization strategies
  TOOLING_LANDSCAPE.md      Current tools and ecosystem
  examples/
    mozartino-*.md           Real-world case studies

Read Order

1. This README (you're here)
2. Manifesto -- understand why
3. Methodology -- learn the protocol
4. Patterns -- learn what works and what doesn't
5. Examples -- see it applied to a real project

Related Work

Cross-Talk was developed independently from hands-on project experience. We later found strong alignment with academic research:

Irving et al. 2018 — "AI Safety via Debate" (the foundational paper, co-authored by Dario Amodei)
Du et al. 2023 — Multiagent debate reduces hallucinations and improves reasoning (ICML 2024)
A-HMAD 2025 — Heterogeneous model debate produces 30%+ fewer errors (directly validates the cross-model thesis)
ChatEval 2023 — Diverse agent roles outperform single-judge evaluation (ICLR 2024)
D3 2024 — Cost-aware debate; 3-7 agents is the sweet spot

The Ralph Wiggum Technique influenced the artifact-based persistence approach.

Full discussion in the Related Work section of the Manifesto (docs/MANIFESTO.md).

Status

Early-stage methodology, born from real multi-agent work on one production application (2026). N=1, but the patterns are replicable, the academic foundation is strong, and the case study includes real costs and real bugs caught. More case studies welcome — that's the fastest way to strengthen this.

Contributions, case studies, and tooling experiments welcome.

Contributing

Keep methodology prescriptive and concise
All claims should reference real experience or cited sources
Case studies from real projects are welcome (anonymize if needed)
Date-stamp research findings -- tooling changes fast
See CLAUDE.md for agent contribution guidelines

License

MIT
