ollo12-prog/cross-talk

Cross-Talk

A methodology for structured cross-agent review — using multiple AI models to catch what no single model catches alone.

A single AI agent analyzing a problem produces reasonable output. Two agents from different providers that analyze the same problem and then review each other's work produce output that is more thorough and more balanced, and they catch blind spots that neither finds alone.

Cross-Talk is the protocol for making that happen reliably.

The $2 Senior Engineer

In a real project, Claude proposed a minimum-velocity filter that would have silently deleted real musical notes (quiet pianissimo passages). Codex caught this as a violation of the project's constitutional guarantees.

  • Cost of the review: ~$2 in tokens
  • Cost of the bug it prevented: user trust, potentially unrecoverable

That's the value proposition. Senior-engineer-quality review at API prices.

How It Works

Phase 1: DIVERGE        Same task -> Agent A + Agent B independently (can't see each other)
Phase 2: CROSS-REVIEW   Each reviews the other's output
Phase 3: CONVERGE       Resolve disagreements into one final artifact
Phase 4: IMPLEMENT      Split work, cross-review the code too

Different models have different training data, different reasoning patterns, different blind spots. Agreement builds confidence. Disagreement marks exactly where the interesting engineering decisions live. Both outcomes are valuable. Neither is possible with a single agent.
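The first three phases can be sketched in a few lines of Python. This is a minimal illustration, not part of the protocol itself: the `Agent` type, `CrossTalkResult`, and `cross_talk` are hypothetical names, the agents are stubs rather than real model calls, and Phase 4 (implementation) is left out because it depends on the project.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: an "agent" is any function from prompt text to response text.
# In practice it would wrap a call to a model provider; that wiring is omitted.
Agent = Callable[[str], str]


@dataclass
class CrossTalkResult:
    drafts: dict[str, str]   # Phase 1 outputs, keyed by agent label
    reviews: dict[str, str]  # Phase 2 outputs, keyed by reviewer label
    final: str               # Phase 3 converged artifact


def cross_talk(
    task: str,
    agent_a: Agent,
    agent_b: Agent,
    converge: Callable[[dict[str, str], dict[str, str]], str],
) -> CrossTalkResult:
    # Phase 1: DIVERGE. Each agent sees only the task, never the other's draft.
    drafts = {"A": agent_a(task), "B": agent_b(task)}

    # Phase 2: CROSS-REVIEW. Each agent reviews the other's draft.
    reviews = {
        "A": agent_a(f"Review this analysis of the task '{task}':\n{drafts['B']}"),
        "B": agent_b(f"Review this analysis of the task '{task}':\n{drafts['A']}"),
    }

    # Phase 3: CONVERGE. Resolve disagreements into one final artifact.
    final = converge(drafts, reviews)
    return CrossTalkResult(drafts, reviews, final)


# Stub agents stand in for real model calls so the sketch runs offline.
claude_stub: Agent = lambda prompt: f"claude stub read {len(prompt)} chars"
codex_stub: Agent = lambda prompt: f"codex stub read {len(prompt)} chars"

result = cross_talk(
    "Classify these 6 gaps as safe-fix vs. editorial-choice",
    claude_stub,
    codex_stub,
    converge=lambda drafts, reviews: "\n".join(drafts.values()),
)
print(result.final)
```

The point of the structure is that independence is enforced by what each call receives: the Phase 1 prompts contain only the task, so neither agent can anchor on the other's draft.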

Quick Start

No framework required. No code to install. You need:

  1. Two AI agents from different model families (e.g., Claude + Codex, Claude + GPT, Claude + Gemini)
  2. A shared context folder with project docs both agents can read
  3. A specific question narrow enough for a concrete artifact (not "make it better" but "classify these 6 gaps as safe-fix vs. editorial-choice")
  4. An artifact contract defining output format upfront so results are comparable

Then follow the step-by-step protocol.
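An artifact contract can be as small as a typed record plus a validator that both agents' outputs must pass before comparison. The sketch below assumes the safe-fix vs. editorial-choice question from item 3; the `Finding` fields, the `gap-N` identifiers, and `validate_artifact` are illustrative names, not a format the methodology prescribes.

```python
from dataclasses import dataclass

# Assumption: the two labels come from the example question in the list above.
ALLOWED_CLASSES = {"safe-fix", "editorial-choice"}


@dataclass(frozen=True)
class Finding:
    gap_id: str          # hypothetical identifier, e.g. "gap-1"
    classification: str  # must be one of ALLOWED_CLASSES
    rationale: str       # why the agent chose that classification


def validate_artifact(findings: list[Finding], expected_ids: set[str]) -> list[str]:
    """Return contract violations; an empty list means the artifact conforms."""
    problems = []
    seen_ids = {f.gap_id for f in findings}
    for gap_id in sorted(expected_ids - seen_ids):
        problems.append(f"missing finding for {gap_id}")
    for gap_id in sorted(seen_ids - expected_ids):
        problems.append(f"unexpected finding for {gap_id}")
    for f in findings:
        if f.classification not in ALLOWED_CLASSES:
            problems.append(f"{f.gap_id}: unknown classification {f.classification!r}")
        if not f.rationale.strip():
            problems.append(f"{f.gap_id}: empty rationale")
    return problems


expected = {f"gap-{n}" for n in range(1, 7)}  # the 6 gaps in the example question
draft = [Finding("gap-1", "safe-fix", "Pure rounding error, no musical judgment.")]
for problem in validate_artifact(draft, expected):
    print(problem)
```

Because both agents answer in the same shape, their artifacts can be compared item by item instead of by reading two free-form essays.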

When to Use It

Ask: "Would I want a second opinion from a senior engineer before shipping this?"

| Use Cross-Talk | Don't use Cross-Talk |
| --- | --- |
| Architecture decisions | Typo fixes |
| Safety/trust-sensitive changes | Log line additions |
| Ambiguous requirements | Style/formatting changes |
| Risky PRs before merge | Mechanical refactors |
| Compliance/regulatory review | Well-understood bug fixes |

What It Costs

| Approach | Cost | Quality |
| --- | --- | --- |
| Single agent, one pass | $0.30-0.50 | Good |
| Single agent, self-review | $0.60-1.00 | Better |
| Cross-Talk standard | $1.50-2.50 | Best among low-cost AI-only options |
| Human senior engineer review | $200-400 | Potentially best, context-dependent |

Full cost breakdown and optimization strategies in Cost & Token Economics.
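The ratios implied by the table can be checked with a few lines of arithmetic. The figures below are copied from the table; the function names are illustrative, and the ranges are treated as rough bounds rather than measured distributions.

```python
# Cost ranges in US cents, copied from the table (integers avoid float noise).
SINGLE_PASS = (30, 50)
SELF_REVIEW = (60, 100)
CROSS_TALK = (150, 250)
HUMAN_REVIEW = (20_000, 40_000)


def like_for_like(numerator, denominator):
    """Compare low end with low end and high end with high end."""
    return numerator[0] / denominator[0], numerator[1] / denominator[1]


def ratio_range(numerator, denominator):
    """Smallest and largest possible ratio between two (low, high) ranges."""
    return numerator[0] / denominator[1], numerator[1] / denominator[0]


print(like_for_like(CROSS_TALK, SINGLE_PASS))  # (5.0, 5.0)
print(like_for_like(CROSS_TALK, SELF_REVIEW))  # (2.5, 2.5)
print(ratio_range(CROSS_TALK, SINGLE_PASS)[0])  # 3.0
print(ratio_range(HUMAN_REVIEW, CROSS_TALK)[0])  # 80.0
```

Compared like for like, Cross-Talk costs about 5x a single pass and 2.5x a self-review pass. The 3x figure appears only when the cheapest Cross-Talk run is set against the most expensive single pass. Even at its cheapest, human review is at least 80x the most expensive Cross-Talk run.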

The Five Principles

  1. Diverge Before You Converge -- Get independent analysis before combining. Never let Agent B see Agent A's output during generation.
  2. Frame Tasks, Not Agents -- Task clarity matters more than model choice. A well-framed task given to any capable model beats a vague task given to "the best" model.
  3. Review the Disagreements -- Agreements validate. Disagreements illuminate. The places where agents diverge are where the most interesting decisions live.
  4. Artifacts Over Chat -- Every analysis produces a durable document, not conversation. Documents can be reviewed, diffed, archived, and handed to future agents.
  5. Know When Not to Use It -- Cross-Talk costs 3-5x a single-agent pass. Use it for judgment calls, not mechanical work.

Patterns That Work

  • Shared Contract -- Both agents get the same project principles. Disagreement stays productive because it's grounded in shared reference material.
  • Narrow Question, Structured Output -- Specific questions produce comparable artifacts. Vague questions produce incomparable ones.
  • Classification Before Implementation -- Separate "clearly bugs" from "judgment calls" from "feature requests" before writing any code.
  • File-Based Handoff -- Agents communicate through markdown files in the repo, not chat. Chat is ephemeral. Files survive session boundaries.
  • The "Changed My Mind" Flag -- Agents explicitly flag where they changed position after review. These are the highest-value outputs.

Full patterns and anti-patterns in Patterns.
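File-based handoff and the "Changed My Mind" flag can be combined in one small helper. This is a sketch under assumptions: the folder layout, the `phase-agent.md` naming scheme, and the section headings are hypothetical, not a format defined by the methodology.

```python
from datetime import date
from pathlib import Path


def write_handoff(
    folder: Path,
    agent: str,
    phase: str,
    body: str,
    changed_my_mind: list[str],
) -> Path:
    """Write one agent's artifact as a markdown file and return its path."""
    folder.mkdir(parents=True, exist_ok=True)

    lines = [
        f"# {phase}: {agent}",
        f"Date: {date.today().isoformat()}",
        "",
        body,
        "",
        "## Changed my mind",
    ]
    if changed_my_mind:
        lines.extend(f"- {item}" for item in changed_my_mind)
    else:
        # An explicit "nothing changed" entry is easier to audit than a missing section.
        lines.append("- (no positions changed)")

    # Hypothetical naming scheme: one file per phase and agent.
    path = folder / f"{phase}-{agent}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

A call such as `write_handoff(Path("handoff"), "codex", "cross-review", review_text, ["velocity filter"])` leaves a file that a later session, or a different agent, can read without access to the original chat.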

Anti-Patterns to Avoid

  • Same model twice -- Same model = same blind spots. You need actual model diversity.
  • Letting agents peek -- If Agent B sees Agent A's work during generation, you get anchoring bias instead of independence.
  • Converging by averaging -- "Agent A says 40, Agent B says 60, let's use 50" is avoidance, not convergence. Understand why they disagree.
  • Using it for everything -- Not every change needs a $2 review. The second-opinion threshold exists for a reason.
  • Review in chat -- Chat is session-scoped. When the implementer starts a new session, the findings are gone.
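One way to avoid converging by averaging is to separate agreements from disagreements mechanically and carry both positions forward for every disputed item. The sketch below assumes each agent's artifact has been reduced to a mapping from item id to classification; `split_by_agreement` and the `gap-N` ids are illustrative names.

```python
def split_by_agreement(a: dict[str, str], b: dict[str, str]):
    """Separate items both agents classified identically from items they did not."""
    agreed, disputed = {}, {}
    for item in sorted(a.keys() | b.keys()):
        va, vb = a.get(item), b.get(item)
        if va == vb:
            agreed[item] = va
        else:
            # Keep both positions; an item one agent skipped shows up as None.
            disputed[item] = {"A": va, "B": vb}
    return agreed, disputed


agent_a = {"gap-1": "safe-fix", "gap-2": "safe-fix", "gap-3": "editorial-choice"}
agent_b = {"gap-1": "safe-fix", "gap-2": "editorial-choice"}

agreed, disputed = split_by_agreement(agent_a, agent_b)
print(agreed)    # {'gap-1': 'safe-fix'}
print(disputed)  # {'gap-2': {'A': 'safe-fix', 'B': 'editorial-choice'}, 'gap-3': {'A': 'editorial-choice', 'B': None}}
```

Nothing in the disputed set is resolved automatically. Each entry is a prompt for the convergence phase to ask why the agents differ.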

Case Studies

Full 4-phase protocol. Claude and Codex independently analyzed 6 transcription quality gaps in a music notation app. Cross-review caught a velocity filter that violated the project's constitutional guarantees. Two Codex-originated ideas (transformation diagnostics, anti-overcleaning guard) were adopted into the final plan.

Lightweight 2-pass variation. Claude implemented; Codex reviewed. The review caught an octave-leap false positive of constitutional severity that the implementer's own tests missed. It also showed that chat-based review doesn't work, which led to the file-based handoff protocol.

How It Compares

| | Cross-Talk | AutoGen / CrewAI | Academic LLM Debate | adversarial-review |
| --- | --- | --- | --- | --- |
| Different providers required | Yes | Optional | Usually no | Yes |
| Artifact-based (files, not chat) | Yes | No | No | Yes |
| Prescriptive protocol | Yes | Build your own | Theoretical | Partial |
| Failure modes documented | Yes | No | No | No |
| Cost economics included | Yes | No | No | No |
| Real case studies with costs | Yes | No | Benchmarks only | No |
| No code/framework required | Yes | Code required | Code required | Code required |
| When NOT to use it | Yes | No | No | No |

Cross-Talk is the methodology layer that sits above orchestration frameworks. AutoGen gives you plumbing. CrewAI gives you roles. Cross-Talk tells you what to do, why, and when not to.

Agent Profiles

| Phase | Often a good fit | Why |
| --- | --- | --- |
| Architecture & tradeoffs | Claude | Principle-based reasoning |
| Implementation | Codex | Token-efficient, precise execution |
| Adversarial review | Claude | Catches conceptual issues |
| Test generation | Codex | Pattern-following, mechanical |
| Safety/constitutional review | Claude | "Should we?" reasoning |
| Parallel coding | Codex | Worktree isolation model |

These are starting heuristics, not rules. Task framing matters more than model choice. Full profiles in Agent Profiles.

Can I Automate It?

Yes. The protocol is manual by default (two terminals, shared folder), but can be automated with:

  • LangGraph -- Define agents as nodes, review loops as edges
  • Anthropic Agent SDK / OpenAI Agents SDK -- Custom multi-agent pipelines
  • MCP -- Shared tool definitions across agent providers
  • Claude Code sub-agents -- Fan-out implementation after convergence

See Tooling Landscape for the full ecosystem map.

Project Structure

docs/
  MANIFESTO.md              Core principles -- why cross-agent review works
  METHODOLOGY.md            Step-by-step protocol with artifact templates
  PATTERNS.md               Patterns that work + anti-patterns to avoid
  AGENT_PROFILES.md         Agent strengths and task assignment guide
  COST_AND_TOKENS.md        Token economics and optimization strategies
  TOOLING_LANDSCAPE.md      Current tools and ecosystem
  examples/
    mozartino-*.md           Real-world case studies

Read Order

  1. This README (you're here)
  2. Manifesto -- understand why
  3. Methodology -- learn the protocol
  4. Patterns -- learn what works and what doesn't
  5. Examples -- see it applied to a real project

Related Work

Cross-Talk was developed independently from hands-on project experience. We later found strong alignment with academic research:

  • Irving et al. 2018 — "AI Safety via Debate" (the foundational paper, co-authored by Dario Amodei)
  • Du et al. 2023 — Multiagent debate reduces hallucinations and improves reasoning (ICML 2024)
  • A-HMAD 2025 — Heterogeneous model debate produces 30%+ fewer errors (directly validates the cross-model thesis)
  • ChatEval 2023 — Diverse agent roles outperform single-judge evaluation (ICLR 2024)
  • D3 2024 — Cost-aware debate; 3-7 agents is the sweet spot

The Ralph Wiggum Technique influenced the artifact-based persistence approach.

Full discussion in Manifesto — Related Work.

Status

Early-stage methodology, born from real multi-agent work on one production application (2026). The sample is a single project (N=1), but the patterns are replicable, the academic foundation is strong, and the case studies include real costs and real bugs caught. More case studies are the fastest way to strengthen this.

Contributions, case studies, and tooling experiments welcome.

Contributing

  • Keep methodology prescriptive and concise
  • All claims should reference real experience or cited sources
  • Case studies from real projects are welcome (anonymize if needed)
  • Date-stamp research findings -- tooling changes fast
  • See CLAUDE.md for agent contribution guidelines

License

MIT
