AmitaKashid/agentmesh

AgentMesh — Multi-Agent Orchestration with MCP & A2A

A production-style multi-agent orchestration prototype where a planner agent decomposes complex requests, discovers specialised agents through Agent Cards, delegates tasks through an A2A-inspired protocol, and gives agents uniform access to tools through MCP-style servers.


Why AgentMesh exists

Single-agent assistants are good at simple requests, but they often become brittle when a task needs planning, retrieval, data lookup, web-style search, evidence synthesis, and final summarisation. AgentMesh demonstrates a cleaner architecture:

  • one Planner Agent decomposes the user request,
  • specialised agents execute sub-tasks,
  • agents publish Agent Cards for discovery,
  • delegation happens through an A2A-style task interface,
  • tools are exposed through MCP-style servers,
  • evaluation compares single-agent and multi-agent execution on multi-step tasks.

The goal is not to imitate every line of a protocol specification. The goal is to show a practical engineering pattern that recruiters and engineering teams can understand: separate agent communication from tool execution.


What this project demonstrates

| Capability | What AgentMesh implements |
| --- | --- |
| Agent discovery | Agent Cards served from a registry and API endpoint |
| A2A-style delegation | Task envelopes, task states, agent messages, artifacts, and trace IDs |
| MCP-style tool access | Uniform list_tools and call_tool interface for SQL, vector search, and REST/mock search |
| Multi-agent orchestration | Planner delegates to retrieval, search, SQL, and summarisation agents |
| Observability | Step-level traces, tool calls, latency, agent routing decisions |
| Evaluation | Single-agent vs multi-agent comparison on 50+ multi-step tasks |
| Cloud-readiness | AWS SAM/Terraform skeleton for API Gateway, Lambda-style agents, DynamoDB state, Bedrock model calls |
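
The A2A-style delegation row can be pictured with a small data model. This is a minimal sketch, not the contents of `a2a/protocol.py`: the names `TaskState` and `TaskEnvelope`, the field names, the state values, and the `sql-agent` identifier are all assumptions made for illustration.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from enum import Enum


class TaskState(str, Enum):
    # Hypothetical lifecycle states; the real protocol module may differ.
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions, so a finished task cannot be reopened by accident.
_TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


@dataclass
class TaskEnvelope:
    target_agent: str
    instruction: str
    trace_id: str  # shared by every step of one user request
    task_id: str = field(default_factory=lambda: f"task_{uuid.uuid4().hex[:8]}")
    state: TaskState = TaskState.SUBMITTED
    messages: list[str] = field(default_factory=list)
    artifacts: list[dict] = field(default_factory=list)

    def transition(self, new_state: TaskState) -> None:
        if new_state not in _TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state


envelope = TaskEnvelope(
    target_agent="sql-agent",
    instruction="List open critical tickets",
    trace_id="trace-001",
)
envelope.transition(TaskState.WORKING)
envelope.artifacts.append({"type": "rows", "data": []})
envelope.transition(TaskState.COMPLETED)
```

Keeping the trace ID on the envelope rather than in the agent is one way to let every delegated step be correlated back to the original request.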

Architecture

flowchart LR
    U[User Request] --> API[FastAPI Gateway]
    API --> P[Planner Agent]
    P --> REG[Agent Registry / Agent Cards]
    REG --> R[Retrieval Agent]
    REG --> S[Search Agent]
    REG --> Q[SQL Agent]
    REG --> SUM[Summarisation Agent]
    R --> MCP1[MCP Vector Search Server]
    S --> MCP2[MCP REST/Search Server]
    Q --> MCP3[MCP SQL Server]
    R --> SUM
    S --> SUM
    Q --> SUM
    SUM --> API
    API --> OUT[Final Answer + Trace]

Demo: multi-agent run

agentmesh demo "Which enterprise customers had open critical tickets, what internal docs explain the fix, and what should the account manager send them?"

Example result:

Planner created 4 steps:
1. Query SQL tickets and accounts
2. Retrieve internal incident/runbook knowledge
3. Search external-style product status snippets
4. Summarise customer-facing action plan

Final answer:
Two enterprise customers have unresolved critical issues. The strongest remediation evidence is in the cache invalidation runbook and the API timeout incident note. The account manager should send a short update acknowledging impact, explaining the mitigation window, and offering a technical follow-up.
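
The decomposition step can be approximated without an LLM. The sketch below is a hedged illustration only: the keyword rules, the `PlanStep` type, and the agent identifiers are assumptions, not the logic in `agents/planner.py`. It uses a shorter request than the demo, so the search step is skipped because no status-related keyword matches.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    order: int
    agent: str
    goal: str


# Hypothetical keyword rules standing in for the planner's real logic.
_RULES = [
    ("sql-agent", ("ticket", "customer", "account"),
     "Query SQL tickets and accounts"),
    ("retrieval-agent", ("doc", "runbook", "fix"),
     "Retrieve internal incident/runbook knowledge"),
    ("search-agent", ("status", "outage", "external"),
     "Search external-style product status snippets"),
]


def plan(request: str) -> list[PlanStep]:
    text = request.lower()
    steps: list[PlanStep] = []
    for agent, keywords, goal in _RULES:
        if any(keyword in text for keyword in keywords):
            steps.append(PlanStep(len(steps) + 1, agent, goal))
    # Summarisation always runs last so it can see every upstream artifact.
    steps.append(
        PlanStep(len(steps) + 1, "summarizer-agent",
                 "Summarise customer-facing action plan")
    )
    return steps


demo_plan = plan(
    "Which enterprise customers had open critical tickets, "
    "and which internal docs explain the fix?"
)
for step in demo_plan:
    print(f"{step.order}. [{step.agent}] {step.goal}")
```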

Evaluation snapshot

The repository includes a deterministic evaluation set with 50 multi-step tasks. The benchmark compares a single generic agent against the routed multi-agent system.

| System | Tool-selection accuracy | Task completion | Avg. steps | Avg. latency |
| --- | --- | --- | --- | --- |
| Single generic agent | 62.0% | 58.0% | 2.1 | 0.31s |
| AgentMesh multi-agent | 88.0% | 84.0% | 3.7 | 0.46s |

Interpretation: the multi-agent system takes more steps on average (3.7 vs. 2.1) and adds about 0.15s of latency, but it selects tools more accurately (88.0% vs. 62.0%) and completes compound requests more reliably (84.0% vs. 58.0%). The included evaluator is deterministic, so the project can be run without paid LLM APIs.
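
To make the trade-off concrete, the differences between the two rows of the snapshot can be computed directly. This is only arithmetic on the figures reported above; the dictionary keys and the `relative_change` helper are names chosen for this sketch.

```python
# Figures copied from the evaluation snapshot table.
baseline = {"tool_accuracy": 62.0, "completion": 58.0, "avg_steps": 2.1, "latency_s": 0.31}
multi = {"tool_accuracy": 88.0, "completion": 84.0, "avg_steps": 3.7, "latency_s": 0.46}


def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100


for key in baseline:
    delta = multi[key] - baseline[key]
    print(f"{key:14s} delta={delta:+.2f} relative={relative_change(multi[key], baseline[key]):+.1f}%")
```

Both accuracy and completion rise by 26 percentage points, while the average step count grows by 1.6 steps and latency by 0.15s.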


Repository structure

agentmesh/
├── src/agentmesh/
│   ├── api.py                    # FastAPI gateway
│   ├── cli.py                    # CLI for demo, run, evaluate
│   ├── orchestrator.py           # Planner-led multi-agent orchestration
│   ├── state.py                  # In-memory/DynamoDB-style task state
│   ├── schemas.py                # Shared data contracts
│   ├── agents/
│   │   ├── base.py               # Base agent contract
│   │   ├── planner.py            # Planner / task decomposer
│   │   ├── retrieval_agent.py    # Knowledge retrieval specialist
│   │   ├── search_agent.py       # External-search specialist
│   │   ├── sql_agent.py          # Structured data specialist
│   │   └── summarizer_agent.py   # Final synthesis specialist
│   ├── a2a/
│   │   ├── cards.py              # Agent Card definitions
│   │   ├── registry.py           # Agent discovery registry
│   │   └── protocol.py           # A2A-style task/message envelopes
│   ├── mcp/
│   │   ├── server.py             # MCP-style server interface
│   │   └── client.py             # MCP-style client interface
│   ├── tools/
│   │   ├── sql_tools.py          # SQLite tool server
│   │   ├── vector_tools.py       # TF-IDF knowledge retrieval tool server
│   │   └── rest_tools.py         # Mock REST/search/status tool server
│   ├── evaluation/
│   │   ├── dataset.py            # Generates/evaluates 50+ tasks
│   │   └── evaluator.py          # Single vs multi-agent scoring
│   └── cloud/
│       ├── bedrock_client.py     # Bedrock-compatible abstraction
│       └── dynamodb_state.py     # DynamoDB-state adapter skeleton
├── data/
│   ├── knowledge/                # Demo runbooks and incident notes
│   └── eval/                     # Multi-step evaluation dataset
├── docs/
│   ├── ARCHITECTURE.md
│   ├── MCP_A2A_DESIGN.md
│   ├── EVALUATION.md
│   └── AWS_DEPLOYMENT.md
├── infra/
│   ├── aws-sam/template.yaml
│   └── terraform/main.tf
├── tests/
├── reports/benchmark_summary.md
├── Dockerfile
├── docker-compose.yml
├── Makefile
└── README.md

Quick start

1. Clone

git clone https://github.com/<your-username>/agentmesh.git
cd agentmesh

2. Create environment

python -m venv .venv
source .venv/bin/activate      # macOS/Linux
# .venv\Scripts\activate       # Windows PowerShell

3. Install

pip install -e ".[dev]"

4. Run local demo

agentmesh demo "Which customers have critical open tickets and what should we tell them?"

5. Start API

uvicorn agentmesh.api:app --reload

Open the interactive API docs:

http://localhost:8000/docs

API example

curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "request": "Find critical customer issues, retrieve the right runbook, and draft an account-manager summary.",
    "mode": "multi_agent"
  }'

Response shape:

{
  "task_id": "task_...",
  "mode": "multi_agent",
  "final_answer": "...",
  "steps": [...],
  "tool_calls": [...],
  "metrics": {
    "latency_seconds": 0.46,
    "agents_used": 4,
    "tools_called": 5
  }
}
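
A client can parse this response shape into typed objects. The sketch below uses only the standard library and mirrors the field names shown above; the class names `RunResponse` and `RunMetrics` are assumptions and may not match `schemas.py`, and the payload values are placeholders rather than real output.

```python
import json
from dataclasses import dataclass


@dataclass
class RunMetrics:
    latency_seconds: float
    agents_used: int
    tools_called: int


@dataclass
class RunResponse:
    task_id: str
    mode: str
    final_answer: str
    steps: list
    tool_calls: list
    metrics: RunMetrics

    @classmethod
    def from_json(cls, raw: str) -> "RunResponse":
        data = json.loads(raw)
        # Pull the nested object out first so it can be typed separately.
        metrics = RunMetrics(**data.pop("metrics"))
        return cls(metrics=metrics, **data)


# Illustrative payload; the values are placeholders, not real output.
sample = json.dumps({
    "task_id": "task_demo",
    "mode": "multi_agent",
    "final_answer": "placeholder",
    "steps": [],
    "tool_calls": [],
    "metrics": {"latency_seconds": 0.46, "agents_used": 4, "tools_called": 5},
})
response = RunResponse.from_json(sample)
print(response.metrics.tools_called)
```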

Agent Cards

Each agent publishes a card with capabilities, input/output modes, and skills.

{
  "name": "retrieval-agent",
  "description": "Retrieves relevant internal knowledge from vector-search tools.",
  "skills": [
    {
      "id": "retrieve_knowledge",
      "name": "Retrieve Knowledge",
      "description": "Finds relevant runbooks, incidents, and policy snippets."
    }
  ]
}

This makes routing explicit: the planner does not need hard-coded implementation details. It can discover agent capabilities and delegate based on task intent.
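
Skill-based discovery might look like the following. This is a minimal sketch under stated assumptions: `AgentRegistry`, `find_by_skill`, and the dataclass layout are hypothetical and are not taken from `a2a/registry.py`; only the card contents come from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    id: str
    name: str
    description: str


@dataclass(frozen=True)
class AgentCard:
    name: str
    description: str
    skills: tuple[Skill, ...] = ()


class AgentRegistry:
    """Hypothetical in-memory registry keyed by agent name."""

    def __init__(self) -> None:
        self._cards: dict[str, AgentCard] = {}

    def register(self, card: AgentCard) -> None:
        self._cards[card.name] = card

    def find_by_skill(self, skill_id: str) -> list[AgentCard]:
        # The planner asks "who can do X?" instead of naming an agent.
        return [
            card for card in self._cards.values()
            if any(skill.id == skill_id for skill in card.skills)
        ]


registry = AgentRegistry()
registry.register(AgentCard(
    name="retrieval-agent",
    description="Retrieves relevant internal knowledge from vector-search tools.",
    skills=(Skill(
        "retrieve_knowledge",
        "Retrieve Knowledge",
        "Finds relevant runbooks, incidents, and policy snippets.",
    ),),
))
matches = registry.find_by_skill("retrieve_knowledge")
```

Because the planner only sees skill IDs, a second agent offering `retrieve_knowledge` could be registered without changing planner code.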


MCP-style tool servers

AgentMesh exposes tools behind a uniform interface:

tools = await mcp_client.list_tools(server="sql")
result = await mcp_client.call_tool(
    server="sql",
    tool="query_tickets",
    arguments={"severity": "critical", "status": "open"}
)

Included tool servers:

| MCP-style server | Tools |
| --- | --- |
| sql | query_tickets, query_accounts, list_tables |
| vector | retrieve_documents, list_documents |
| rest | search_status, get_service_health, fetch_competitor_signal |
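
The uniform interface can be sketched as an in-process server and client pair. This is an assumption-heavy illustration, not the code in `mcp/server.py` or `mcp/client.py`, and it does not use the official MCP SDK: `ToolServer`, `McpClient`, and the toy ticket rows are hypothetical. Only the `list_tools` / `call_tool` call shape and the `query_tickets` tool name come from the text above.

```python
from __future__ import annotations

import asyncio
from typing import Any, Callable


class ToolServer:
    """Hypothetical in-process stand-in for an MCP-style server."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}

    def tool(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        # Decorator: register a function under its own name.
        self._tools[fn.__name__] = fn
        return fn

    def list_tools(self) -> list[str]:
        return sorted(self._tools)

    def call(self, name: str, arguments: dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**arguments)


class McpClient:
    def __init__(self, servers: dict[str, ToolServer]) -> None:
        self._servers = servers

    async def list_tools(self, server: str) -> list[str]:
        return self._servers[server].list_tools()

    async def call_tool(self, server: str, tool: str, arguments: dict[str, Any]) -> Any:
        return self._servers[server].call(tool, arguments)


sql_server = ToolServer()

# Toy rows for the sketch; not data from the repository.
_TICKETS = [
    {"id": 1, "severity": "critical", "status": "open"},
    {"id": 2, "severity": "low", "status": "open"},
]


@sql_server.tool
def query_tickets(severity: str, status: str) -> list[dict]:
    return [t for t in _TICKETS if t["severity"] == severity and t["status"] == status]


async def main() -> tuple[list[str], list[dict]]:
    client = McpClient({"sql": sql_server})
    tools = await client.list_tools(server="sql")
    rows = await client.call_tool(
        server="sql",
        tool="query_tickets",
        arguments={"severity": "critical", "status": "open"},
    )
    return tools, rows


tools, rows = asyncio.run(main())
```

Adding a tool here means decorating one more function; the agent-side calls stay the same.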

Evaluation

Run the benchmark:

agentmesh evaluate --output reports/eval_results.csv

The evaluator measures:

  • tool-selection accuracy
  • agent-routing accuracy
  • task completion
  • average step count
  • latency
  • unnecessary-tool rate

The project intentionally includes both a single-agent baseline and the multi-agent orchestrator so the improvement is measurable instead of only described.
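
Three of these metrics can be defined in a few lines. The definitions below are assumptions for illustration (the repository's `evaluator.py` may score differently): tool selection counts a task as correct when every expected tool was called, and the unnecessary-tool rate is the share of calls that were not expected for their task. `TaskResult` and the sample results are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskResult:
    expected_tools: set[str]
    called_tools: list[str]
    completed: bool


def tool_selection_accuracy(results: list[TaskResult]) -> float:
    # A task counts as a hit when all expected tools appear among the calls.
    hits = sum(1 for r in results if r.expected_tools <= set(r.called_tools))
    return hits / len(results)


def unnecessary_tool_rate(results: list[TaskResult]) -> float:
    total = sum(len(r.called_tools) for r in results)
    if total == 0:
        return 0.0
    extra = sum(1 for r in results for t in r.called_tools if t not in r.expected_tools)
    return extra / total


def completion_rate(results: list[TaskResult]) -> float:
    return sum(r.completed for r in results) / len(results)


# Made-up results to exercise the functions; not benchmark data.
results = [
    TaskResult({"query_tickets"}, ["query_tickets"], True),
    TaskResult({"query_tickets", "retrieve_documents"}, ["query_tickets", "search_status"], False),
]
print(tool_selection_accuracy(results), unnecessary_tool_rate(results), completion_rate(results))
```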


AWS deployment design

The local architecture maps cleanly to AWS:

| Local component | AWS equivalent |
| --- | --- |
| FastAPI gateway | API Gateway + Lambda adapter |
| Agents | Lambda functions |
| Shared state | DynamoDB |
| Tool calls | MCP server Lambdas / internal services |
| LLM reasoning | Amazon Bedrock foundation models |
| Logs/traces | CloudWatch Logs |

The repository includes both AWS SAM and Terraform starter infrastructure. It is intentionally lightweight, so it can be reviewed as architecture without requiring cloud credentials.
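
One way to keep the shared-state row swappable is to code against a small interface. The sketch below is an assumption, not the contents of `state.py` or `cloud/dynamodb_state.py`: `TaskStateStore`, `InMemoryStateStore`, and `mark_completed` are hypothetical names. A DynamoDB-backed class could offer the same two methods without touching the orchestrator.

```python
from __future__ import annotations

from typing import Optional, Protocol


class TaskStateStore(Protocol):
    """Hypothetical contract shared by local and cloud state backends."""

    def put(self, task_id: str, state: dict) -> None: ...

    def get(self, task_id: str) -> Optional[dict]: ...


class InMemoryStateStore:
    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def put(self, task_id: str, state: dict) -> None:
        # Copy so later mutation by the caller does not change stored state.
        self._items[task_id] = dict(state)

    def get(self, task_id: str) -> Optional[dict]:
        item = self._items.get(task_id)
        return dict(item) if item is not None else None


def mark_completed(store: TaskStateStore, task_id: str) -> None:
    # Depends only on the protocol, never on a concrete backend.
    state = store.get(task_id) or {}
    state["status"] = "completed"
    store.put(task_id, state)


store = InMemoryStateStore()
store.put("task_demo", {"status": "working"})
mark_completed(store, "task_demo")
```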


Design principles

  1. Agent communication and tool execution are separate concerns.
  2. Agent discovery should be metadata-driven, not hard-coded.
  3. Tools should be added without rewriting agent logic.
  4. Planning should be observable, not hidden in one opaque prompt.
  5. Evaluation should compare architectures, not just model outputs.

Recommended GitHub topics

multi-agent
mcp
a2a
agentic-ai
agent-orchestration
fastapi
aws-bedrock
lambda
dynamodb
rag
python
llmops

Roadmap

  • Replace in-process A2A transport with HTTP-based agent-to-agent calls
  • Add signed Agent Cards and capability allowlists
  • Add true MCP SDK server implementations
  • Add Bedrock Converse API execution path
  • Add LangGraph backend adapter
  • Add distributed tracing with OpenTelemetry
  • Add React trace viewer
  • Add human approval step for sensitive tool calls

Responsible use

AgentMesh is a prototype for agent orchestration and evaluation. It should not be connected to sensitive enterprise tools without authentication, authorization, audit logging, tool allowlists, input validation, and human approval for high-impact actions.
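
Two of these safeguards, tool allowlists and human approval, can be sketched as a wrapper around whatever function actually executes tools. Everything here is hypothetical and is not part of the repository: `GuardedToolCaller`, `ToolCallDenied`, the `send_customer_email` tool name, and the stand-in approver that always declines.

```python
from __future__ import annotations

from typing import Any, Callable


class ToolCallDenied(Exception):
    """Raised when a tool call is blocked by policy."""


class GuardedToolCaller:
    def __init__(
        self,
        call_tool: Callable[[str, dict], Any],
        allowlist: set[str],
        needs_approval: set[str],
        approve: Callable[[str, dict], bool],
    ) -> None:
        self._call_tool = call_tool
        self._allowlist = allowlist
        self._needs_approval = needs_approval
        self._approve = approve
        self.audit_log: list[tuple[str, dict, str]] = []

    def call(self, tool: str, arguments: dict) -> Any:
        if tool not in self._allowlist:
            self.audit_log.append((tool, arguments, "denied:not_allowlisted"))
            raise ToolCallDenied(f"{tool} is not allowlisted")
        if tool in self._needs_approval and not self._approve(tool, arguments):
            self.audit_log.append((tool, arguments, "denied:not_approved"))
            raise ToolCallDenied(f"{tool} was not approved")
        self.audit_log.append((tool, arguments, "allowed"))
        return self._call_tool(tool, arguments)


def fake_backend(tool: str, arguments: dict) -> str:
    # Stand-in for a real tool executor.
    return f"ran {tool}"


guard = GuardedToolCaller(
    call_tool=fake_backend,
    allowlist={"query_tickets", "send_customer_email"},
    needs_approval={"send_customer_email"},
    approve=lambda tool, arguments: False,  # stand-in for a reviewer who declines
)
result = guard.call("query_tickets", {"severity": "critical"})
```

Denied calls are logged before the exception is raised, so the audit trail records attempts as well as successes.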


License

MIT License.
