Research implementation for "Discovery-First Agents: Dynamic vs Static Tool Discovery in Large Language Model Systems".
This project evaluates a discovery-first agent architecture that discovers tools dynamically at runtime instead of enumerating them statically up front. We compare static, dynamic, cached, batched, and hybrid agents on four dimensions: context size, interaction turns, task success, and scalability.
- 86.3% context reduction at 15 tools (GPT-4o-mini)
- 97.2% context reduction at 100 tools
- O(1) context growth in the number of tools, versus O(n) for static agents
- 66% context reduction with hybrid approach
- 45.7% F1 improvement with semantic search
This work is complementary to MCP, not competitive with it.
MCP (the Model Context Protocol, from Anthropic) standardizes the transport layer: how tools are defined, invoked, and communicated. However, MCP does not address discovery architecture. MCP servers currently expose all registered tools to connected clients, which recreates the static exposure problem at the protocol level.
| Concern | MCP | This Work |
|---|---|---|
| Tool schema format | ✓ | - |
| Invocation protocol | ✓ | - |
| Transport mechanism | ✓ | - |
| When to expose tools | - | ✓ |
| Which tools to expose | - | ✓ |
| Context optimization | - | ✓ |
| Access control | - | ✓ |
MCP answers: "How do I call a tool?"
This research answers: "Which tools should the agent see, and when?"
A Capability Registry could be implemented as an MCP server that responds to SEARCH and GET_SCHEMA requests, providing dynamic discovery while maintaining MCP compatibility.
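As a rough illustration of the registry idea, here is a minimal in-memory sketch. The `Tool` and `CapabilityRegistry` names, the method signatures, and the keyword-overlap ranking are all assumptions made for this example; they are not the project's implementation, and the ranking is a stand-in for whatever search backend (for example semantic search) a real registry would use. No MCP wiring is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict = field(default_factory=dict)


class CapabilityRegistry:
    """Hypothetical registry: SEARCH returns names and descriptions, GET_SCHEMA the full schema."""

    def __init__(self, tools):
        self._tools = {tool.name: tool for tool in tools}

    def search(self, query, limit=5):
        """Rank tools by how many query words appear in their name or description."""
        words = set(query.lower().split())
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name} {tool.description}".lower().replace("_", " ")
            score = len(words & set(text.split()))
            if score:
                scored.append((score, tool.name, tool.description))
        # Highest score first, ties broken alphabetically by name.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [{"name": name, "description": desc} for _, name, desc in scored[:limit]]

    def get_schema(self, name):
        """Return the full schema, including parameters, for one tool."""
        tool = self._tools[name]
        return {"name": tool.name, "description": tool.description, "parameters": tool.parameters}


registry = CapabilityRegistry([
    Tool("load_csv", "Load a CSV file into a DataFrame", {"file_path": "string"}),
    Tool("calculate_mean", "Calculate mean of a column", {"column": "string"}),
])
matches = registry.search("load csv data")
schema = registry.get_schema("load_csv")
```

Search results deliberately omit parameters, so the full schema only enters the context when the agent asks for it by name.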
Most LLM agent frameworks (LangChain, AutoGen, CAMEL) use static tool exposure. This holds with or without MCP (Model Context Protocol), which standardizes how tools are defined and invoked but still leaves every tool schema loaded into the prompt at once:
┌─────────────────────────────────────────────────────────────────┐
│ STATIC AGENT FLOW │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. INITIALIZATION │
│ ┌─────────────┐ │
│ │ Load ALL │ → 15 tools = ~3,000 tokens │
│ │ tool schemas│ → 100 tools = ~20,000 tokens │
│ │ into prompt │ → Context grows O(n) │
│ └─────────────┘ │
│ ↓ │
│ 2. SYSTEM PROMPT (sent with EVERY request) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ "You have access to the following tools: │ │
│ │ │ │
│ │ TOOL: load_csv │ │
│ │ DESCRIPTION: Load a CSV file into a DataFrame │ │
│ │ PARAMETERS: {file_path: string} │ │
│ │ │ │
│ │ TOOL: calculate_mean │ │
│ │ DESCRIPTION: Calculate mean of a column │ │
│ │ PARAMETERS: {column: string} │ │
│ │ │ │
│ │ ... (repeat for ALL 15-100+ tools) │ │
│ │ │ │
│ │ To use a tool: TOOL: name, ARGS: {...}" │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ↓ │
│ 3. TASK EXECUTION │
│ User: "What is the average salary in employees.csv?" │
│ ↓ │
│ LLM sees ALL tools → picks load_csv → executes │
│ ↓ │
│ LLM sees ALL tools again → picks calculate_mean → executes │
│ ↓ │
│ COMPLETE: "The average salary is $75,000" │
│ │
│ PROBLEM: Every turn includes ~3,000+ tokens of tool schemas │
│ that may never be used. At 100 tools: ~20,000 tokens! │
└─────────────────────────────────────────────────────────────────┘
Problems with Static Approach:
- Context grows O(n) with tool count
- Irrelevant tools consume tokens (paying for tools you don't use)
- Context window limits scalability (can't add 500 tools)
- No permission control (agent sees everything)
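The O(n) growth can be seen in a small sketch that renders a static system prompt in the format shown in the flow above. The helper names and placeholder tools are hypothetical, and character count is used only as a crude proxy for tokens.

```python
def build_static_prompt(tools):
    """Render every tool schema into one system prompt, as a static agent does."""
    lines = ["You have access to the following tools:", ""]
    for tool in tools:
        lines += [
            f"TOOL: {tool['name']}",
            f"DESCRIPTION: {tool['description']}",
            f"PARAMETERS: {tool['parameters']}",
            "",
        ]
    lines.append("To use a tool: TOOL: name, ARGS: {...}")
    return "\n".join(lines)


def make_tools(count):
    """Generate placeholder tools with identically sized schemas (made-up data)."""
    return [
        {
            "name": f"tool_{i:03d}",
            "description": "Placeholder description",
            "parameters": "{arg: string}",
        }
        for i in range(count)
    ]


# Prompt length in characters for a few registry sizes.
sizes = {n: len(build_static_prompt(make_tools(n))) for n in (15, 30, 100)}
per_tool = (sizes[30] - sizes[15]) // 15  # characters added by each extra tool
```

Because each extra tool adds the same number of characters, the prompt grows linearly with the tool count, and that cost is paid on every request.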
The discovery-first architecture separates discovery from execution:
┌─────────────────────────────────────────────────────────────────┐
│ DISCOVERY-FIRST FLOW │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. INITIALIZATION (minimal context) │
│ ┌─────────────┐ │
│ │ Load only │ → ~200 tokens (fixed) │
│ │ discovery │ → Context stays O(1) │
│ │ commands │ → Scales to 1000+ tools │
│ └─────────────┘ │
│ ↓ │
│ 2. DISCOVERY PHASE (DiscoveryAgent) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ System: "You can discover tools using: │ │
│ │ SEARCH: <capability needed> │ │
│ │ GET_SCHEMA: <tool_name> │ │
│ │ READY: <when done discovering>" │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ↓ │
│ User: "What is the average salary in employees.csv?" │
│ ↓ │
│ LLM: "SEARCH: load csv data" │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Capability │ → Returns: [load_csv, read_excel] │
│ │ Registry │ (just names + descriptions) │
│ └──────────────────┘ │
│ ↓ │
│ LLM: "GET_SCHEMA: load_csv" │
│ ↓ │
│ Registry returns full schema (only when requested!) │
│ ↓ │
│ LLM: "SEARCH: calculate statistics" │
│ ↓ │
│ LLM: "GET_SCHEMA: calculate_mean" │
│ ↓ │
│ LLM: "READY: load_csv, calculate_mean" │
│ │
│ 3. EXECUTION PHASE (ToolExecutor) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ System: "You have these tools: │ │
│ │ TOOL: load_csv (full schema) │ │
│ │ TOOL: calculate_mean (full schema) │ │
│ │ ... only 2 tools instead of 100!" │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ↓ │
│ LLM: "TOOL: load_csv, ARGS: {file_path: 'employees.csv'}" │
│ ↓ │
│ LLM: "TOOL: calculate_mean, ARGS: {column: 'salary'}" │
│ ↓ │
│ COMPLETE: "The average salary is $75,000" │
│ │
│ RESULT: Only relevant tools loaded. 97% context reduction! │
└─────────────────────────────────────────────────────────────────┘
Benefits of Discovery-First:
- Context stays O(1) regardless of tool count
- Only pay for tools you actually use
- Scales to 1000+ tools without hitting context limits
- Enables permission-based tool access
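The discovery phase above can be sketched as a small command loop. This is a minimal illustration under stated assumptions: the `REGISTRY` contents, the `run_discovery` function, the regular expression, and the keyword search are hypothetical, and the model's replies are scripted rather than produced by an LLM.

```python
import re

# Hypothetical in-memory registry; a real one would sit behind a search index.
REGISTRY = {
    "load_csv": {"description": "Load a CSV file into a DataFrame",
                 "parameters": {"file_path": "string"}},
    "read_excel": {"description": "Load an Excel file into a DataFrame",
                   "parameters": {"file_path": "string"}},
    "calculate_mean": {"description": "Calculate mean of a column",
                       "parameters": {"column": "string"}},
}

COMMAND = re.compile(r"^(SEARCH|GET_SCHEMA|READY):\s*(.+)$")


def search(query):
    """Return names of tools sharing at least one word with the query."""
    words = set(query.lower().split())
    return [
        name for name, tool in REGISTRY.items()
        if words & set(f"{name} {tool['description']}".lower().replace("_", " ").split())
    ]


def run_discovery(llm_outputs):
    """Handle discovery commands in order.

    Returns (tools, observations): the schemas named in READY, and what the
    registry sent back for each SEARCH or GET_SCHEMA command.
    """
    loaded, observations = {}, []
    for output in llm_outputs:
        match = COMMAND.match(output.strip())
        if match is None:
            continue  # ignore anything that is not a discovery command
        command, argument = match.group(1), match.group(2).strip()
        if command == "SEARCH":
            observations.append(search(argument))
        elif command == "GET_SCHEMA":
            loaded[argument] = REGISTRY[argument]
            observations.append(REGISTRY[argument])
        else:  # READY: hand only the requested, already-loaded tools to execution
            names = [name.strip() for name in argument.split(",")]
            return {n: loaded[n] for n in names if n in loaded}, observations
    return loaded, observations


scripted = [
    "SEARCH: load csv data",
    "GET_SCHEMA: load_csv",
    "SEARCH: calculate statistics",
    "GET_SCHEMA: calculate_mean",
    "READY: load_csv, calculate_mean",
]
tools, observations = run_discovery(scripted)
```

Only the tools returned by `run_discovery` would be placed in the execution-phase prompt; `read_excel` shows up in a search result but never has its schema loaded.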
| Step | Static agent | Discovery-first agent |
|---|---|---|
| Turn 1 | [3000 tokens of tools] + task = ~3100 tokens | [200 tokens discovery] + task = ~300 tokens |
| Turn 2 | [3000 tokens of tools] + task + result = ~3200 tokens | "SEARCH: load csv" → [load_csv, read_excel] = ~400 tokens |
| Turn 3 | [3000 tokens of tools] + history = ~3400 tokens | "GET_SCHEMA: load_csv" → (schema loaded) = ~600 tokens |
| ... | ... | ... |
| Execution | Same context throughout | [400 tokens: 2 tools] + task = ~500 tokens |
| TOTAL | ~15,000 tokens (5 turns × 3000) | ~2,000 tokens (discovery + execution) |
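The totals in the comparison above follow from simple arithmetic, sketched here. The inputs are the illustrative round numbers from the diagrams, not measured results, and the constant and function names are made up for this example.

```python
# ~200 tokens per schema, derived from "15 tools = ~3,000 tokens".
TOKENS_PER_TOOL = 3000 / 15


def static_schema_tokens(tool_count, turns):
    """Schema tokens a static agent sends when all tools ride along on every turn."""
    return TOKENS_PER_TOOL * tool_count * turns


static_total = static_schema_tokens(tool_count=15, turns=5)     # 5 turns × 3000
dynamic_total = 2000                                            # discovery + execution
reduction = 1 - dynamic_total / static_total                    # fraction of tokens saved

per_turn_at_100 = static_schema_tokens(tool_count=100, turns=1)  # ~20,000 tokens
```

With these inputs the static agent spends 15,000 schema tokens and the discovery-first agent about 2,000, a reduction of roughly 87% for this one walkthrough.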
┌─────────────────┐     ┌─────────────────┐
│  Static Agent   │     │  Dynamic Agent  │
│  (All tools     │     │  (Discovery +   │
│   upfront)      │     │   Execution)    │
│                 │     │                 │
│  Context: O(n)  │     │  Context: O(1)  │
└─────────────────┘     └─────────────────┘
                                 │
                       ┌─────────┴─────────┐
                       │                   │
                 ┌─────▼─────┐      ┌──────▼──────┐
                 │ Discovery │      │    Tool     │
                 │   Agent   │      │  Executor   │
                 │ (SEARCH,  │      │ (Execution) │
                 │ GET_SCHEMA│      │             │
                 │  READY)   │      │             │
                 └───────────┘      └─────────────┘
Paritosh Baghel, Jake Dulin
MIT