Discovery-First Agents

Research implementation for "Discovery-First Agents: Dynamic vs Static Tool Discovery in Large Language Model Systems".

Overview

This project evaluates a discovery-first agent architecture in which the agent discovers tools at runtime instead of receiving a static list of all of them. We compare static, dynamic, cached, batched, and hybrid agents on four dimensions: context size, interaction turns, task success, and scalability.

Key Findings

86.3% context reduction at 15 tools (GPT-4o-mini)
97.2% context reduction at 100 tools
O(1) context growth with tool count vs O(n) for static agents
66% context reduction with hybrid approach
45.7% F1 improvement with semantic search

Relationship to MCP (Model Context Protocol)

This work is complementary to MCP, not competitive with it.

MCP (Anthropic) standardizes the transport layer: how tools are defined, invoked, and communicated. It does not address discovery architecture. MCP servers currently expose all registered tools to connected clients, which recreates the static exposure problem at the protocol level.

Concern	MCP	This Work
Tool schema format	✓	-
Invocation protocol	✓	-
Transport mechanism	✓	-
When to expose tools	-	✓
Which tools to expose	-	✓
Context optimization	-	✓
Access control	-	✓

MCP answers: "How do I call a tool?"

This research answers: "Which tools should the agent see, and when?"

A Capability Registry could be implemented as an MCP server that responds to SEARCH and GET_SCHEMA requests, providing dynamic discovery while maintaining MCP compatibility.
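
A minimal sketch of such a registry, assuming an in-memory store and plain keyword overlap as a stand-in for semantic search. The names `Tool`, `CapabilityRegistry`, `search`, and `get_schema` are hypothetical; they are not taken from this project's code or from the MCP specification.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict = field(default_factory=dict)


class CapabilityRegistry:
    """Hypothetical in-memory registry (assumption, not the project's API)."""

    def __init__(self, tools):
        self._tools = {tool.name: tool for tool in tools}

    def search(self, query, limit=3):
        """SEARCH: return (name, description) pairs ranked by shared words."""
        words = set(query.lower().split())
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name.replace('_', ' ')} {tool.description}".lower()
            overlap = len(words & set(text.split()))
            if overlap:
                scored.append((overlap, tool.name, tool.description))
        # Highest overlap first; ties broken alphabetically by name.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [(name, desc) for _, name, desc in scored[:limit]]

    def get_schema(self, name):
        """GET_SCHEMA: return the full schema for one tool, on request only."""
        tool = self._tools[name]
        return {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.parameters,
        }


registry = CapabilityRegistry([
    Tool("load_csv", "Load a CSV file into a DataFrame", {"file_path": "string"}),
    Tool("calculate_mean", "Calculate mean of a column", {"column": "string"}),
])
matches = registry.search("load csv data")
schema = registry.get_schema("load_csv")
```

The search result carries only names and descriptions, so the full parameter schema enters the context only after an explicit `get_schema` call.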

How Tool Calling Works: Static vs Discovery-First

Current Approach: Static Tool Exposure (With or Without MCP)

Most LLM agent frameworks (LangChain, AutoGen, CAMEL) use static tool exposure, with or without MCP. MCP standardizes how tools are defined and invoked, but every tool schema is still loaded into the prompt at once:

┌─────────────────────────────────────────────────────────────────┐
│                     STATIC AGENT FLOW                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. INITIALIZATION                                              │
│     ┌─────────────┐                                             │
│     │ Load ALL    │  → 15 tools = ~3,000 tokens                 │
│     │ tool schemas│  → 100 tools = ~20,000 tokens               │
│     │ into prompt │  → Context grows O(n)                       │
│     └─────────────┘                                             │
│            ↓                                                    │
│  2. SYSTEM PROMPT (sent with EVERY request)                     │
│     ┌─────────────────────────────────────────────────────────┐ │
│     │ "You have access to the following tools:                │ │
│     │                                                         │ │
│     │  TOOL: load_csv                                         │ │
│     │  DESCRIPTION: Load a CSV file into a DataFrame          │ │
│     │  PARAMETERS: {file_path: string}                        │ │
│     │                                                         │ │
│     │  TOOL: calculate_mean                                   │ │
│     │  DESCRIPTION: Calculate mean of a column                │ │
│     │  PARAMETERS: {column: string}                           │ │
│     │                                                         │ │
│     │  ... (repeat for ALL 15-100+ tools)                     │ │
│     │                                                         │ │
│     │  To use a tool: TOOL: name, ARGS: {...}"                │ │
│     └─────────────────────────────────────────────────────────┘ │
│            ↓                                                    │
│  3. TASK EXECUTION                                              │
│     User: "What is the average salary in employees.csv?"        │
│            ↓                                                    │
│     LLM sees ALL tools → picks load_csv → executes              │
│            ↓                                                    │
│     LLM sees ALL tools again → picks calculate_mean → executes  │
│            ↓                                                    │
│     COMPLETE: "The average salary is $75,000"                   │
│                                                                 │
│  PROBLEM: Every turn includes ~3,000+ tokens of tool schemas    │
│           that may never be used. At 100 tools: ~20,000 tokens! │
└─────────────────────────────────────────────────────────────────┘

Problems with the Static Approach:

Context grows O(n) with tool count
Irrelevant tools consume tokens (paying for tools you don't use)
Context window limits scalability (can't add 500 tools)
No permission control (agent sees everything)
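
The linear growth can be illustrated with a small sketch that renders a static system prompt in the format shown in the diagram. The helper names and the placeholder tools are hypothetical, and the four-characters-per-token estimate is a crude assumption; a real tokenizer will give different counts.

```python
def build_static_prompt(tools):
    """Render every tool schema into one system prompt, as a static agent would."""
    lines = ["You have access to the following tools:", ""]
    for name, description, parameters in tools:
        lines += [
            f"TOOL: {name}",
            f"DESCRIPTION: {description}",
            f"PARAMETERS: {parameters}",
            "",
        ]
    lines.append("To use a tool: TOOL: name, ARGS: {...}")
    return "\n".join(lines)


def rough_token_count(text):
    # Assumption: roughly 4 characters per token. Not a real tokenizer.
    return len(text) // 4


def make_tools(n):
    # Placeholder tools with fixed-width names so each adds the same text.
    return [
        (f"tool_{i:03d}", "Placeholder description of what this tool does", "{arg: string}")
        for i in range(n)
    ]


small = rough_token_count(build_static_prompt(make_tools(15)))
large = rough_token_count(build_static_prompt(make_tools(100)))
print(f"15 tools: ~{small} tokens, 100 tools: ~{large} tokens")
```

Because each tool adds a fixed block of text, the prompt length grows in direct proportion to the number of tools, and that cost is paid again on every request.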

Our Approach: Discovery-First Architecture

The discovery-first architecture separates discovery from execution:

┌─────────────────────────────────────────────────────────────────┐
│                   DISCOVERY-FIRST FLOW                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. INITIALIZATION (minimal context)                            │
│     ┌─────────────┐                                             │
│     │ Load only   │  → ~200 tokens (fixed)                      │
│     │ discovery   │  → Context stays O(1)                       │
│     │ commands    │  → Scales to 1000+ tools                    │
│     └─────────────┘                                             │
│            ↓                                                    │
│  2. DISCOVERY PHASE (DiscoveryAgent)                            │
│     ┌─────────────────────────────────────────────────────────┐ │
│     │ System: "You can discover tools using:                  │ │
│     │   SEARCH: <capability needed>                           │ │
│     │   GET_SCHEMA: <tool_name>                               │ │
│     │   READY: <when done discovering>"                       │ │
│     └─────────────────────────────────────────────────────────┘ │
│            ↓                                                    │
│     User: "What is the average salary in employees.csv?"        │
│            ↓                                                    │
│     LLM: "SEARCH: load csv data"                                │
│            ↓                                                    │
│     ┌──────────────────┐                                        │
│     │ Capability       │  → Returns: [load_csv, read_excel]     │
│     │ Registry         │    (just names + descriptions)         │
│     └──────────────────┘                                        │
│            ↓                                                    │
│     LLM: "GET_SCHEMA: load_csv"                                 │
│            ↓                                                    │
│     Registry returns full schema (only when requested!)         │
│            ↓                                                    │
│     LLM: "SEARCH: calculate statistics"                         │
│            ↓                                                    │
│     LLM: "GET_SCHEMA: calculate_mean"                           │
│            ↓                                                    │
│     LLM: "READY: load_csv, calculate_mean"                      │
│                                                                 │
│  3. EXECUTION PHASE (ToolExecutor)                              │
│     ┌─────────────────────────────────────────────────────────┐ │
│     │ System: "You have these tools:                          │ │
│     │   TOOL: load_csv (full schema)                          │ │
│     │   TOOL: calculate_mean (full schema)                    │ │
│     │   ... only 2 tools instead of 100!"                     │ │
│     └─────────────────────────────────────────────────────────┘ │
│            ↓                                                    │
│     LLM: "TOOL: load_csv, ARGS: {file_path: 'employees.csv'}"   │
│            ↓                                                    │
│     LLM: "TOOL: calculate_mean, ARGS: {column: 'salary'}"       │
│            ↓                                                    │
│     COMPLETE: "The average salary is $75,000"                   │
│                                                                 │
│  RESULT: Only relevant tools loaded. 97% context reduction!     │
└─────────────────────────────────────────────────────────────────┘

Benefits of Discovery-First:

Context stays O(1) regardless of tool count
Only pay for tools you actually use
Scales to 1000+ tools without hitting context limits
Enables permission-based tool access
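
The discovery phase in the diagram above can be sketched as a small command loop. This is an illustration under stated assumptions, not the project's implementation: `run_discovery` and its parameters are hypothetical names, the model is replaced by a scripted iterator, and the lookup functions are simple stand-ins for a registry.

```python
def run_discovery(llm, search, get_schema, task, max_turns=10):
    """Drive SEARCH / GET_SCHEMA / READY until the model says it is ready.

    llm: callable that takes the transcript (list of str) and returns one command.
    search, get_schema: callables backed by whatever registry is in use.
    Returns a dict mapping each selected tool name to its schema.
    """
    transcript = [f"TASK: {task}"]
    schemas = {}
    for _ in range(max_turns):
        reply = llm(transcript).strip()
        transcript.append(reply)
        command, _, argument = reply.partition(":")
        command, argument = command.strip().upper(), argument.strip()
        if command == "SEARCH":
            transcript.append(f"RESULTS: {search(argument)}")
        elif command == "GET_SCHEMA":
            schemas[argument] = get_schema(argument)
            transcript.append(f"SCHEMA: {schemas[argument]}")
        elif command == "READY":
            names = [name.strip() for name in argument.split(",") if name.strip()]
            # Only tools whose schema was actually fetched are handed to execution.
            return {name: schemas[name] for name in names if name in schemas}
        else:
            transcript.append("ERROR: expected SEARCH, GET_SCHEMA, or READY")
    raise RuntimeError("discovery did not finish within max_turns")


# Scripted stand-in for the model, replaying the commands from the diagram.
scripted = iter([
    "SEARCH: load csv data",
    "GET_SCHEMA: load_csv",
    "SEARCH: calculate statistics",
    "GET_SCHEMA: calculate_mean",
    "READY: load_csv, calculate_mean",
])
catalog = {
    "load_csv": {"file_path": "string"},
    "calculate_mean": {"column": "string"},
}
selected = run_discovery(
    llm=lambda transcript: next(scripted),
    search=lambda query: [n for n in catalog if any(w in n for w in query.split())],
    get_schema=lambda name: catalog[name],
    task="What is the average salary in employees.csv?",
)
```

The `max_turns` cap is a design choice in this sketch: discovery adds turns, so a bound keeps a confused model from searching indefinitely.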

Side-by-Side Comparison

              STATIC AGENT                    DISCOVERY-FIRST AGENT
              ────────────                    ─────────────────────

Turn 1:       [3000 tokens of tools]          [200 tokens discovery]
              + task                          + task
              ────────────────────            ────────────────────
              ~3100 tokens                    ~300 tokens

Turn 2:       [3000 tokens of tools]          "SEARCH: load csv"
              + task + result                 → [load_csv, read_excel]
              ────────────────────            ────────────────────
              ~3200 tokens                    ~400 tokens

Turn 3:       [3000 tokens of tools]          "GET_SCHEMA: load_csv"
              + history                       → (schema loaded)
              ────────────────────            ────────────────────
              ~3400 tokens                    ~600 tokens

...           ...                             ...

Execution:    Same context                    [400 tokens: 2 tools]
              throughout                      + task
                                              ────────────────────
                                              ~500 tokens

TOTAL:        ~15,000 tokens                  ~2,000 tokens
              (5 turns × 3000)                (discovery + execution)
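
The totals above imply a reduction that can be checked with a few lines of arithmetic. The sketch below uses only the rounded, illustrative figures from this comparison, so its result is not one of the measured values listed under Key Findings; `context_reduction` is a hypothetical helper name.

```python
def context_reduction(static_tokens, discovery_tokens):
    """Fraction of context tokens saved relative to the static agent."""
    return 1 - discovery_tokens / static_tokens


static_total = 5 * 3000   # five turns, each resending ~3000 tokens of tool schemas
discovery_total = 2000    # approximate discovery + execution total from the comparison

reduction = context_reduction(static_total, discovery_total)
print(f"{reduction:.1%}")  # prints 86.7%
```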

Architecture

┌─────────────────┐     ┌─────────────────┐
│  Static Agent   │     │  Dynamic Agent  │
│  (All tools     │     │  (Discovery +   │
│   upfront)      │     │   Execution)    │
│                 │     │                 │
│  Context: O(n)  │     │  Context: O(1)  │
└─────────────────┘     └─────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │                   │
              ┌─────▼─────┐      ┌──────▼──────┐
              │ Discovery │      │    Tool     │
              │   Agent   │      │  Executor   │
              │ (SEARCH,  │      │ (Execution) │
              │ GET_SCHEMA│      │             │
              │ READY)    │      │             │
              └───────────┘      └─────────────┘
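
One way to picture the Tool Executor half of this split is an executor that can call only the tools selected during discovery. The sketch below is hypothetical: the class shape, the `allowed` parameter, and the use of JSON for arguments are assumptions made for illustration, and the example tools are placeholders.

```python
import json


class ToolExecutor:
    """Hypothetical executor restricted to the tools chosen during discovery."""

    def __init__(self, implementations, allowed):
        # Keep only the callables the discovery phase selected.
        self._tools = {name: implementations[name] for name in allowed}

    def execute(self, name, args_json):
        if name not in self._tools:
            raise PermissionError(f"tool not exposed to this agent: {name}")
        # Assumption: arguments arrive as a JSON object of keyword arguments.
        return self._tools[name](**json.loads(args_json))


implementations = {
    "calculate_mean": lambda values: sum(values) / len(values),
    "delete_file": lambda path: None,  # placeholder for a tool that was never selected
}
executor = ToolExecutor(implementations, allowed=["calculate_mean"])
result = executor.execute("calculate_mean", '{"values": [70000, 80000]}')
print(result)  # prints 75000.0
```

Because the executor never holds a reference to unselected tools, a call to `delete_file` here raises `PermissionError` instead of running.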

Authors

Paritosh Baghel, Jake Dulin

License

MIT
