Commit 092cd89

Merge pull request #16 from MadBomber/develop
feat(mcp): add MCP server discovery (Phase 6)
2 parents bf487ff + 4924b6f commit 092cd89

8 files changed

Lines changed: 475 additions & 27 deletions

docs/guides/mcp-integration.md

Lines changed: 71 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -345,6 +345,77 @@ Without a shared poller each client uses its own blocking `Timeout.timeout` call
345345
!!! note
346346
Only stdio clients are registered with the poller. SSE, WebSocket, and StreamableHTTP clients passed a `poller:` argument ignore it silently.
347347
348+
## Server Discovery
349+
350+
When a robot has many MCP servers configured, connecting to all of them upfront is wasteful — most servers will be irrelevant for any given user message. **Server Discovery** uses TF cosine similarity to select only the semantically relevant servers before the first `ensure_mcp_clients` call.
351+
352+
### Enabling Discovery
353+
354+
Add `description:` to each server config and set `mcp_discovery: true` on the robot:
355+
356+
```ruby
357+
robot = RobotLab.build(
358+
name: "assistant",
359+
system_prompt: "You are a helpful assistant.",
360+
mcp_discovery: true,
361+
mcp: [
362+
{
363+
name: "filesystem",
364+
description: "Read, write, and search local files and directories",
365+
transport: { type: "stdio", command: "mcp-server-filesystem" }
366+
},
367+
{
368+
name: "github",
369+
description: "GitHub repos, issues, pull requests, code search",
370+
transport: { type: "stdio", command: "mcp-server-github" }
371+
},
372+
{
373+
name: "brew",
374+
description: "Install, update, and manage macOS packages via Homebrew",
375+
transport: { type: "stdio", command: "mcp-server-brew" }
376+
}
377+
]
378+
)
379+
380+
# Discovery connects only :brew for this message — filesystem and github are skipped
381+
robot.run("install imagemagick")
382+
```
383+
384+
### How It Works
385+
386+
`MCP::ServerDiscovery.select(query, from:, threshold:)` computes TF cosine similarity between the user's query and each server's topic text (`name + description`). Servers scoring at or above `DEFAULT_THRESHOLD` (0.05) are returned; the rest are excluded.
387+
388+
The threshold is intentionally low — server descriptions are short, so raw cosine scores are naturally small even for on-topic queries.
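
To make the scoring concrete, here is a minimal, self-contained sketch of term-frequency cosine similarity. It is an illustration only: `term_frequencies` and `tf_cosine` are hypothetical names, and the plain regex tokenizer is an assumption. The library's own scoring may tokenize, stem, or drop stop words differently, so real scores will not match these numbers exactly.

```ruby
# Hypothetical helper: lowercase word counts (term frequencies).
def term_frequencies(text)
  text.downcase.scan(/[a-z0-9]+/).tally
end

# Cosine similarity between the term-frequency vectors of two strings.
# Returns 0.0 when either string has no terms.
def tf_cosine(a, b)
  fa = term_frequencies(a)
  fb = term_frequencies(b)
  return 0.0 if fa.empty? || fb.empty?

  dot    = fa.sum { |term, count| count * fb.fetch(term, 0) }
  norm_a = Math.sqrt(fa.values.sum { |c| c * c })
  norm_b = Math.sqrt(fb.values.sum { |c| c * c })
  dot / (norm_a * norm_b)
end

query = "install imagemagick"
brew  = "brew Install, update, and manage macOS packages via Homebrew"
gh    = "github GitHub repos, issues, pull requests, code search"

puts tf_cosine(query, brew).round(3) # 0.236 with this tokenizer (one shared term)
puts tf_cosine(query, gh).round(3)   # 0.0 (no shared terms)
```

A single shared word is enough to clear a 0.05 threshold in this sketch, which is consistent with keeping the default threshold low for short descriptions.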
389+
390+
Discovery only applies on the **first** `run()` call (before `@mcp_initialized`). Once a set of servers is connected they remain connected for the robot's lifetime, preserving tool continuity across a conversation.
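
The first-run behaviour described above can be sketched with a small stand-in class. Everything here is hypothetical (`SketchRobot`, `ensure_clients`, and the toy word-overlap selector are not RobotLab APIs); only the idea of guarding on an `@mcp_initialized` flag comes from the text.

```ruby
# Hypothetical stand-in for a robot; not the RobotLab::Robot class.
class SketchRobot
  def initialize(servers, selector)
    @servers = servers
    @selector = selector # callable: (message, servers) -> subset of servers
    @mcp_initialized = false
    @connected = []
  end

  # Returns the names of the servers considered connected after this call.
  def run(message)
    ensure_clients(message)
    @connected.map { |s| s[:name] }
  end

  private

  # Selection happens once; later calls reuse the first result.
  def ensure_clients(message)
    return if @mcp_initialized

    @connected = @selector.call(message, @servers)
    @mcp_initialized = true
  end
end

servers = [
  { name: "github", description: "pull requests" },
  { name: "brew", description: "install packages" }
]

# Toy selector: keep servers whose description shares a word with the message.
selector = lambda do |message, list|
  words = message.downcase.split
  list.select { |s| (s[:description].split & words).any? }
end

robot = SketchRobot.new(servers, selector)
p robot.run("install imagemagick") # ["brew"]
p robot.run("list pull requests")  # still ["brew"]; selection is not repeated
```

One consequence worth keeping in mind under this model: the first message decides which servers are available for the rest of the conversation.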
391+
392+
### Fallback Behaviour
393+
394+
All servers are returned unchanged when any of the following apply:
395+
396+
| Condition | Reason |
397+
|-----------|--------|
398+
| No server has a `description` field | Nothing to score against |
399+
| `classifier` gem unavailable | Raises `DependencyError`, caught internally |
400+
| Query is blank or nil | Nothing to compare |
401+
| No server scores ≥ threshold | Returning every server is preferable to leaving the robot with no tools |
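
The four rows of the table can be exercised with a small stand-in function. This is a sketch under stated assumptions, not the library code: `select_with_fallback` and the injected `scorer` are hypothetical, and `LoadError` stands in for the `DependencyError` named in the table.

```ruby
# Hypothetical stand-in that mirrors the fallback rules in the table.
# `scorer` is any callable returning a relevance score; it may raise
# LoadError to simulate a missing optional dependency.
def select_with_fallback(query, servers, threshold: 0.05, scorer:)
  return servers if query.to_s.strip.empty?
  return servers if servers.none? { |s| !s[:description].to_s.strip.empty? }

  matches = servers.select { |s| scorer.call(query, s) >= threshold }
  matches.empty? ? servers : matches
rescue LoadError
  servers
end

servers = [
  { name: "github", description: "pull requests" },
  { name: "brew", description: "install packages" }
]

# Toy scorer: number of words shared by the query and the description.
overlap = ->(q, s) { (q.downcase.split & s[:description].split).size.to_f }
broken  = ->(_q, _s) { raise LoadError, "scoring library missing" }

p select_with_fallback("install vim", servers, scorer: overlap).map { |s| s[:name] } # ["brew"]
p select_with_fallback("", servers, scorer: overlap).size              # 2 (blank query)
p select_with_fallback("weather today", servers, scorer: overlap).size # 2 (nothing scores)
p select_with_fallback("install vim", servers, scorer: broken).size    # 2 (dependency missing)
```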
402+
403+
### Using the API Directly
404+
405+
```ruby
406+
servers = [
407+
{ name: "filesystem", description: "Read and write files", transport: { ... } },
408+
{ name: "github", description: "GitHub repos, issues, and pull requests", transport: { ... } }
409+
]
410+
411+
relevant = RobotLab::MCP::ServerDiscovery.select(
412+
"list open pull requests",
413+
from: servers,
414+
threshold: 0.05 # optional, default
415+
)
416+
# => only the :github entry
417+
```
418+
348419
## Connection Resilience
349420
350421
### Eager Connection

examples/28_mcp_discovery.rb

Lines changed: 112 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,112 @@
1+
#!/usr/bin/env ruby
2+
# frozen_string_literal: true
3+
4+
# Example 28: MCP Server Discovery
5+
#
6+
# When a robot has many MCP servers configured, connecting to all of them
7+
# upfront is wasteful — some servers may be irrelevant to a particular query.
8+
#
9+
# MCP Server Discovery uses TF cosine similarity to select only the servers
10+
# semantically relevant to the user's query, then connects only those.
11+
#
12+
# == Key config
13+
#
14+
# robot = RobotLab.build(
15+
# mcp_discovery: true, # ← enables semantic filtering
16+
# mcp: [ ... ] # ← candidate servers, each with :description
17+
# )
18+
#
19+
# == Fallback behaviour
20+
#
21+
# All servers are connected unchanged when:
22+
# - No server has a :description field
23+
# - The classifier gem is unavailable
24+
# - The query is blank or nil
25+
# - No server scores at or above the threshold (0.05 by default)
26+
#
27+
# This demo exercises MCP::ServerDiscovery directly — no LLM calls needed.
28+
#
29+
# Usage:
30+
# bundle exec ruby examples/28_mcp_discovery.rb
31+
32+
require_relative "../lib/robot_lab"
33+
34+
# Three representative MCP server configurations
35+
SERVERS = [
36+
{
37+
name: "filesystem",
38+
description: "Read, write, and search local files and directories",
39+
transport: { type: "stdio", command: "mcp-server-filesystem" }
40+
},
41+
{
42+
name: "github",
43+
description: "GitHub repos, issues, pull requests, code search",
44+
transport: { type: "stdio", command: "mcp-server-github" }
45+
},
46+
{
47+
name: "brew",
48+
description: "Install, update, and manage macOS packages via Homebrew",
49+
transport: { type: "stdio", command: "mcp-server-brew" }
50+
}
51+
].freeze
52+
53+
def show_query(label, query)
54+
selected = RobotLab::MCP::ServerDiscovery.select(query, from: SERVERS)
55+
names = selected.map { |s| s[:name] }
56+
57+
puts " [#{label}] Query : #{query.inspect}"
58+
puts " Match : #{names.inspect}"
59+
puts
60+
end
61+
62+
puts "=" * 60
63+
puts "Example 28: MCP Server Discovery"
64+
puts " Semantic server selection via TF cosine similarity"
65+
puts "=" * 60
66+
puts
67+
puts "Candidate servers:"
68+
SERVERS.each do |s|
69+
puts " #{s[:name].ljust(12)} #{s[:description]}"
70+
end
71+
puts
72+
73+
puts "Discovery queries:"
74+
puts "-" * 60
75+
show_query("File ops", "read my config file")
76+
show_query("Package mgmt", "install imagemagick via homebrew")
77+
show_query("Code review", "list open pull requests on my repo")
78+
79+
puts "Fallback cases:"
80+
puts "-" * 60
81+
82+
# No description → all servers returned
83+
no_desc_servers = SERVERS.map { |s| s.except(:description) }
84+
result = RobotLab::MCP::ServerDiscovery.select("install imagemagick", from: no_desc_servers)
85+
puts " No descriptions : returns all (#{result.size} servers)"
86+
87+
# Blank query → all servers returned
88+
result = RobotLab::MCP::ServerDiscovery.select("", from: SERVERS)
89+
puts " Blank query : returns all (#{result.size} servers)"
90+
91+
# Very high threshold → no match → fallback to all
92+
result = RobotLab::MCP::ServerDiscovery.select("install imagemagick", from: SERVERS, threshold: 1.0)
93+
puts " High threshold : returns all (#{result.size} servers), no score reaches 1.0"
94+
95+
puts
96+
puts "mcp_discovery: true on a Robot"
97+
puts "-" * 60
98+
puts <<~NOTE
99+
RobotLab.build(
100+
name: "assistant",
101+
mcp_discovery: true,
102+
mcp: [
103+
{ name: "filesystem", description: "Read, write...", transport: { ... } },
104+
{ name: "github", description: "GitHub repos...", transport: { ... } },
105+
{ name: "brew", description: "Install packages...", transport: { ... } }
106+
]
107+
)
108+
109+
# Only the :brew server is connected for this message:
110+
robot.run("install imagemagick")
111+
NOTE
112+
puts "=" * 60

examples/README.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -247,6 +247,14 @@ bundle exec rake examples:run[27]
247247

248248
**Requires:** LLM API key
249249

250+
### 28 — MCP Server Discovery
251+
252+
When a robot has many MCP servers configured, connecting to all of them upfront is wasteful. `mcp_discovery: true` enables semantic server selection: before the first connection, `MCP::ServerDiscovery` scores each server's `name + description` against the user query using TF cosine similarity and connects only the relevant subset.
253+
254+
Demonstrates: `MCP::ServerDiscovery.select(query, from:, threshold:)`, the `description:` field on MCP server configs, and three of the four fallback cases (no descriptions, blank query, no match at or above threshold). The `mcp_discovery: true` Robot option is shown as a printed snippet only, and the classifier-unavailable fallback is not exercised.
255+
256+
**Requires:** None (no LLM calls — exercises the discovery module directly)
257+
250258
### 18 — Rails Integration Demo
251259

252260
A minimal, hand-built Rails 8 app that exercises every piece of RobotLab's Rails integration end-to-end. No `rails new` — every file is hand-crafted for minimum size.

improvements.md

Lines changed: 4 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -64,32 +64,10 @@ Documented as a router pattern in `examples/23_convergence.rb`: when verifier A
6464

6565
`memory.store_document(key, text)` embeds text via `Fastembed::TextEmbedding` (BGE passage embedding) and stores it. `memory.search_documents(query, limit: 5)` embeds the query and returns top-N by cosine similarity. `RobotLab::DocumentStore` is the standalone backing class. Lazy model init — ONNX model downloaded on first use. No optional dependency: `fastembed` is already a core dep. Demo: `examples/26_document_store.rb`.
6666

67-
### 9. MCP Server Discovery Fallback (Semantic)
67+
### ~~9. MCP Server Discovery Fallback (Semantic)~~ ✅ DONE (Phase 6)
6868
**Source**: AIA (Technique 5)
6969

70-
Build an LSI index from MCP server names + topic descriptions at startup. Use as a fallback when keyword-based server selection finds no match ("install imagemagick" semantically maps to the `brew` server).
71-
72-
- Fallback only — no conflict with existing routing
73-
- Requires the `classifier` gem and a description field per MCP server config
74-
- Most valuable in environments with many MCP servers
75-
76-
### 10. Chat History Search
77-
**Source**: AIA (Technique 3)
78-
79-
Build an LSI index from accumulated conversation turns. Enable semantic search across history for context recall.
80-
81-
- Training-free via `classifier` gem
82-
- Could be a `Memory` extension: `memory.search_history(query, limit: 5)`
83-
- Useful for long-running robot sessions
84-
85-
### 11. Embedding-Based Memory Search
86-
**Source**: Hivemind AI
87-
88-
Extend Memory from key-value into RAG territory. `memory.store_document(key, text)` embeds and stores; `memory.search(query, limit: 5)` does similarity search.
89-
90-
- RobotLab already depends on `fastembed` and `ruby_llm-semantic_cache`
91-
- Backend: in-memory for small datasets, pgvector for production
92-
- Biggest capability extension but also largest implementation
70+
`MCP::ServerDiscovery.select(query, from:, threshold:)` uses TF cosine similarity (`String#word_hash`) to pick only the semantically relevant MCP servers for a given user query. `mcp_discovery: true` on `Robot` enables discovery automatically before the first `ensure_mcp_clients` call. `MCP::Server` gained a `:description` field. Falls back to all servers when: no descriptions, classifier unavailable, blank query, or nothing scores at or above `DEFAULT_THRESHOLD = 0.05`. Demo: `examples/28_mcp_discovery.rb`.
9371

9472
### ~~12. MCP Client Connection Multiplexing~~ ✅ DONE (Phase 5)
9573
**Source**: WaterDrop
@@ -137,7 +115,8 @@ Phase 3 (Inter-robot patterns) ✅ COMPLETE
137115
Phase 4 (Knowledge & retrieval) ✅ COMPLETE
138116
#10 Chat history search ✅
139117
#11 Embedding memory search ✅
140-
#9 MCP discovery fallback (deferred — needs multi-MCP-server environments)
118+
Phase 6 (MCP ergonomics) ✅ COMPLETE
119+
#9 MCP discovery fallback ✅
141120
142121
Phase 5 (Infrastructure) ✅ COMPLETE
143122
#12 MCP multiplexing ✅

lib/robot_lab/mcp/server.rb

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -33,17 +33,21 @@ class Server
3333
# @return [Hash] the transport configuration
3434
# @!attribute [r] timeout
3535
# @return [Numeric] request timeout in seconds
36-
attr_reader :name, :transport, :timeout
36+
# @!attribute [r] description
37+
# @return [String] human-readable description used by ServerDiscovery
38+
attr_reader :name, :transport, :timeout, :description
3739

3840
# Creates a new Server configuration.
3941
#
4042
# @param name [String] the server name
4143
# @param transport [Hash] the transport configuration
4244
# @param timeout [Numeric, nil] request timeout in seconds (default: 15)
45+
# @param description [String, nil] human-readable description for server discovery
4346
# @param _extra [Hash] additional keys are silently ignored for forward compatibility
4447
# @raise [ArgumentError] if transport type is invalid or required fields are missing
45-
def initialize(name:, transport:, timeout: nil, **_extra)
48+
def initialize(name:, transport:, timeout: nil, description: nil, **_extra)
4649
@name = name.to_s
50+
@description = description.to_s
4751
@transport = normalize_transport(transport)
4852
@timeout = normalize_timeout(timeout)
4953
validate!
@@ -62,6 +66,7 @@ def transport_type
6266
def to_h
6367
{
6468
name: name,
69+
description: description,
6570
transport: transport,
6671
timeout: timeout
6772
}
Lines changed: 110 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,110 @@
1+
# frozen_string_literal: true
2+
3+
module RobotLab
4+
module MCP
5+
# Selects relevant MCP servers for a given user query using TF cosine
6+
# similarity between the query and each server's topic text
7+
# (name + description).
8+
#
9+
# This is used as a fallback mechanism when a robot has many MCP servers
10+
# configured but only some are relevant to a particular user message.
11+
# Instead of connecting to all servers upfront, the robot can enable
12+
# discovery so only the semantically matching servers are connected.
13+
#
14+
# == Usage
15+
#
16+
# robot = RobotLab.build(
17+
# name: "assistant",
18+
# mcp_discovery: true,
19+
# mcp: [
20+
# {
21+
# name: "filesystem",
22+
# description: "Read, write, and search local files and directories",
23+
# transport: { type: "stdio", command: "mcp-server-fs" }
24+
# },
25+
# {
26+
# name: "github",
27+
# description: "GitHub repos, issues, pull requests, code search",
28+
# transport: { type: "stdio", command: "mcp-server-github" }
29+
# },
30+
# {
31+
# name: "brew",
32+
# description: "Install, update, and manage macOS packages via Homebrew",
33+
# transport: { type: "stdio", command: "mcp-server-brew" }
34+
# }
35+
# ]
36+
# )
37+
#
38+
# # Discovery connects only the :brew server for this query:
39+
# robot.run("install imagemagick")
40+
#
41+
# == Fallback Behaviour
42+
#
43+
# The full server list is returned unchanged when:
44+
# - No server has a +:description+ field
45+
# - The 'classifier' gem is unavailable
46+
# - The query is blank
47+
# - No server scores at or above +threshold+ (minimum relevance)
48+
#
49+
# @api private
50+
module ServerDiscovery
51+
# Minimum cosine similarity score for a server to be considered relevant.
52+
# Low by design — server descriptions are short, so scores are naturally
53+
# low even for on-topic queries.
54+
DEFAULT_THRESHOLD = 0.05
55+
56+
# Select MCP servers relevant to the given query.
57+
#
58+
# @param query [String] user message or intent
59+
# @param from [Array<Hash, MCP::Server>] candidate server configs
60+
# @param threshold [Float] minimum cosine score (default 0.05)
61+
# @return [Array<Hash, MCP::Server>] matching servers, or +from+ as
62+
# fallback when no match is found
63+
def self.select(query, from:, threshold: DEFAULT_THRESHOLD)
64+
return from if from.empty?
65+
return from if query.to_s.strip.empty?
66+
return from unless any_descriptions?(from)
67+
68+
TextAnalysis.require_classifier!
69+
70+
scored = from.map { |server| [server, score(query, server)] }
71+
matches = scored.select { |_, s| s >= threshold }.map(&:first)
72+
73+
matches.empty? ? from : matches
74+
rescue DependencyError
75+
# Classifier gem not available — connect to all servers
76+
from
77+
end
78+
79+
private
80+
81+
# @param servers [Array<Hash, MCP::Server>]
82+
def self.any_descriptions?(servers)
83+
servers.any? { |s| !description_for(s).empty? }
84+
end
85+
86+
# Build the topic string used for similarity scoring: name + description.
87+
#
88+
# @param server [Hash, MCP::Server]
89+
# @return [String]
90+
def self.topic_text(server)
91+
name = server.is_a?(Hash) ? server[:name].to_s : server.name.to_s
92+
"#{name} #{description_for(server)}".strip
93+
end
94+
95+
# @param server [Hash, MCP::Server]
96+
# @return [String] description or empty string
97+
def self.description_for(server)
98+
desc = server.is_a?(Hash) ? server[:description] : (server.description if server.respond_to?(:description))
99+
desc.to_s.strip
100+
end
101+
102+
# @param query [String]
103+
# @param server [Hash, MCP::Server]
104+
# @return [Float] cosine similarity in [0.0, 1.0]
105+
def self.score(query, server)
106+
TextAnalysis.tf_cosine_similarity(query, topic_text(server))
107+
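end

# A bare `private` has no effect on `def self.` methods, so the helpers
# are hidden explicitly here.
private_class_method :any_descriptions?, :topic_text, :description_for, :score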
108+
end
109+
end
110+
end
