feat(mcp): self-healing tool execution with LLM argument correction by Mathews-Tom · Pull Request #40 · Mathews-Tom/VaultMind

Mathews-Tom · 2026-03-25T12:51:57Z

Summary

Add self-healing tool execution to the MCP server, inspired by SivaRamSV/paaw's self-healing tool pattern. When an MCP tool fails with a retryable error, the system asks the LLM to diagnose the failure and suggest corrected arguments, then retries once. Security and auth errors are never retried.

How It Works

Tool call → _dispatch_tool()
  ├─ Success → return result
  └─ Failure
       ├─ Non-retryable (ProfileError, PathTraversalError, TypeError, KeyError, AttributeError) → raise immediately
       └─ Retryable (ValueError, LLMError, TimeoutError, ConnectionError)
            → LLM diagnoses error + suggests corrected args (JSON)
            → _dispatch_tool() with corrected args
                 ├─ Success → return result (logged as retry success)
                 └─ Failure → raise ToolRetryExhaustedError (both errors preserved)
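
The flow above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code in `retry.py`: the function name `execute_with_retry`, the `correct_args` callback, the stand-in `LLMError` class, and the constructor of `ToolRetryExhaustedError` are all assumptions made for the example.

```python
from typing import Any, Callable, Optional

Args = dict[str, Any]


class LLMError(Exception):
    """Hypothetical stand-in for VaultMind's LLMError."""


class ToolRetryExhaustedError(Exception):
    """Raised when the single retry also fails; keeps both errors (assumed shape)."""

    def __init__(self, original_error: Exception, retry_error: Exception) -> None:
        super().__init__(f"retry failed: {retry_error!r} (original: {original_error!r})")
        self.original_error = original_error
        self.retry_error = retry_error


# Assumed allowlist; anything outside it is raised immediately.
RETRYABLE = (ValueError, LLMError, TimeoutError, ConnectionError)


def execute_with_retry(
    dispatch: Callable[[str, Args], Any],
    correct_args: Callable[[str, Args, Exception], Optional[Args]],
    name: str,
    args: Args,
) -> Any:
    try:
        return dispatch(name, args)
    except Exception as first:
        if not isinstance(first, RETRYABLE):
            raise  # non-retryable or unknown: no retry
        corrected = correct_args(name, args, first)
        if corrected is None:
            raise  # correction declined or unavailable: original error
        try:
            return dispatch(name, corrected)  # exactly one retry
        except Exception as second:
            raise ToolRetryExhaustedError(first, second) from second
```

A caller that catches `ToolRetryExhaustedError` can inspect `original_error` and `retry_error` to see both failures.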

Error Classification

| Category | Error Types | Behavior |
| --- | --- | --- |
| Non-retryable | `ProfileError`, `PathTraversalError`, `TypeError`, `KeyError`, `AttributeError` | Raise immediately, no retry |
| Retryable | `ValueError`, `LLMError`, `TimeoutError`, `ConnectionError` | LLM correction + 1 retry |
| Unknown | Any error not in either list | Raise immediately (safe default) |
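
A minimal sketch of this classification, assuming the blocklist is checked first and the retryable set is matched by exact class name (as a string list in config would suggest). The stand-in exception classes and the function name `should_retry` are hypothetical; the actual `_should_retry()` may match differently, for example via `isinstance`.

```python
class ProfileError(Exception):
    """Hypothetical stand-in for VaultMind's ProfileError."""


class PathTraversalError(Exception):
    """Hypothetical stand-in for VaultMind's PathTraversalError."""


# Hard-blocked types: never retried, regardless of configuration.
_NON_RETRYABLE_TYPES = frozenset(
    {ProfileError, PathTraversalError, TypeError, KeyError, AttributeError}
)

# Assumed to come from the retryable_errors config field.
RETRYABLE_NAMES = frozenset({"ValueError", "LLMError", "TimeoutError", "ConnectionError"})


def should_retry(error: BaseException) -> bool:
    # The blocklist wins, including for subclasses of blocked types.
    if isinstance(error, tuple(_NON_RETRYABLE_TYPES)):
        return False
    # Exact-name match: unknown errors (and subclasses) default to "no retry".
    return type(error).__name__ in RETRYABLE_NAMES
```

Checking the blocklist before the allowlist means a misconfigured `retryable_errors` entry cannot make a security error retryable in this sketch.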

LLM Correction

When a retryable error occurs, the executor sends a structured prompt to the fast model:

- Includes tool name, original arguments, error message, and error type
- Lists common fixes (query length, path format, missing fields)
- Expects a JSON response: {"arguments": {...corrected...}} or {"corrected": false, "reason": "..."}
- Invalid JSON, LLM errors, or declined corrections → original error raised (no retry)
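
The response handling described above might look like the sketch below. The function name `parse_correction` is hypothetical, and returning `None` to mean "do not retry" is an assumption of this example; only the two JSON shapes come from the description.

```python
import json
from typing import Any, Optional


def parse_correction(raw: str) -> Optional[dict[str, Any]]:
    """Return corrected arguments, or None when no retry should happen."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None  # invalid JSON: fall back to the original error
    if not isinstance(payload, dict):
        return None  # valid JSON but not an object
    if payload.get("corrected") is False:
        return None  # the model declined to suggest a fix
    arguments = payload.get("arguments")
    # Only accept a JSON object as the corrected argument set.
    return arguments if isinstance(arguments, dict) else None
```

For example, `parse_correction('{"corrected": false, "reason": "unfixable"}')` returns `None`, so the caller would re-raise the original error.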

Changes

New Files

- src/vaultmind/mcp/retry.py (168 lines): ToolRetryExecutor class with execute(), _should_retry(), and _correct_args() methods; ToolRetryExhaustedError exception preserving both original_error and retry_error; module-level _NON_RETRYABLE_TYPES frozenset for hard-blocked error types
- tests/test_mcp_retry.py (313 lines): 19 tests across 4 classes

Modified Files

- src/vaultmind/config.py: Added MCPRetryConfig class with enabled, max_retries, use_llm_correction, correction_model, timeout_seconds, and retryable_errors fields. Added mcp_retry field to Settings
- config/default.toml: Added [mcp_retry] section with all config entries
- src/vaultmind/mcp/server.py: Added optional retry_executor parameter to create_mcp_server(). When provided, tool dispatch routes through ToolRetryExecutor.execute() with a closure over all dependencies. When absent, dispatch is direct, preserving existing behavior
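
The optional routing in server.py could be wired along these lines. This is an illustrative sketch only: `make_call_tool`, the `RetryExecutor` protocol, and the `execute(name, args, dispatch)` signature are assumptions, and the real `create_mcp_server()` and `ToolRetryExecutor.execute()` may differ.

```python
from typing import Any, Callable, Optional, Protocol

Dispatch = Callable[[str, dict[str, Any]], Any]


class RetryExecutor(Protocol):
    # Hypothetical signature for illustration.
    def execute(self, name: str, args: dict[str, Any], dispatch: Dispatch) -> Any: ...


def make_call_tool(
    dispatch: Dispatch, retry_executor: Optional[RetryExecutor] = None
) -> Dispatch:
    """Build a tool handler that closes over dispatch and the optional executor."""

    def call_tool(name: str, args: dict[str, Any]) -> Any:
        if retry_executor is None:
            return dispatch(name, args)  # direct dispatch: existing behavior
        return retry_executor.execute(name, args, dispatch)

    return call_tool
```

Defaulting `retry_executor` to `None` keeps callers that never pass it on the direct path.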

Backward Compatibility

- retry_executor defaults to None in create_mcp_server(), so existing callers are unaffected
- When MCPRetryConfig.enabled is False, the executor is a passthrough (direct dispatch)
- When use_llm_correction is False or no LLM client is provided, retryable errors still raise (no silent swallowing)
- ToolRetryExhaustedError inherits from Exception, preserving standard error handling chains

Test plan

19 new tests in test_mcp_retry.py across 4 classes:
- Basic execution (3): success passthrough, disabled config, no LLM on success
- Retry logic (6): retryable triggers retry, non-retryable errors (ProfileError, PathTraversalError, TypeError) raise immediately, exhausted error with preserved errors
- LLM correction (6): valid JSON, invalid JSON, declined, LLM error, disabled, missing client
- Should retry (4): ValueError/TimeoutError retryable, ProfileError non-retryable, unknown not retried
Full suite: 867/867 tests pass, 0 regressions
ruff check — clean
mypy --ignore-missing-imports — clean
Integration: verify retry triggers on real MCP tool call with transient failure
Manual: confirm LLM correction produces valid argument corrections for common error types

New module mcp/retry.py with ToolRetryExecutor that wraps tool dispatch with a single retry on retryable errors (ValueError, LLMError, TimeoutError, ConnectionError). On failure, asks the LLM to diagnose and suggest corrected arguments via structured JSON. Security and auth errors (ProfileError, PathTraversalError) are never retried. Add MCPRetryConfig with enabled, max_retries, use_llm_correction, correction_model, timeout_seconds, and retryable_errors fields.

Accept optional retry_executor parameter in create_mcp_server(). When provided, tool dispatch routes through ToolRetryExecutor.execute() with a closure over all dependencies. When absent, dispatch is direct — preserving existing behavior for callers that don't configure retry.

19 tests across 4 classes: basic execution (3), retry logic (6), LLM correction (6), and should_retry classification (4). Covers success passthrough, retryable vs non-retryable error routing, LLM JSON parsing (valid/invalid/declined), exhausted error preservation, and disabled/missing client edge cases.

Mathews-Tom added 3 commits March 25, 2026 17:28

Mathews-Tom merged commit f4152b6 into main Mar 25, 2026
3 checks passed

Mathews-Tom deleted the feat/mcp-retry branch March 25, 2026 12:56
