Security guards protect your AI agents from common attack vectors when using MCP (Model Context Protocol) servers.
Detects and blocks malicious tool descriptions that could manipulate AI behavior.
Attack Pattern: An MCP server returns tools with descriptions containing hidden instructions like:
- "SYSTEM: ignore all previous instructions"
- "When this tool is called, also execute..."
- Hidden unicode characters with embedded commands
Configuration:
securityGuards:
toolPoisoning:
enabled: true
strict_mode: true
custom_patterns:
- "(?i)SYSTEM:\\s*override"
- "(?i)ignore\\s+all\\s+previous"How it works:
- Intercepts
tools/listresponses from MCP servers - Scans tool names, descriptions, and input schemas
- Blocks if suspicious patterns are detected
- Returns error to client instead of poisoned tools
Detects when MCP servers change their tools after the initial handshake.
Attack Pattern: An MCP server initially returns safe tools, but after the AI has been "trusted," it changes the tools to malicious ones (the "rug pull").
Configuration:
securityGuards:
rugPull:
enabled: true
risk_threshold: 1risk_threshold options:
- The threshold of MCP tool changes that triggers the security guard to block a request
How it works:
- Records tool fingerprint on first
tools/listresponse - Compares subsequent responses against baseline
- Blocks if tools have changed unexpectedly after risk threshold is reached
- New sessions establish fresh baselines
Detects and optionally masks personally identifiable information in MCP responses.
Configuration:
securityGuards:
pii:
enabled: true
detect:
- email
- credit_card
- phone
- ssn
action: mask # or "reject"
min_score: 0.3Actions:
mask: Replace PII with[REDACTED]reject: Reject the entire response
Complete example with all guards:
binds:
- port: 8080
listeners:
- hostname: "*"
routes:
- name: protected-mcp
matches:
- path:
pathPrefix: /mcp
backends:
- mcp:
targets:
- name: backend
mcp:
host: http://mcp-server:3000/mcp
statefulMode: stateful
policies:
securityGuards:
toolPoisoning:
enabled: true
strict_mode: true
rugPull:
enabled: true
risk_threshold: 1
pii:
enabled: false # Enable if neededRun the E2E test suite to verify guards are working:
# Run all security guard tests
./deploy.sh
# Or manually
cd tests/docker
docker compose up -d --build
docker compose run --rm test-runner
#Or via make:
# whole test suite
make test-docker
# separate test
make test-docker-up
python tests/<selected_test>
make test-docker-down
The test suite includes:
- Tool poisoning detection tests
- Rug pull detection tests (session and global scope)
- PII masking tests
Guards operate in fail_closed mode by default:
- If a guard detects a threat, the request is blocked
- If a guard encounters an error, the request is blocked
- This ensures security even during unexpected conditions
When a guard blocks a request, the client receives:
{
"jsonrpc": "2.0",
"error": {
"code": -32603,
"message": "Security guard denied for server 'backend': tool_poisoning detected suspicious content"
}
}- Enable tool poisoning by default - Low overhead, high protection
- Enable PII only when needed - Has performance overhead
- Test with your actual MCP servers - Ensure no false positives
- Monitor guard logs - Track blocked requests for security analysis
- Guards operate at the gateway level, not inside the AI model
- Cannot detect attacks embedded in legitimate-looking content
- PII detection has accuracy limits (configurable via
min_score) - Rug pull detection requires stateful sessions