ToolTrust Scanner currently ships 18 active built-in rules: 16 tool-definition rules and 2 source-scan rules for embedded MCP implementations. Each rule fires independently; a tool or repository can trigger multiple rules.
Severity: Critical
Detects malicious instructions hidden in tool names or descriptions that attempt to hijack the agent's reasoning, override system prompts, or redirect behavior toward attacker-controlled goals.
Common patterns: ignore previous instructions, system:, role-override language, and base64-encoded payloads intended to override the agent's instructions.
Severity: High / Low (depends on permission type)
Flags tools that declare broad capabilities — filesystem access, network access, database access, or arbitrary code execution — without a clear, scoped justification.
Permissions checked: exec, network, fs, db.
Severity: High
Fires when a tool's name implies one capability but its description or schema claims another. Example: a tool called read_file that also declares network write access.
Severity: High / Critical
Queries the OSV vulnerability database for known CVEs in packages declared as dependencies. Requires network access during scan; results are cached per run.
Severity: High
Detects tools that request or claim admin, root, sudo, or elevated permission scopes beyond what the tool's stated purpose requires.
Severity: Critical
Flags tools whose name or description implies the ability to run arbitrary host commands, scripts, or code. Patterns include exec, eval, shell, run_code, execute_script, backtick shell syntax.
Severity: Info
Tools with no description or no input schema give the agent no basis for safe use. Flagged as informational — not a security risk by itself, but a quality signal.
Severity: Critical
Checks an offline bundled blacklist of packages confirmed to have been compromised in supply chain attacks. No network required — zero latency.
Current blacklist: LiteLLM 1.82.7/1.82.8 (TeamPCP .pth backdoor), Trivy v0.69.4–v0.69.6 (CI pipeline compromise), Langflow < 1.9.0 (unauthenticated RCE), Axios 1.14.1/0.30.4 (malicious npm publish), Bitwarden CLI 2026.4.0, compromised @cap-js/* releases, and Mini Shai-Hulud/TanStack-related compromised npm/PyPI package versions.
Severity: Medium
Uses edit-distance heuristics to detect tool names that closely resemble known legitimate tools — a common technique for impersonation attacks. Tuned to avoid false positives on legitimate plural/variant tool families.
Severity: Medium
Flags tools whose input parameters appear designed to accept secrets (API keys, tokens, passwords) in plaintext rather than via environment variables or secret stores.
Severity: Low
Tools that perform network or execution operations without declaring rate-limit, timeout, or retry configuration can cause runaway agent behavior or denial-of-service conditions.
Severity: High / Medium
Detects tools whose names are exact normalized duplicates of other tools in the same server — a sign that a malicious tool is attempting to shadow or override a legitimate one.
Near-duplicate detection (edit distance 1) was removed in v0.1.15 after a 13/13 false-positive rate on legitimate tool families. Only exact normalized duplicates fire this rule.
Severity: Info
Flags MCP tools that do not expose metadata.dependencies and do not provide a repo_url, which limits ToolTrust's ability to perform meaningful supply-chain analysis.
Severity: Medium / High
Flags npm dependency versions that publish install-time lifecycle scripts such as preinstall, install, postinstall, or prepare. Severity rises when the script contains remote-fetch or inline-execution patterns.
Severity: Critical
Flags npm dependency versions whose published registry metadata or install-time scripts reference known malicious IOC package names, domains, URLs, or script patterns, such as plain-crypto-js, @tanstack/setup, Mini Shai-Hulud infrastructure, or reviewed shell-fetch indicators. This is narrower than full tarball signature scanning, but it can still catch compromised releases when an IOC appears in dependency metadata.
Severity: Medium
Flags tool descriptions that explicitly suggest forwarding user data, content, or conversation history to external endpoints such as remote hosts, external servers, attacker-controlled URLs, or base64-encoded sinks. This is intentionally separate from AS-001 so prompt-injection findings stay focused on instruction override language.
Severity: Info
Flags repositories where MCP SDK imports and server initialization are present in source code, but the scanner could not enumerate tool definitions from a manifest or live handshake.
This is a presence signal, not a clean bill of health. It tells you the repo likely exposes MCP functionality, but more review is needed to understand tools, auth, and scope.
Severity: High / Critical
Flags embedded MCP HTTP servers where one route reaches the MCP handler without the authentication middleware applied on another route serving the same handler.
The initial implementation focuses on Go/Gin-style route registrations and raises severity when it sees strong exploitability signals such as:
- an unauthenticated alternate MCP route like
/mcp_message - fail-open IP allowlist logic
- the same handler used by both authenticated and unauthenticated MCP endpoints