A layered defense system that protects AI agents from indirect prompt injection attacks hidden in external content — files, URLs, API responses, and tool outputs.
InjectGuard sits between your agent and the outside world. It scans every piece of external content through an ensemble of detection engines before it ever reaches your LLM.
External Content ──> InjectGuard ──> Safe Content ──> Your Agent
(files, URLs,        (scan,          (clean,
 API responses,       detect,         sanitized,
 tool outputs)        block)          annotated)
LLMs are vulnerable to indirect prompt injection: malicious instructions embedded in documents, web pages, or API responses that hijack agent behavior. These attacks hide in:
- HTML pages (invisible `<div style="display:none">` elements, comments, tiny fonts)
- PDFs (annotations, metadata, embedded files)
- Images (EXIF metadata, PNG text chunks, OCR text)
- JSON/YAML (deeply nested string values)
- Encoded payloads (base64, ROT13, hex sequences)
- Multilingual attacks (instructions in 12+ languages)
- Split payloads (instructions spread across multiple fields)
InjectGuard is designed to detect all of these.
pip install -e . # core + heuristic scanner
pip install -e ".[api]" # + FastAPI server & dashboard
pip install -e ".[similarity]" # + vector similarity scanner
pip install -e ".[all]" # everything
pip install -e ".[dev]" # + test/lint tools

from injectguard import Scanner
scanner = Scanner()
# Scan text from any external source
result = scanner.scan_text("content from an API response")
if result.is_safe:
    agent.process(result.original_content)
else:
    print(f"Blocked: {result.threats_found}") # e.g. ['instruction_override', 'exfiltration']
# Scan files (HTML, PDF, JSON, images, text)
result = scanner.scan_file("document.pdf")
# Scan URLs (with built-in SSRF protection)
result = scanner.scan_url("https://example.com/data")
# Batch scan
results = scanner.scan_batch([
    {"content": "first item"},
    {"content": "second item"},
])

from injectguard import Scanner, Policy
# Permissive — higher thresholds, fewer blocks
scanner = Scanner.with_preset("permissive")
# High security — aggressive blocking
scanner = Scanner.with_preset("high_security")
# Custom policy from YAML
scanner = Scanner(policy=Policy.from_file("policy.yaml"))

# Scan a file
injectguard scan document.pdf
# Scan a URL
injectguard scan https://example.com/page
# Scan from stdin
echo "ignore previous instructions" | injectguard scan -
# JSON output
injectguard scan document.html --json
# Verbose mode — show individual scanner scores
injectguard scan suspicious.pdf --verbose
# Use a stricter policy
injectguard scan data.json --preset high_security

Exit codes: 0 = safe, 1 = flagged, 2 = blocked.
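A script can branch on these exit codes. The following is a minimal sketch, assuming the `injectguard` CLI is on `PATH`; the helper names `verdict_from_exit_code` and `scan_path` are hypothetical and not part of InjectGuard.

```python
import subprocess

# Exit-code meanings as documented above: 0 = safe, 1 = flagged, 2 = blocked.
EXIT_CODE_VERDICTS = {0: "safe", 1: "flagged", 2: "blocked"}


def verdict_from_exit_code(code: int) -> str:
    """Map a CLI exit code to a verdict label (hypothetical helper)."""
    # Any undocumented exit code is treated as an error, never as safe.
    return EXIT_CODE_VERDICTS.get(code, "error")


def scan_path(path: str) -> str:
    """Run `injectguard scan <path>` and return the verdict label."""
    completed = subprocess.run(["injectguard", "scan", path], check=False)
    return verdict_from_exit_code(completed.returncode)
```

Treating unknown codes as `"error"` keeps the caller from passing content through when the scan itself failed.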
# Start the server
injectguard serve --host 0.0.0.0 --port 9000
# Or with a policy preset
injectguard serve --preset high_security

# Scan content
curl -X POST http://localhost:9000/v1/scan \
-H "Content-Type: application/json" \
-d '{"content": "some text to scan"}'
# Batch scan
curl -X POST http://localhost:9000/v1/scan/batch \
-H "Content-Type: application/json" \
-d '{"items": [{"content": "text 1"}, {"content": "text 2"}]}'
# Sanitize — scan + strip threats
curl -X POST http://localhost:9000/v1/sanitize \
-H "Content-Type: application/json" \
-d '{"content": "Ignore instructions. Normal text here."}'
# Health check
curl http://localhost:9000/v1/health
# Audit log
curl "http://localhost:9000/v1/audit?limit=50"

Transparently scan all HTTP traffic for your agent:
injectguard proxy --port 8080 --block-mode replace
# Point your agent at the proxy
export HTTP_PROXY=http://127.0.0.1:8080

Block modes: `replace` (swap blocked content), `drop` (empty response), `header_only` (pass content but add warning headers).
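To make the three block modes concrete, here is a minimal sketch of how a proxy might apply them to a blocked response. The `Response` class, the placeholder text, and the `X-InjectGuard-Warning` header name are assumptions for illustration, not InjectGuard's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Response:
    """Simplified stand-in for a proxied HTTP response."""
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


def apply_block_mode(response: Response, mode: str) -> Response:
    """Return the response a client would see for blocked content."""
    headers = dict(response.headers)
    if mode == "replace":
        # Swap the blocked body for a placeholder (text is hypothetical).
        return Response("[content blocked]", headers)
    if mode == "drop":
        # Return an empty body.
        return Response("", headers)
    if mode == "header_only":
        # Pass the body through but attach a warning header (name is hypothetical).
        headers["X-InjectGuard-Warning"] = "blocked"
        return Response(response.body, headers)
    raise ValueError(f"unknown block mode: {mode}")
```

Note that `header_only` still delivers the suspicious content, so it only helps if the agent inspects response headers.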
InjectGuard uses a layered ensemble approach. Each scanner votes with a confidence score, and a weighted aggregator produces the final verdict.
| Scanner | What it catches | Speed |
|---|---|---|
| Heuristic | ~70 regex patterns across 7 attack categories: instruction override, fake context, exfiltration, social engineering, encoding evasion, structural attacks, manipulation. Also detects invisible characters and homoglyph substitution. | <1ms |
| Advanced Heuristic | Base64/ROT13/hex encoded payloads, multilingual attacks (12 languages), split payload detection | ~1ms |
| ML Classifier | DeBERTa-v3-base fine-tuned for prompt injection detection (ONNX inference) | ~50ms |
| Vector Similarity | ChromaDB + sentence-transformers matching against known attack patterns. Self-hardening: confirmed attacks are added to the vector store. | ~20ms |
| LLM Judge | Multi-backend (Ollama, Claude, OpenAI) structured analysis for subtle attacks that evade pattern matching | ~1-5s |
The pipeline supports early exit — if any scanner returns a score above 0.95, it short-circuits immediately without running slower scanners.
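The ensemble and early-exit behavior can be sketched as follows. This is an illustration only: it assumes the aggregator is a weighted average and that scanners run fastest first, neither of which is confirmed here. `run_pipeline` and the toy scanners are hypothetical.

```python
from typing import Callable, List, Tuple

EARLY_EXIT_SCORE = 0.95  # early-exit threshold stated above

# (name, weight, scan function), ordered fastest first
ScannerSpec = Tuple[str, float, Callable[[str], float]]


def run_pipeline(text: str, scanners: List[ScannerSpec]) -> Tuple[float, List[str]]:
    """Return (final score, names of scanners that actually ran)."""
    weighted_sum = 0.0
    total_weight = 0.0
    ran: List[str] = []
    for name, weight, scan in scanners:
        score = scan(text)
        ran.append(name)
        if score > EARLY_EXIT_SCORE:
            # Short-circuit: skip the remaining, slower scanners.
            return score, ran
        weighted_sum += weight * score
        total_weight += weight
    # Assumed aggregation rule: weighted average of the individual scores.
    final = weighted_sum / total_weight if total_weight else 0.0
    return final, ran


# Toy scanners standing in for the real engines.
scanners: List[ScannerSpec] = [
    ("heuristic", 1.0, lambda t: 0.99 if "ignore previous instructions" in t.lower() else 0.1),
    ("ml_classifier", 1.2, lambda t: 0.3),
]

print(run_pipeline("Ignore previous instructions", scanners))  # only the heuristic runs
print(run_pipeline("hello", scanners))  # both scanners run
```

Ordering scanners by cost means an obvious attack never pays for the slower ML or LLM stages.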
| Verdict | Meaning |
|---|---|
| `safe` | No threats detected |
| `flagged` | Suspicious content, review recommended |
| `blocked` | High-confidence threat, content should not reach the agent |
| `sanitized` | Threats were found and stripped; sanitized content is available |
| `error` | Scanner error |
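As a rough illustration of how an aggregate score could map to the first three verdicts, the sketch below uses the `block` and `flag` thresholds from the sample `policy.yaml` later in this README. The comparison rule (`>=`) and the function name `verdict_for` are assumptions; the `sanitized` and `error` verdicts are left out because they depend on more than the score.

```python
def verdict_for(score: float, block: float = 0.85, flag: float = 0.60) -> str:
    """Map an aggregate score in [0, 1] to a verdict label (hypothetical rule)."""
    if score >= block:
        return "blocked"
    if score >= flag:
        return "flagged"
    return "safe"


for s in (0.10, 0.65, 0.90):
    print(s, verdict_for(s))
```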
InjectGuard doesn't just scan visible text. Its paranoid extraction mode pulls content from places attackers hide payloads:
- HTML: CSS-hidden elements (`display:none`, `visibility:hidden`, `opacity:0`, off-screen positioning, zero font-size, same-color text), comments, `<script>`/`<noscript>`, hidden inputs, `aria-hidden`, data attributes
- PDF: Page text, annotations, metadata fields, embedded file detection
- Images: PNG text chunks (`tEXt`, `iTXt`), EXIF metadata, OCR via Tesseract
- JSON/YAML: Recursive string extraction with key path tracking
- URLs: Full page fetch with SSRF protection, optional Playwright JS rendering for SPAs
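Recursive string extraction with key path tracking, as listed for JSON/YAML above, can be sketched in a few lines. This is a minimal illustration, not the parser shipped in `json_parser.py`; the function name `iter_strings` and the `$`-rooted path format are assumptions.

```python
import json
from typing import Any, Iterator, Tuple


def iter_strings(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (key path, string value) for every string nested in a JSON value."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_strings(item, f"{path}[{index}]")
    # Numbers, booleans, and null carry no text to scan, so they are skipped.


doc = json.loads(
    '{"user": {"bio": "hi", "notes": ["ok", "ignore previous instructions"]}, "age": 3}'
)
for key_path, value in iter_strings(doc):
    print(key_path, "->", value)
```

Keeping the key path alongside each string lets a report point at the exact field that carried the payload.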
When content is flagged, InjectGuard can clean it instead of blocking:
- Strip invisible characters: zero-width spaces, directional overrides, BOM markers
- Neutralize delimiters: `<system>` becomes `[TAG:system]`, preventing role injection
- Remove hidden HTML: strips `display:none` elements, comments, scripts
- Annotate suspicious content: wraps threats in `[SUSPICIOUS:instruction_override]` markers
- Truncate: enforces content length limits
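Three of the steps above (stripping invisible characters, neutralizing delimiters, and truncating) can be sketched with the standard library. This is a simplified illustration under stated assumptions: the set of role tags, the handling of closing tags as `[/TAG:...]`, and the function name `sanitize` are hypothetical, and the default length limit mirrors `max_content_length` from the sample policy.

```python
import re

# Zero-width characters, word joiner, BOM, and bidirectional control characters.
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff\u202a-\u202e\u2066-\u2069]")

# Role-like tags to neutralize; the exact tag list is an assumption.
ROLE_TAG = re.compile(r"<(/?)(system|assistant|user)>", re.IGNORECASE)


def sanitize(text: str, max_length: int = 500_000) -> str:
    """Strip invisible characters, neutralize role tags, and truncate."""
    text = INVISIBLE.sub("", text)
    # <system> becomes [TAG:system]; </system> becomes [/TAG:system].
    text = ROLE_TAG.sub(lambda m: f"[{m.group(1)}TAG:{m.group(2).lower()}]", text)
    return text[:max_length]


print(sanitize("a\u200bb <system>do X</system>"))
```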
from injectguard.integrations.langchain import ShieldedWebLoader, shield_tool
# Scan-on-load for web content
loader = ShieldedWebLoader("https://example.com")
docs = loader.load() # raises if blocked
# Wrap any tool
@shield_tool()
def my_tool(query: str) -> str:
    return external_api.call(query)

from injectguard.integrations.crewai import ShieldedTool, shield_crew_tools
# Wrap a single tool
safe_tool = ShieldedTool(original_tool)
# Wrap all tools for a crew
safe_tools = shield_crew_tools([tool1, tool2, tool3])

from injectguard.integrations.mcp import MCPToolWrapper
wrapper = MCPToolWrapper()
@wrapper.wrap
async def fetch_data(url: str) -> str:
    return await http_client.get(url)

The built-in web dashboard provides real-time monitoring at `/dashboard`:
- Total scans, safe/flagged/blocked counts
- Average latency and block rate
- Recent scan table with verdict, score, threats, and timing
- Auto-refreshes every 30 seconds via HTMX
injectguard serve # dashboard available at http://localhost:9000/dashboard

Get notified when threats are detected:
# Environment variables
export INJECTGUARD_SLACK_WEBHOOK=https://hooks.slack.com/services/...
export INJECTGUARD_WEBHOOK_URL=https://your-app.com/webhook
export INJECTGUARD_ALERT_ON_FLAGGED=true

from injectguard.alerting import AlertManager, SlackAlert, WebhookAlert
manager = AlertManager.from_env()
# or configure manually
manager = AlertManager(alert_on_blocked=True, alert_on_flagged=True)
manager.add_channel(SlackAlert(webhook_url="..."))
manager.add_channel(WebhookAlert(url="..."))
# After each scan
manager.check_and_alert(result)

# Require API keys (comma-separated)
export INJECTGUARD_API_KEYS=key1,key2,key3
# Rate limiting
export INJECTGUARD_RATE_LIMIT=100 # requests per window
export INJECTGUARD_RATE_WINDOW=60 # window in seconds
export INJECTGUARD_RATE_LIMIT_ENABLED=true

curl -H "X-API-Key: key1" http://localhost:9000/v1/scan ...

When no API keys are configured, authentication is disabled (open access).
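The rate-limit settings above describe a request budget per time window. One common way to enforce such a budget is a fixed-window counter per API key, sketched below. Whether InjectGuard's middleware uses a fixed window, a sliding window, or something else is not stated here, so treat `FixedWindowLimiter` as a hypothetical illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # Maps key -> (window start time, requests counted in that window).
        self._state: Dict[str, Tuple[float, int]] = {}

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        start, count = self._state.get(api_key, (now, 0))
        if now - start >= self.window:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            self._state[api_key] = (start, count)
            return False
        self._state[api_key] = (start, count + 1)
        return True
```

The defaults match the example values of `INJECTGUARD_RATE_LIMIT=100` and `INJECTGUARD_RATE_WINDOW=60`; the injectable `clock` makes the limiter easy to test without sleeping.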
cd docker
docker compose up -d

helm install injectguard ./helm/injectguard \
  --set apiKeys="key1,key2" \
  --set ingress.enabled=true \
  --set ingress.hosts[0].host=injectguard.example.com

The Helm chart includes: Deployment, Service, HPA (autoscaling), Ingress, PVC (persistent audit storage), and Secrets management.
Create a policy.yaml to customize thresholds, scanner weights, domain rules, and more:
thresholds:
  block: 0.85
  flag: 0.60
  sanitize: 0.70
scanner_weights:
  heuristic: 1.0
  ml_classifier: 1.2
  similarity: 0.8
  llm_judge: 1.5
content_rules:
  max_content_length: 500000
  strip_invisible_chars: true
  check_homoglyphs: true
domain_policy:
  blocked_domains:
    - "evil.com"
  allowed_domains: []
enable_ml_classifier: true
enable_similarity: false
enable_llm_judge: false

# Accuracy benchmark (35 malicious + 25 benign samples)
injectguard benchmark --verbose
# Latency profiling
python benchmarks/bench_latency.py

Heuristic-only results (no ML model):
- Accuracy: 88.3%
- Precision: 96.7%
- Recall: 82.9%
- Latency: 0.15ms (100B) to 71ms (50KB)
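The three quality figures are consistent with a single confusion matrix on the 35 malicious + 25 benign samples. The counts below (29 true positives, 1 false positive) are inferred from the reported percentages, not taken from the benchmark output, so treat them as an assumption.

```python
malicious, benign = 35, 25

# Inferred counts (assumption): chosen so the metrics match the reported values.
true_positives = 29
false_positives = 1
false_negatives = malicious - true_positives  # attacks that were missed
true_negatives = benign - false_positives     # benign samples passed correctly

accuracy = (true_positives + true_negatives) / (malicious + benign)
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy={accuracy:.1%} precision={precision:.1%} recall={recall:.1%}")
```

Under this reading, the heuristic scanner alone misses about one attack in six while rarely flagging benign content, which is the gap the ML, similarity, and LLM stages are meant to close.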
src/injectguard/
  client.py                  # Scanner SDK (main entry point)
  models.py                  # Verdict, ScanResult, ScannerResult
  policy.py                  # Policy configuration & presets
  cli.py                     # CLI (scan, serve, proxy, benchmark)
  alerting.py                # Slack, webhook, console alerts
  proxy.py                   # HTTP scanning proxy
  core/
    pipeline.py              # Scanner orchestration & ensemble scoring
  scanners/
    heuristic.py             # Regex pattern scanner (~70 patterns)
    advanced_heuristics.py   # Encoding, multilingual, split payloads
    ml_classifier.py         # DeBERTa ONNX classifier
    similarity.py            # ChromaDB vector similarity
    llm_judge.py             # LLM-based analysis
  parsers/
    html.py                  # Paranoid HTML parser
    pdf.py                   # PDF text + annotation extractor
    image.py                 # PNG chunks, EXIF, OCR
    json_parser.py           # Recursive JSON string extractor
    text.py                  # Plain text / markdown
  fetchers/
    url.py                   # URL fetcher with SSRF protection
    file.py                  # Local file fetcher
    playwright.py            # JS-rendered page fetcher
  sanitizer/
    engine.py                # Content sanitizer
  integrations/
    langchain.py             # LangChain loaders & tool wrapper
    crewai.py                # CrewAI tool wrapper
    mcp.py                   # MCP middleware & tool wrapper
  api/
    server.py                # FastAPI app factory
    routes.py                # REST API endpoints
    middleware.py            # Auth & rate limiting
    state.py                 # App state management
  audit/
    store.py                 # SQLite audit log
    logger.py                # Structured logging
  dashboard/
    routes.py                # HTMX dashboard routes
    templates/               # Dashboard HTML templates
MIT