Skip to content

SantiaGoMode/injectguard

Repository files navigation

InjectGuard

A layered defense system that protects AI agents from indirect prompt injection attacks hidden in external content — files, URLs, API responses, and tool outputs.

InjectGuard sits between your agent and the outside world. It scans every piece of external content through an ensemble of detection engines before it ever reaches your LLM.

External Content ──> InjectGuard ──> Safe Content ──> Your Agent
   (files, URLs,       (scan,            (clean,
    API responses,      detect,           sanitized,
    tool outputs)       block)            annotated)

Why

LLMs are vulnerable to indirect prompt injection: malicious instructions embedded in documents, web pages, or API responses that hijack agent behavior. These attacks hide in:

  • HTML pages (invisible <div style="display:none"> elements, comments, tiny fonts)
  • PDFs (annotations, metadata, embedded files)
  • Images (EXIF metadata, PNG text chunks, OCR text)
  • JSON/YAML (deeply nested string values)
  • Encoded payloads (base64, ROT13, hex sequences)
  • Multilingual attacks (instructions in 12+ languages)
  • Split payloads (instructions spread across multiple fields)

InjectGuard catches them all.

Install

pip install -e .                 # core + heuristic scanner
pip install -e ".[api]"          # + FastAPI server & dashboard
pip install -e ".[similarity]"   # + vector similarity scanner
pip install -e ".[all]"          # everything
pip install -e ".[dev]"          # + test/lint tools

Quick Start

Python SDK

from injectguard import Scanner

scanner = Scanner()

# Scan text from any external source
result = scanner.scan_text("content from an API response")
if result.is_safe:
    agent.process(result.original_content)
else:
    print(f"Blocked: {result.threats_found}")  # e.g. ['instruction_override', 'exfiltration']

# Scan files (HTML, PDF, JSON, images, text)
result = scanner.scan_file("document.pdf")

# Scan URLs (with built-in SSRF protection)
result = scanner.scan_url("https://example.com/data")

# Batch scan
results = scanner.scan_batch([
    {"content": "first item"},
    {"content": "second item"},
])

Policy Presets

from injectguard import Scanner, Policy

# Permissive — higher thresholds, fewer blocks
scanner = Scanner.with_preset("permissive")

# High security — aggressive blocking
scanner = Scanner.with_preset("high_security")

# Custom policy from YAML
scanner = Scanner(policy=Policy.from_file("policy.yaml"))

CLI

# Scan a file
injectguard scan document.pdf

# Scan a URL
injectguard scan https://example.com/page

# Scan from stdin
echo "ignore previous instructions" | injectguard scan -

# JSON output
injectguard scan document.html --json

# Verbose mode — show individual scanner scores
injectguard scan suspicious.pdf --verbose

# Use a stricter policy
injectguard scan data.json --preset high_security

Exit codes: 0 = safe, 1 = flagged, 2 = blocked.

API Server

# Start the server
injectguard serve --host 0.0.0.0 --port 9000

# Or with a policy preset
injectguard serve --preset high_security
# Scan content
curl -X POST http://localhost:9000/v1/scan \
  -H "Content-Type: application/json" \
  -d '{"content": "some text to scan"}'

# Batch scan
curl -X POST http://localhost:9000/v1/scan/batch \
  -H "Content-Type: application/json" \
  -d '{"items": [{"content": "text 1"}, {"content": "text 2"}]}'

# Sanitize — scan + strip threats
curl -X POST http://localhost:9000/v1/sanitize \
  -H "Content-Type: application/json" \
  -d '{"content": "Ignore instructions. Normal text here."}'

# Health check
curl http://localhost:9000/v1/health

# Audit log
curl http://localhost:9000/v1/audit?limit=50

HTTP Proxy

Transparently scan all HTTP traffic for your agent:

injectguard proxy --port 8080 --block-mode replace

# Point your agent at the proxy
export HTTP_PROXY=http://127.0.0.1:8080

Block modes: replace (swap blocked content), drop (empty response), header_only (pass content but add warning headers).

Detection Pipeline

InjectGuard uses a layered ensemble approach. Each scanner votes with a confidence score, and a weighted aggregator produces the final verdict.

Scanner What it catches Speed
Heuristic ~70 regex patterns across 7 attack categories: instruction override, fake context, exfiltration, social engineering, encoding evasion, structural attacks, manipulation. Also detects invisible characters and homoglyph substitution. <1ms
Advanced Heuristic Base64/ROT13/hex encoded payloads, multilingual attacks (12 languages), split payload detection ~1ms
ML Classifier DeBERTa-v3-base fine-tuned for prompt injection detection (ONNX inference) ~50ms
Vector Similarity ChromaDB + sentence-transformers matching against known attack patterns. Self-hardening: confirmed attacks are added to the vector store. ~20ms
LLM Judge Multi-backend (Ollama, Claude, OpenAI) structured analysis for subtle attacks that evade pattern matching ~1-5s

The pipeline supports early exit — if any scanner returns a score above 0.95, it short-circuits immediately without running slower scanners.

Verdicts

Verdict Meaning
safe No threats detected
flagged Suspicious content, review recommended
blocked High-confidence threat, content should not reach the agent
sanitized Threats were found and stripped; sanitized content is available
error Scanner error

Content Parsing

InjectGuard doesn't just scan visible text. Its paranoid extraction mode pulls content from places attackers hide payloads:

  • HTML: CSS-hidden elements (display:none, visibility:hidden, opacity:0, off-screen positioning, zero font-size, same-color text), comments, <script>/<noscript>, hidden inputs, aria-hidden, data attributes
  • PDF: Page text, annotations, metadata fields, embedded file detection
  • Images: PNG text chunks (tEXt, iTXt), EXIF metadata, OCR via Tesseract
  • JSON/YAML: Recursive string extraction with key path tracking
  • URLs: Full page fetch with SSRF protection, optional Playwright JS rendering for SPAs

Sanitization

When content is flagged, InjectGuard can clean it instead of blocking:

  • Strip invisible characters — zero-width spaces, directional overrides, BOM markers
  • Neutralize delimiters<system> becomes [TAG:system], preventing role injection
  • Remove hidden HTML — strips display:none elements, comments, scripts
  • Annotate suspicious content — wraps threats in [SUSPICIOUS:instruction_override] markers
  • Truncate — enforces content length limits

Framework Integrations

LangChain

from injectguard.integrations.langchain import ShieldedWebLoader, shield_tool

# Scan-on-load for web content
loader = ShieldedWebLoader("https://example.com")
docs = loader.load()  # raises if blocked

# Wrap any tool
@shield_tool()
def my_tool(query: str) -> str:
    return external_api.call(query)

CrewAI

from injectguard.integrations.crewai import ShieldedTool, shield_crew_tools

# Wrap a single tool
safe_tool = ShieldedTool(original_tool)

# Wrap all tools for a crew
safe_tools = shield_crew_tools([tool1, tool2, tool3])

MCP (Model Context Protocol)

from injectguard.integrations.mcp import MCPToolWrapper

wrapper = MCPToolWrapper()

@wrapper.wrap
async def fetch_data(url: str) -> str:
    return await http_client.get(url)

Dashboard

The built-in web dashboard provides real-time monitoring at /dashboard:

  • Total scans, safe/flagged/blocked counts
  • Average latency and block rate
  • Recent scan table with verdict, score, threats, and timing
  • Auto-refreshes every 30 seconds via HTMX
injectguard serve  # dashboard available at http://localhost:9000/dashboard

Alerting

Get notified when threats are detected:

# Environment variables
export INJECTGUARD_SLACK_WEBHOOK=https://hooks.slack.com/services/...
export INJECTGUARD_WEBHOOK_URL=https://your-app.com/webhook
export INJECTGUARD_ALERT_ON_FLAGGED=true
from injectguard.alerting import AlertManager, SlackAlert, WebhookAlert

manager = AlertManager.from_env()
# or configure manually
manager = AlertManager(alert_on_blocked=True, alert_on_flagged=True)
manager.add_channel(SlackAlert(webhook_url="..."))
manager.add_channel(WebhookAlert(url="..."))

# After each scan
manager.check_and_alert(result)

Authentication & Rate Limiting

# Require API keys (comma-separated)
export INJECTGUARD_API_KEYS=key1,key2,key3

# Rate limiting
export INJECTGUARD_RATE_LIMIT=100        # requests per window
export INJECTGUARD_RATE_WINDOW=60        # window in seconds
export INJECTGUARD_RATE_LIMIT_ENABLED=true
curl -H "X-API-Key: key1" http://localhost:9000/v1/scan ...

When no API keys are configured, authentication is disabled (open access).

Deployment

Docker

cd docker
docker compose up -d

Kubernetes (Helm)

helm install injectguard ./helm/injectguard \
  --set apiKeys="key1,key2" \
  --set ingress.enabled=true \
  --set ingress.hosts[0].host=injectguard.example.com

The Helm chart includes: Deployment, Service, HPA (autoscaling), Ingress, PVC (persistent audit storage), and Secrets management.

Policy Configuration

Create a policy.yaml to customize thresholds, scanner weights, domain rules, and more:

thresholds:
  block: 0.85
  flag: 0.60
  sanitize: 0.70

scanner_weights:
  heuristic: 1.0
  ml_classifier: 1.2
  similarity: 0.8
  llm_judge: 1.5

content_rules:
  max_content_length: 500000
  strip_invisible_chars: true
  check_homoglyphs: true

domain_policy:
  blocked_domains:
    - "evil.com"
  allowed_domains: []

enable_ml_classifier: true
enable_similarity: false
enable_llm_judge: false

Benchmarks

# Accuracy benchmark (35 malicious + 25 benign samples)
injectguard benchmark --verbose

# Latency profiling
python benchmarks/bench_latency.py

Heuristic-only results (no ML model):

  • Accuracy: 88.3%
  • Precision: 96.7%
  • Recall: 82.9%
  • Latency: 0.15ms (100B) to 71ms (50KB)

Architecture

src/injectguard/
  client.py              # Scanner SDK (main entry point)
  models.py              # Verdict, ScanResult, ScannerResult
  policy.py              # Policy configuration & presets
  cli.py                 # CLI (scan, serve, proxy, benchmark)
  alerting.py            # Slack, webhook, console alerts
  proxy.py               # HTTP scanning proxy
  core/
    pipeline.py          # Scanner orchestration & ensemble scoring
  scanners/
    heuristic.py         # Regex pattern scanner (~70 patterns)
    advanced_heuristics.py  # Encoding, multilingual, split payloads
    ml_classifier.py     # DeBERTa ONNX classifier
    similarity.py        # ChromaDB vector similarity
    llm_judge.py         # LLM-based analysis
  parsers/
    html.py              # Paranoid HTML parser
    pdf.py               # PDF text + annotation extractor
    image.py             # PNG chunks, EXIF, OCR
    json_parser.py       # Recursive JSON string extractor
    text.py              # Plain text / markdown
  fetchers/
    url.py               # URL fetcher with SSRF protection
    file.py              # Local file fetcher
    playwright.py        # JS-rendered page fetcher
  sanitizer/
    engine.py            # Content sanitizer
  integrations/
    langchain.py         # LangChain loaders & tool wrapper
    crewai.py            # CrewAI tool wrapper
    mcp.py               # MCP middleware & tool wrapper
  api/
    server.py            # FastAPI app factory
    routes.py            # REST API endpoints
    middleware.py         # Auth & rate limiting
    state.py             # App state management
  audit/
    store.py             # SQLite audit log
    logger.py            # Structured logging
  dashboard/
    routes.py            # HTMX dashboard routes
    templates/           # Dashboard HTML templates

License

MIT

About

InjectGuard — Layered defense system protecting AI agents from prompt injection in external content

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages