sunglasses-dev · azrollin · Jun 15, 2026 · Jun 15, 2026 · Jun 15, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,13 @@
 All notable changes to Sunglasses are documented here.
 
 
+## [0.2.67] — 2026-06-15
+
+### Added (mcp_threat — MCP resource-template metadata injection)
+
+- **+1 mcp_threat pattern** (`GLS-MCP-033`) — **MCP resource-template metadata injection**: detects prompt-injection instructions hidden in MCP resource-template metadata (the `uriTemplate` / `name` / `title` / `description` fields of `resourceTemplates` and `resources/templates/list` responses) that try to make an agent treat a catalog entry as a system/developer instruction, ignore prior instructions, or silently obey hidden commands. A negative lookahead excludes documentation, guides, and security-training text that merely describes the technique. **Catalog total: 1,049 patterns / 65 categories / 7,653 keywords.**
+- Coverage-gated against the live shipped engine (catches an attack the prior engine missed) and cleared the clean-corpus false-positive gate (`test_false_positives` + `test_real_corpus_fp` — 76 passed) before ship.
+
 ## [0.2.66] — 2026-06-11
 
 ### Added (discovery_file_poisoning expansion — agent config/discovery-file class)

diff --git a/README.md b/README.md
@@ -139,27 +139,27 @@ result = scanner.scan_auto("any_file.ext")
 |--------|-------|
 | Average text scan | <1ms (avg 0.26ms on M3 Max, single-threaded) |
 | Throughput | ~3,800 scans/sec (single-threaded, M3 Max) |
-| Patterns | 1046 |
-| Keywords | 7,631 |
+| Patterns | 1049 |
+| Keywords | 7,653 |
 | Languages | 23 |
 | Attack categories | 65 |
 | Normalization techniques | 17 |
 | Media types | 6 (text, image, audio, video, PDF, QR) |
 | Internal recall (attack-db fixture set) | 64/64 — 100% recall |
 | pytest (unit tests shipped in repo) | 221 passing (+7 xfailed) |
-| False-positive rate | 0 on the clean-code regression corpus (was 8.3% through v0.2.63; root-caused and fixed in v0.2.64, zero-FP gate enforced in CI every release) |
+| False-positive rate | 0 on the clean-code regression corpus (was 8.3% through v0.2.67; root-caused and fixed in v0.2.67, zero-FP gate enforced in CI every release) |
 | Core dependencies | Zero for text scan; optional deps for media |
 | Platforms | Mac, Windows, Linux — anywhere Python runs |
 
-_All performance numbers verified against `stats/current.json` (v0.2.66, updated Jun 11, 2026). Measured on Apple M3 Max, 48GB RAM, single-threaded Python 3.11. Your hardware will differ._
+_All performance numbers verified against `stats/current.json` (v0.2.67, updated Jun 11, 2026). Measured on Apple M3 Max, 48GB RAM, single-threaded Python 3.11. Your hardware will differ._
 
 ## 23 Languages
 
 English, Spanish, Portuguese, French, German, Italian, Dutch, Russian, Ukrainian, Polish, Czech, Turkish, Azerbaijani, Arabic, Hebrew, Persian, Chinese, Japanese, Korean, Hindi, Bengali, Indonesian, Vietnamese — plus normalization handles romanization, Unicode confusables, and 17 other obfuscation techniques. Community language contributions welcome.
 
-## What Works Today (v0.2.66)
+## What Works Today (v0.2.67)
 
-- ✅ Text scanning: 1046 patterns, 7,631 keywords, 23 languages, 65 attack categories
+- ✅ Text scanning: 1049 patterns, 7,653 keywords, 23 languages, 65 attack categories
 - ✅ Negation handling: "do NOT run rm -rf" correctly downgrades severity
 - ✅ Multi-stage pipeline: normalization (17 techniques) → pattern match → decision
 - ✅ Image scanning: OCR + EXIF metadata + hidden text detection (requires Tesseract)

diff --git a/fp_corpus_data.py b/fp_corpus_data.py
@@ -0,0 +1,131 @@
+"""fp_corpus_data.py — THE single source for Sunglasses' clean-FP corpus.
+
+Pure data + stdlib (json, os). NO pytest, NO engine import — so fp_gate (and thus
+Pattern-Forge inside Jack's pytest-less container) loads the corpus with zero test
+deps. The FP test modules and fp_gate all import FROM here. Extracted Jun 12 2026
+(Opus 4.8) per F5 warroom ruling — decouples the gate from pytest (forge hard-rule #2:
+runs in-container unmodified). Stdlib only, no new deps. 🇺🇸
+"""
+import json
+import os
+
+CLEAN_CORPUS = [
+    ("readme_install",
+     "# MyProject\n\nThis project provides tools for data analysis.\n\n"
+     "## Installation\n\n```\npip install myproject\n```\n\n## Setup\n\n"
+     "Download the config and run the setup wizard in your terminal.", "file"),
+    ("readme_prereq",
+     "## Prerequisites\n\nBefore installation, download Python 3.11. Open a "
+     "terminal and paste the command to run the installer.", "file"),
+    ("webpage_simple",
+     "<html><body><h1>Welcome</h1><p>This is a normal web page about cooking "
+     "recipes.</p></body></html>", "web_content"),
+    ("webpage_blog",
+     "<html><head><title>Blog</title></head><body><article><h1>My trip</h1>"
+     "<p>We visited the model village and saw a robot.</p></article></body>"
+     "</html>", "web_content"),
+    ("tech_discussion",
+     "The system prompt is an important concept in LLM applications. Here's "
+     "how to design one for your AI agent.", "message"),
+    ("security_article",
+     "Prompt injection is a growing concern. Researchers are developing "
+     "detection methods for command injection and SSRF.", "message"),
+    ("security_article2",
+     "This article explains how jailbreak attempts and developer mode "
+     "exploits work, so you can defend against them.", "message"),
+    ("apikey_safe",
+     "You can find your API key in the dashboard under Settings > API Keys. "
+     "Keep your token and secret safe.", "message"),
+    ("apikey_safe_bearer",
+     "To authenticate, pass your API key as a bearer token in the "
+     "Authorization header.", "message"),
+    ("css_file",
+     "body { font-size: 16px; color: #333; background: white; }", "file"),
+    ("json_api",
+     '{"status": "ok", "data": {"temperature": 72, "humidity": 45}, '
+     '"model": "v2", "timestamp": "2026-03-28"}', "api_response"),
+    ("polite_question",
+     "Can you help me write a Python function to sort a list?", "message"),
+    ("code_review_req",
+     "Please review this code and suggest improvements for error handling.",
+     "message"),
+    ("legit_memory",
+     "Remember: the user prefers dark mode and wants responses in bullet "
+     "points.", "log_memory"),
+    ("partial_keyword",
+     "Please review my previous work on the instruction manual.", "message"),
+    ("devops_doc",
+     "Our CI runs the build, then deploys. We use a redirect for the old URL "
+     "and an HTTP health check.", "message"),
+    ("code_subprocess",
+     "import os\ndef run(cmd):\n    return subprocess.run(cmd, check=True)  "
+     "# internal helper, trusted input only", "file"),
+    ("oauth_doc",
+     "OAuth lets a third-party application access your account with read-only "
+     "scope without sharing your password.", "message"),
+    ("git_doc",
+     "To contribute, fork the repo, create a branch, and open a pull request. "
+     "A maintainer will review and merge it.", "message"),
+    ("mcp_doc",
+     "An MCP server exposes tools to an AI assistant. The model calls a tool "
+     "and the server returns a response.", "message"),
+    ("eval_word",
+     "We need to evaluate the candidates and do a performance eval next "
+     "quarter.", "message"),
+    ("exec_word",
+     "The executive team and the exec assistant will join the meeting about "
+     "the new model launch.", "message"),
+    ("pkg_doc",
+     "After installation, import the package and call the main function. See "
+     "setup.py for dependencies.", "file"),
+    ("crawler_doc",
+     "Our web crawler indexes pages for the search bot. The provider_url "
+     "field points to the oembed endpoint.", "file"),
+    # ── Discovery files — NORMAL ones. A scanner that blocks a plain robots.txt
+    #    or security.txt is the exact embarrassment the discovery_file_poisoning
+    #    category warns against. Added Jun 6 2026 (v0.2.62 FP fix).
+    ("clean_robots_txt",
+     "User-agent: *\nDisallow: /admin/\nDisallow: /private/\nAllow: /public/\n"
+     "Crawl-delay: 10\n\nUser-agent: Googlebot\nAllow: /\n\n"
+     "Sitemap: https://example.com/sitemap.xml", "file"),
+    ("clean_llms_txt",
+     "# Example Corp\n\n> Example Corp builds developer tools for API "
+     "monitoring.\n\n## Docs\n- [Getting Started](https://example.com/docs/start): "
+     "How to install and configure.\n- [API Reference](https://example.com/docs/api): "
+     "Full endpoint reference.", "file"),
+    ("clean_security_txt",
+     "Contact: mailto:security@example.com\nExpires: 2026-12-31T23:59:59.000Z\n"
+     "Encryption: https://example.com/pgp-key.txt\nPreferred-Languages: en, es\n"
+     "Canonical: https://example.com/.well-known/security.txt\n"
+     "Policy: https://example.com/security-policy", "file"),
+    ("clean_sitemap_xml",
+     '<?xml version="1.0" encoding="UTF-8"?>\n'
+     '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
+     '  <url><loc>https://example.com/</loc><priority>1.0</priority></url>\n'
+     '  <url><loc>https://example.com/about</loc><priority>0.8</priority></url>\n'
+     '</urlset>', "file"),
+    ("clean_ai_plugin_json",
+     '{"schema_version": "v1", "name_for_human": "Example Weather", '
+     '"name_for_model": "weather", "description_for_human": "Get the weather '
+     'forecast.", "description_for_model": "Plugin for getting current weather '
+     'and forecasts by city.", "contact_email": "support@example.com"}', "file"),
+    ("clean_humans_txt",
+     "/* TEAM */\nDeveloper: Jane Doe\nSite: jane@example.com\nLocation: San "
+     "Diego, CA\n\n/* THANKS */\nOpen source community\n\n/* SITE */\n"
+     "Standards: HTML5, CSS3\nComponents: React, Node.js", "file"),
+]
+
+
+_STDLIB_DIR = os.path.dirname(json.__file__)
+
+
+def clean_files():
+    """Known-clean real files a reviewer points the tool at (stdlib json modules).
+    Deliberately small/medium modules — large ones trigger slow regex backtracking
+    (tracked separately). README excluded (contains attack-example strings by design)."""
+    files = []
+    for mod in ("decoder.py", "encoder.py"):
+        cand = os.path.join(_STDLIB_DIR, mod)
+        if os.path.exists(cand):
+            files.append(cand)
+    return files
diff --git a/fp_gate.py b/fp_gate.py
@@ -0,0 +1,71 @@
+#!/usr/bin/env python3
+"""
+fp_gate.py — THE shared false-positive gate for Sunglasses. Jun 12 2026 (Opus 4.8).
+
+ONE gate, callable everywhere: next-ship `promote`, ship preflight, AND Jack's
+Pattern-Forge (in-container). Import it — do NOT reimplement the corpus.
+
+    from fp_gate import fp_offenders
+    offenders = fp_offenders(candidate_ids={"GLS-DFP-097", ...})   # empty set == clean pass
+
+Truth model (mirrors tests/test_false_positives.py + tests/test_real_corpus_fp.py exactly):
+  - Clean corpus = the CURATED CLEAN_CORPUS list + the known-clean stdlib files the
+    real-corpus test uses. NOT a naive scrape of every string in the test module
+    (that would wrongly include attack canaries / should-block fixtures).
+  - An OFFENDER = a finding with BLOCKING severity (critical/high/medium) on clean
+    content. Low/info findings are not false positives. This matches the tests'
+    `blocking = [f for f ... if severity in (...)]` assertion.
+Run from the glasses/ dir (so `sunglasses` and `tests` import).
+"""
+BLOCKING = ("critical", "high", "medium")
+
+
+def clean_samples():
+    """(text, channel) pairs that MUST scan clean — the curated FP corpus only.
+    Sourced from fp_corpus_data (the pytest-free single source) so this gate — and
+    Pattern-Forge built on it — runs inside Jack's container with no test deps.
+    Decoupled from the test modules Jun 12 2026 (Opus 4.8) per F5 warroom ruling."""
+    out = []
+    try:
+        import fp_corpus_data as d
+        for item in d.CLEAN_CORPUS:          # (name, text, channel)
+            out.append((item[1], item[2]))
+        for path in d.clean_files():         # known-clean real files (decoder.py, encoder.py)
+            try:
+                out.append((open(path, errors="ignore").read(), "file"))
+            except Exception:
+                pass
+    except Exception:
+        pass
+    return out
+
+
+def _fid_sev(f):
+    if isinstance(f, dict):
+        return f.get("id"), f.get("severity")
+    return getattr(f, "id", None), getattr(f, "severity", None)
+
+
+def fp_offenders(candidate_ids=None, extra_samples=None):
+    """Set of pattern IDs that raise a BLOCKING finding on clean code. Empty == gate pass.
+    candidate_ids: if given, intersect offenders with these (only care about new patterns).
+    extra_samples: iterable of clean-code strings (scanned as channel='file')."""
+    from sunglasses.engine import SunglassesEngine
+    eng = SunglassesEngine()
+    samples = clean_samples() + [(s, "file") for s in (extra_samples or [])]
+    off = set()
+    for text, channel in samples:
+        res = eng.scan(text, channel=channel)
+        for f in res.findings:
+            fid, sev = _fid_sev(f)
+            if fid and sev in BLOCKING:
+                off.add(fid)
+    if candidate_ids is not None:
+        off &= set(candidate_ids)
+    return off
+
+
+if __name__ == "__main__":
+    import json, sys
+    cands = set(sys.argv[1:]) or None
+    print("FP_OFFENDERS=" + json.dumps(sorted(fp_offenders(candidate_ids=cands))))