From fb3b997c20c088305df7f84fb8c96768c4c01bb0 Mon Sep 17 00:00:00 2001 From: Eve McGivern Date: Sat, 13 Jun 2026 11:28:38 -0500 Subject: [PATCH] docs(site): refresh llms.txt + landing for 0.2.7 (eval numbers, multi-turn) The site lagged the package: stale eval results (v0.4.0 / 86 samples / 79.7-89.8%), softwareVersion 0.2.5, and no mention of scanSession. - Eval block (llms.txt + index.html): v0.6.0 patterns, 105 samples (67 adversarial / 38 benign), 82.1 / 91.0 / 91.0% detection, 0% FP. - Add multi-turn / session (scanSession split-payload) as a shipped capability in both files; note semantic accumulation deferred to ML. - softwareVersion 0.2.5 -> 0.2.7 (schema.org metadata). - Fix llms.txt quick-start to the real API (result.clean / result.sanitized; scanRAGChunksSync filter on r.clean) instead of a non-existent armor.sanitize() call. Co-Authored-By: Claude Opus 4.8 --- site/index.html | 17 +++++++++++------ site/llms.txt | 18 +++++++++++------- 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/site/index.html b/site/index.html index 2390c4d..7980b45 100644 --- a/site/index.html +++ b/site/index.html @@ -37,7 +37,7 @@ "name": "Stylus Nexus Holdings, LLC", "url": "https://stylusnexus.com" }, - "softwareVersion": "0.2.5", + "softwareVersion": "0.2.7", "citation": { "@type": "ScholarlyArticle", "name": "On the Viability of AI Agent Traps", @@ -369,6 +369,11 @@

Cognitive State

Semantic Manipulation

Biased framing, oversight evasion, persona hyperstition

+
+ Shipped +

Multi-turn / Session

+

Cross-turn split-payload detection via scanSession — instructions chopped across conversation turns

+
Shipped

ML Classifier

@@ -406,18 +411,18 @@

Quick start

-

Eval results (v0.4.0 patterns)

+

Eval results (v0.6.0 patterns)

- - - + + +
StrictnessDetection RateFalse Positive Rate
Permissive79.7%0.0%
Balanced89.8%0.0%
Strict89.8%0.0%
Permissive82.1%0.0%
Balanced91.0%0.0%
Strict91.0%0.0%
-

86 curated samples (59 adversarial, 27 benign) from WASP, HackAPrompt, Greshake et al., and 2025-2026 real-world incidents.

+

105 curated samples (67 adversarial, 38 benign) from WASP, HackAPrompt, Greshake et al., and 2025-2026 real-world incidents.

Includes 10 samples from real-world attacks (MCP poisoning, RAG saturation, supply chain injection) that regex does not yet catch, measuring the gap the ML classifier closes. On the original 49 adversarial samples, regex detection is 100% at balanced.

diff --git a/site/llms.txt b/site/llms.txt index cc4485b..ddd17f7 100644 --- a/site/llms.txt +++ b/site/llms.txt @@ -40,22 +40,26 @@ import { AgentArmor } from '@stylusnexus/agentarmor'; const armor = new AgentArmor(); const result = armor.scanSync(userInput); -if (result.threats.length > 0) { - const safe = armor.sanitize(userInput, result); +if (!result.clean) { + const safe = result.sanitized; } // Filter RAG chunks const clean = armor.scanRAGChunksSync(chunks) - .filter(r => r.threats.length === 0); + .filter(r => r.clean); ``` -## Eval results (v0.4.0 patterns, 86 samples) +## Multi-turn / session scanning + +`scanSession(turns)` scans a conversation (an array of {role, content} turns). Beyond scanning each turn on its own, it catches cross-turn split payloads: a single instruction chopped across a turn boundary (e.g. "ignore all previous" + "instructions...") that no per-turn scan would see. Cross-turn threats name their contributing turns. Cross-turn semantic accumulation (gradual memory poisoning) is deferred to the ML classifier because regex cannot separate it from legitimate scripting without false positives. + +## Eval results (v0.6.0 patterns, 105 samples) Strictness controls the confidence threshold for reporting threats: -- Permissive (threshold 0.7): 79.7% detection, 0.0% false positives — only high-confidence threats, fewer alerts -- Balanced (threshold 0.5): 89.8% detection, 0.0% false positives — recommended default -- Strict (threshold 0.3): 89.8% detection, 0.0% false positives — maximum coverage, catches subtle attacks +- Permissive (threshold 0.7): 82.1% detection, 0.0% false positives — only high-confidence threats, fewer alerts +- Balanced (threshold 0.5): 91.0% detection, 0.0% false positives — recommended default +- Strict (threshold 0.3): 91.0% detection, 0.0% false positives — maximum coverage, catches subtle attacks Includes 10 adversarial samples from 2025-2026 real-world incidents (MCP tool poisoning, RAG saturation, covert exfil, supply chain injection) that regex patterns do not yet catch. These measure the gap the ML classifier closes. On the original 49 adversarial samples, regex detection is 100% at balanced.