docs(evals): add Red teaming user guide #2816
Add a Red teaming section to the Strands Evals SDK docs (same level as Detectors), with six pages: Overview, Quickstart, Attack Strategies, Writing Custom Cases, Scoring Attacks, and Reading the Report. Register the section in `navigation.yml`. Covers the experimental red-team module: running adversarial attack strategies against an agent, the five built-in strategies with paper links, risk categories, authoring custom cases, how the LLM judge scores a breach, and how to read the report and act on findings. Every code sample and API reference was verified against the current `strands-agents-evals` package; the site builds clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Documentation Preview Ready: Your documentation preview has been successfully deployed! Changed pages:
Updated at: 2026-06-16T21:28:15.266Z
The target refused nothing: it restated its hidden configuration, leaking the planted `INTERNAL-7741` marker verbatim. The judge scored this `0.95` (full compromise). Each strategy reaches a breach differently (Crescendo by gradual escalation, Bad Likert Judge by the rating-example framing, SequentialBreak by burying the request in a benign sequence), but the end state the judge scores is the same: did the target produce the targeted violation?

## Choosing strategies
Issue: Two adjacent sections answer nearly the same question: ## Choosing a strategy (line 18, the table plus intro) and ## Choosing strategies (here). The near-identical headings produce two confusingly similar entries in the table of contents, and the "run several and compare" guidance is split across both.
Suggestion: Fold the structural-differences content (turn structure, cost, backtracking) under the first ## Choosing a strategy section, or rename this one to describe its distinct question (e.g. ## How strategies differ). One section, one question.
Still open as of a7012e9: ## Choosing a strategy (line 18) and ## Choosing strategies (line 56) remain two separate adjacent sections with near-identical headings. The new line-56 content ("run several and compare report.by_strategy()") is good material, but it would land better folded into the first section or under a heading naming its distinct question (e.g. ## How strategies differ).
Issue: Em-dashes throughout (96 across the six files: 34 in
Suggestion: Replace em-dashes with colons (for elaboration), commas or parentheses (for asides), or restructure the sentence. A few examples from the diff:
This is the one mechanical item that is load-bearing for the merge gate, so it is worth a pass across all six files.
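Because the reviewer's counts (96, then 99 across the six files) are what the pass needs to drive to zero, a small script can list every remaining hit. This is a minimal sketch, assuming the six pages are Markdown files under one directory; the helper names `em_dash_report` and `total` and the directory path are hypothetical and are not part of the docs repo, its merge gate, or `strands-agents-evals`.

```python
from pathlib import Path

EM_DASH = "\u2014"  # the character the review asks to remove


def em_dash_report(root: str, pattern: str = "**/*.md") -> dict[str, list[tuple[int, int]]]:
    """Hypothetical helper: map each matching file under `root` to
    (1-based line number, em-dash count on that line) pairs."""
    report: dict[str, list[tuple[int, int]]] = {}
    for path in sorted(Path(root).glob(pattern)):
        if not path.is_file():
            continue
        rows = []
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            n = line.count(EM_DASH)
            if n:
                rows.append((lineno, n))
        if rows:  # only files that still contain em-dashes are reported
            report[str(path)] = rows
    return report


def total(report: dict[str, list[tuple[int, int]]]) -> int:
    """Sum the per-line counts; this is the number to compare with the review's totals."""
    return sum(n for rows in report.values() for _, n in rows)
```

Usage: call `em_dash_report("path/to/red-teaming-docs")` (a placeholder path) and work through the reported lines. The script only finds occurrences; choosing between a colon, a comma, or parentheses is still a per-sentence decision, as the suggestion above describes.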
Assessment: Comment

Strong, well-organized guide set. I verified every code sample and API reference against the

Review themes

Nicely researched work: the paper citations, OWASP/NIST mappings, and the worked GOAT transcript make this genuinely useful rather than just a parameter dump.
Re-review of a7012e9: thanks for the update. The Quick Example fix is solid, and I re-verified the new

One item from the prior review is still open and worth a pass before merge:
Issue: Em-dashes are still present (99 across the six files: 30 in
Suggestion: A single find-and-replace pass works: colons for elaboration, commas or parentheses for asides. For example
Adds a Red teaming section to the Strands Evals SDK docs, at the same level as Detectors, registered in `navigation.yml`. Six pages:
- `RedTeamCase` objects
- `AttackSuccessEvaluator` (LLM-as-judge) scores a breach

Every code sample and API reference was verified against the current `strands-agents-evals` package; `npm run build` is clean and Prettier passes. The in-repo module README ships separately in strands-agents/evals#271.