Skip to content

feat(redteam): add built-in red teaming support#184

Open
kevmyung wants to merge 8 commits intostrands-agents:mainfrom
kevmyung:feat/red-team-foundation
Open

feat(redteam): add built-in red teaming support#184
kevmyung wants to merge 8 commits intostrands-agents:mainfrom
kevmyung:feat/red-team-foundation

Conversation

@kevmyung
Copy link
Copy Markdown

@kevmyung kevmyung commented Mar 31, 2026

Description

Adds built-in red teaming capabilities to strands-evals, enabling automated adversarial testing of AI agents.

image

Core components:

  • Attack presets (jailbreak, prompt_extraction, harmful_content): Pre-built actor profiles, goals, seed inputs, and per-preset evaluation metrics
  • Strategy system: Pluggable attack strategies separated from presets. Ships with gradual_escalation — an adaptive multi-turn strategy that analyzes target responses and pivots techniques dynamically
  • RedTeamJudgeEvaluator: Composite safety evaluator with 3 metrics (guardrail_breach, harmfulness, prompt_leakage). Dynamically builds judge prompts based on only the metrics relevant to each attack pattern
  • run_red_team() entry point: End-to-end orchestration — case generation, multi-turn attack simulation via ActorSimulator, and safety evaluation in a single call
  • Target-aware goal generation: Optional target_info parameter for LLM-generated attack goals tailored to the specific target system

Related Issues

Closes #177

Type of Change

New feature

Testing

  • I ran hatch run prepare
  • Unit tests for presets, runner, and judge evaluator (49 tests passing)
  • Integration tested against mock compliant target and Claude Haiku target

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…pport

- AttackStrategy ABC, RiskCategory, AttackGoal shared types
- red_team() entry point with Agent auto-extraction and tool trace capture
- AttackSuccessEvaluator with continuous 0.0-1.0 scoring
- Strategy cross-product expansion and custom case injection
- RedTeamReport with grouped views
@kevmyung kevmyung force-pushed the feat/red-team-foundation branch from 8d7d3f5 to c9f5845 Compare April 25, 2026 04:18
Copy link
Copy Markdown
Contributor

@poshinchen poshinchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use built-in python | / list instead of typing's deprecated Union, List and so on?

@kevmyung kevmyung requested a deployment to manual-approval May 1, 2026 15:23 — with GitHub Actions Waiting
@kevmyung
Copy link
Copy Markdown
Author

kevmyung commented May 1, 2026

Could you use built-in python | / list instead of typing's deprecated Union, List and so on?

Quick heads-up – fixed it in 438f9e0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Built-in red teaming support

2 participants