34 changes: 34 additions & 0 deletions .github/workflows/ai-redteam-regression.yml
@@ -0,0 +1,34 @@
name: AI Red Team Regression

on:
  pull_request:
  push:
    branches: [ main ]

jobs:
  redteam-regression:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install garak

      - name: Run baseline security scan (example)
        run: |
          mkdir -p reports
          python -m garak --model_type openai --model_name gpt-4o-mini --report_prefix reports/garak || true

      - name: Upload reports
        uses: actions/upload-artifact@v4
        with:
          name: ai-redteam-reports
          path: reports/
94 changes: 94 additions & 0 deletions GUIDE_ENHANCEMENT_RECOMMENDATIONS.md
@@ -0,0 +1,94 @@
# AI Red Teaming Guide: Recommended Enhancements

This document summarizes high-impact additions that would make the guide more actionable, easier to operationalize, and more maintainable over time.

## 1) Add an "Executive Quickstart" (30/60/90-day plan)
- Include role-specific quickstarts for startup, mid-size, and enterprise teams.
- Provide first-week tasks, first red-team exercise template, and basic metrics dashboard.
- Benefit: helps readers move from theory to execution faster.

## 2) Add a full "Threat Modeling Workshop" template
- Add a facilitator script, pre-read checklist, and output artifacts (risk register, prioritized test plan).
- Include sample architecture diagrams and data-flow examples for LLM apps, RAG, and agents.
- Benefit: standardizes planning quality across teams.

## 3) Expand "Agentic AI" section with attack trees and controls mapping
- Add concrete attack trees for tool-use abuse, memory poisoning, and inter-agent privilege escalation.
- Map each attack path to preventive, detective, and corrective controls.
- Benefit: closes the gap between conceptual threats and implementation.

## 4) Add "Evaluation Harness" reference implementation
- Provide a minimal reproducible folder structure for prompt sets, expected policy outcomes, and scoring scripts.
- Include examples for attack success rate (ASR), false positive/negative rates, and regression tracking.
- Benefit: enables repeatable benchmarking and CI integration.

## 5) Add severity + triage model tailored to AI harms
- Extend CVSS-like scoring with AI-specific dimensions (scale, autonomy, recoverability, user impact).
- Provide triage SLAs and remediation ownership guidance.
- Benefit: improves prioritization consistency and executive reporting.

## 6) Add "Defensive Architecture Patterns" section
- Include secure prompt orchestration, policy-as-code checks, tool permissioning, sandboxing, and output guardrails.
- Add reference diagrams showing where to enforce controls in request/response pipelines.
- Benefit: gives builders prescriptive designs, not only attack descriptions.

## 7) Add "Multilingual & Cultural Safety" testing playbook
- Provide test set design guidance for low-resource languages and region-specific harm categories.
- Include translation-loop and mixed-language bypass tests.
- Benefit: strengthens global deployment readiness.

## 8) Add "Data Governance for Red Teaming" guidance
- Define safe handling for prompts, logs, PII, and model outputs during testing.
- Include retention rules, anonymization procedures, and legal review checkpoints.
- Benefit: reduces compliance risk while testing aggressively.

## 9) Add "Metrics that matter" section with anti-metrics
- Keep ASR but add risk-reduction metrics: exploit recurrence, time-to-fix, residual risk trend, control coverage.
- Add anti-metrics to avoid (e.g., raw test-count vanity metrics).
- Benefit: shifts focus from activity to risk reduction.

## 10) Add a "Purple Team Operations" chapter
- Include collaboration workflows between red team, detection engineering, and incident response.
- Provide playbooks for converting red-team findings into detections and runbooks.
- Benefit: better organizational learning and faster hardening.

## 11) Add "Case study quality bar" and normalized template
- Standardize every case study with context, exploit chain, root cause, controls bypassed, cost to remediate, and lessons.
- Add a confidence level and evidence source for each claim.
- Benefit: improves credibility and cross-case comparability.

## 12) Add "Common implementation pitfalls" section
- Examples: over-reliance on keyword blocking, missing tool authorization boundaries, lack of regression suites.
- Include "what good looks like" alternatives.
- Benefit: helps practitioners avoid known traps.

## 13) Add "Secure SDLC integration" artifacts
- Provide PR checklist, release gate criteria, and production monitoring runbook for AI-specific security.
- Include sample GitHub Actions pipelines for red-team regression tests.
- Benefit: embeds red teaming into delivery workflows.

## 14) Add "Model and system cards" for security posture
- Provide templates for documenting attack surface, tested risks, residual risks, and operational guardrails.
- Benefit: improves transparency for internal governance and audits.

## 15) Add source hygiene and update governance
- Introduce a versioned changelog and "last validated" date per external tool/resource.
- Mark claims as "evidence-backed" vs "expert guidance".
- Benefit: keeps a long guide trustworthy as the ecosystem changes quickly.

## 16) Add practitioner appendices
- Red-team rules of engagement template (editable).
- Vulnerability report template.
- Test-case library starter pack.
- Stakeholder readout slide outline.
- Benefit: reduces startup friction and increases consistency.

## Suggested prioritization (highest first)
1. Executive Quickstart
2. Evaluation Harness reference implementation
3. Agentic AI attack trees + controls mapping
4. Severity/triage model for AI harms
5. Secure SDLC integration artifacts
6. Data governance for red teaming
7. Multilingual & cultural safety playbook
8. Purple team operations
150 changes: 150 additions & 0 deletions README.md
@@ -32,6 +32,11 @@
- [Real-World Case Studies](#real-world-case-studies)
- [Building Your Red Team](#building-your-red-team)
- [Best Practices](#best-practices)
- [Implementation Quickstart (30/60/90)](#implementation-quickstart-306090)
- [Evaluation Harness (Reference Implementation)](#evaluation-harness-reference-implementation)
- [Agentic AI Attack Trees + Controls Mapping](#agentic-ai-attack-trees--controls-mapping)
- [AI Harm Severity and Triage Model](#ai-harm-severity-and-triage-model)
- [Secure SDLC Integration Artifacts](#secure-sdlc-integration-artifacts)
- [Regulatory Compliance](#regulatory-compliance)
- [Resources and References](#resources-and-references)

@@ -1738,6 +1743,151 @@ Red team members should feel comfortable:

---


## 🚀 Implementation Quickstart (30/60/90)

Use this phased plan to turn guidance into an operating program.

### First 30 Days (Foundation)
- Define system scope, stakeholders, and crown-jewel assets
- Run a 2-hour threat modeling workshop (use `templates/threat-modeling-workshop.md`)
- Create an initial attack library with at least:
  - 25 prompt injection tests
  - 25 jailbreak tests
  - 10 data leakage tests
- Establish baseline metrics: attack success rate (ASR), open Critical/High finding count, time-to-triage

### Days 31-60 (Operationalization)
- Implement weekly automated red-team regression in CI
- Add manual deep-dive sessions for top 3 business-critical scenarios
- Define triage SLA by severity (Critical/High/Medium/Low)
- Stand up a shared red-team findings board with remediation owners

### Days 61-90 (Scale)
- Add multilingual and multi-turn attack suites
- Add agentic AI abuse tests (tool misuse, memory poisoning, permissions)
- Launch monthly purple-team exercise with detection and IR teams
- Publish quarterly security posture report with residual risk trends

---

## 🧪 Evaluation Harness (Reference Implementation)

A lightweight structure for repeatable red-teaming and regression tracking:

```
security-evals/
├── prompts/
│   ├── prompt_injection.csv
│   ├── jailbreaks.csv
│   └── data_leakage.csv
├── policies/
│   └── expected_outcomes.yaml
├── scorers/
│   ├── policy_violation.py
│   └── leakage_detector.py
├── reports/
│   ├── latest.json
│   └── trend.csv
└── run_eval.py
```
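
As a rough illustration of how `run_eval.py` might tie this layout together, the sketch below loads one prompt CSV and scores each response. The `id` and `prompt` column names, and the `target` and `scorer` callables, are assumptions made for this example. The guide does not specify them.

```python
import csv
import io


def load_cases(csv_text, category):
    """Parse one prompt CSV (assumed columns: id, prompt) into test cases."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"id": row["id"], "prompt": row["prompt"], "category": category}
            for row in reader]


def run_eval(cases, target, scorer):
    """Send each prompt to `target` and record whether `scorer` flags the reply.

    target: callable(prompt) -> response text (stand-in for the system under test)
    scorer: callable(response) -> True if the attack succeeded
    """
    results = []
    for case in cases:
        response = target(case["prompt"])
        results.append({
            "id": case["id"],
            "category": case["category"],
            "attack_succeeded": bool(scorer(response)),
        })
    return results
```

Passing the target and scorer as plain callables lets the same loop run against a stub in unit tests and against a real endpoint in CI.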

### Minimum Scoring Set
- **ASR** by attack category
- **False positives/negatives** for moderation and detection controls
- **Exploit recurrence rate** after mitigation
- **Time-to-fix** and **time-to-verify**
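
The first two metrics can be computed with a few lines of Python. This is a minimal sketch, assuming results arrive as simple tuples. The function names and input shapes are hypothetical and not part of the guide.

```python
from collections import defaultdict


def asr_by_category(results):
    """Attack success rate per category from (category, succeeded) pairs."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for category, succeeded in results:
        attempts[category] += 1
        successes[category] += int(bool(succeeded))
    return {cat: successes[cat] / attempts[cat] for cat in attempts}


def error_rates(pairs):
    """False positive / false negative rates for a detection control.

    pairs: iterable of (is_actually_malicious, was_flagged) booleans.
    """
    fp = fn = tp = tn = 0
    for malicious, flagged in pairs:
        if malicious and flagged:
            tp += 1
        elif malicious:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes so a missing sample does not raise.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"false_positive_rate": fpr, "false_negative_rate": fnr}
```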

### Release Gates (Suggested)
- Block release if:
  - Any **Critical** issue is open
  - ASR for high-risk category > 5%
  - Regression introduces > 20% ASR increase in any tracked class
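
The gates above could be expressed as a single check that CI calls before release. The sketch below is one possible reading. It assumes the "> 20% ASR increase" is relative to the previous run, and it treats a move from zero to any non-zero ASR as a regression. The function name and parameters are hypothetical.

```python
def release_blockers(open_critical, asr_now, asr_prev, high_risk,
                     asr_limit=0.05, max_rel_increase=0.20):
    """Return a list of reasons to block the release (empty list = pass)."""
    reasons = []
    if open_critical > 0:
        reasons.append(f"{open_critical} open Critical issue(s)")
    for cat in sorted(high_risk):
        if asr_now.get(cat, 0.0) > asr_limit:
            reasons.append(f"ASR for high-risk category '{cat}' above {asr_limit:.0%}")
    for cat in sorted(asr_now):
        prev = asr_prev.get(cat)
        if prev is None:
            continue  # no baseline yet for this category
        now = asr_now[cat]
        # Assumption: "> 20% increase" is relative to the previous ASR.
        # A move from 0 to any non-zero ASR is also treated as a regression.
        regressed = now > 0 if prev == 0 else (now - prev) / prev > max_rel_increase
        if regressed:
            reasons.append(f"ASR regression in '{cat}'")
    return reasons
```

Returning the reasons rather than a bare boolean makes the failing gate visible in the CI log.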

---

## 🕸️ Agentic AI Attack Trees + Controls Mapping

Use attack trees to connect offensive testing paths to defensive controls.

### Attack Tree A: Tool Misuse
1. Inject hidden instruction into user-supplied content
2. Agent adopts malicious instruction priority
3. Agent invokes high-privilege tool
4. Agent executes unsafe action

**Controls:**
- Preventive: tool allowlists, scoped API tokens, policy checks pre-execution
- Detective: anomalous tool-call monitoring, high-risk action alerts
- Corrective: transaction rollback, credential rotation, incident playbook
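
A minimal sketch of the preventive controls for this tree: a per-role tool allowlist combined with a pre-execution policy check. The role names, tool names, and the human-approval flag are illustrative assumptions, not taken from the guide.

```python
HIGH_RISK_TOOLS = {"send_payment", "delete_records"}  # hypothetical tool names
ALLOWLIST = {  # hypothetical roles and their permitted tools
    "support_agent": {"search_kb", "create_ticket"},
    "finance_agent": {"search_kb", "send_payment"},
}


def authorize_tool_call(agent_role, tool_name, human_approved=False):
    """Pre-execution policy check: returns (allowed, reason)."""
    allowed_tools = ALLOWLIST.get(agent_role, set())
    if tool_name not in allowed_tools:
        return False, "tool not on allowlist for this role"
    if tool_name in HIGH_RISK_TOOLS and not human_approved:
        return False, "high-risk tool requires human approval"
    return True, "ok"
```

The check runs outside the model, so an injected instruction cannot talk its way past it. Unknown roles default to an empty allowlist.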

### Attack Tree B: Memory Poisoning
1. Adversary plants false memory artifact
2. Agent persists poisoned state
3. Subsequent sessions trust manipulated context
4. Agent behavior drifts into unsafe decisions

**Controls:**
- Preventive: memory write policies, source trust labels, TTL for memory items
- Detective: memory integrity diffs, unusual memory mutation alerts
- Corrective: memory quarantine/reset, retrospective impact analysis
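
The preventive controls here (write policy, source trust labels, TTL) can be sketched as follows. The trust labels and field names are hypothetical. A real agent framework would supply its own memory store.

```python
from dataclasses import dataclass

TRUSTED_SOURCES = {"system", "verified_user"}  # hypothetical trust labels


@dataclass
class MemoryItem:
    content: str
    source: str        # trust label recorded at write time
    written_at: float  # seconds since epoch
    ttl_seconds: float


def can_write(source):
    """Memory write policy: only trusted sources may persist state."""
    return source in TRUSTED_SOURCES


def live_items(items, now):
    """Drop expired items so stale or poisoned context ages out."""
    return [m for m in items if now - m.written_at < m.ttl_seconds]
```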

### Attack Tree C: Inter-Agent Privilege Escalation
1. Compromise low-privilege agent with prompt injection
2. Lateral instruction passing to orchestrator
3. Orchestrator executes action outside original permission boundary
4. Expanded access leads to data exfiltration or sabotage

**Controls:**
- Preventive: identity-bound inter-agent authz, least-privilege role boundaries
- Detective: cross-agent call graph anomaly detection
- Corrective: isolate compromised agent, revoke delegated capabilities
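
One way to enforce least-privilege boundaries across agents is to evaluate every delegated request against the intersection of the originating agent's permissions and the orchestrator's. The sketch below assumes permissions are plain string sets. The function names are hypothetical.

```python
def effective_permissions(origin_perms, delegate_perms):
    """A delegated request can never exceed the originating agent's rights."""
    return set(origin_perms) & set(delegate_perms)


def authorize_delegated_action(action, origin_perms, delegate_perms):
    """Check the action against the intersection, not the orchestrator alone."""
    return action in effective_permissions(origin_perms, delegate_perms)
```

With this rule, a compromised low-privilege agent that passes instructions to the orchestrator cannot trigger actions it could not have performed itself.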

---

## 📈 AI Harm Severity and Triage Model

Use CVSS as a base, then add AI-specific modifiers:

| Dimension | Description | Scale |
|-----------|-------------|-------|
| **Exploitability** | How easy the issue is to reproduce | Low/Med/High |
| **User Impact** | Potential harm to users or protected groups | Low/Med/High/Critical |
| **Autonomy Factor** | Can agents execute actions without human confirmation? | None/Partial/Full |
| **Blast Radius** | Single user, tenant, or cross-tenant/system-wide | Narrow/Broad/Systemic |
| **Recoverability** | Time/effort to safely restore expected behavior | Easy/Moderate/Hard |

### Triage SLA (Suggested)
- **Critical**: acknowledge immediately, mitigate within 24 hours
- **High**: acknowledge within 4 hours, mitigate within 7 days
- **Medium**: mitigate within 30 days
- **Low**: backlog with risk acceptance + review date
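
The SLA list translates directly into a lookup table. In this sketch, `None` marks a deadline the list above does not specify, and "acknowledge immediately" is modeled as a zero offset. The names `TRIAGE_SLA` and `sla_deadlines` are hypothetical.

```python
from datetime import datetime, timedelta

# (acknowledge within, mitigate within); None = no fixed deadline stated above
TRIAGE_SLA = {
    "Critical": (timedelta(0), timedelta(hours=24)),
    "High": (timedelta(hours=4), timedelta(days=7)),
    "Medium": (None, timedelta(days=30)),
    "Low": (None, None),  # backlog with risk acceptance + review date
}


def sla_deadlines(severity, reported_at):
    """Return (ack_deadline, mitigation_deadline) for a finding."""
    ack, fix = TRIAGE_SLA[severity]
    return (
        reported_at + ack if ack is not None else None,
        reported_at + fix if fix is not None else None,
    )
```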

---

## 🧩 Secure SDLC Integration Artifacts

To reduce "one-off" testing, integrate red-team controls into delivery workflows.

### PR Security Checklist (AI Systems)
- [ ] Threat model updated for new capabilities/tools
- [ ] New prompts/flows added to evaluation harness
- [ ] High-risk tool actions require explicit authorization checks
- [ ] Logging and privacy controls validated
- [ ] Residual risks documented in system card

### Release Readiness Criteria
- No open Critical findings
- All High findings have approved mitigation or documented exception
- Regression suite passes for required attack categories
- Monitoring/detection rules deployed for new features

### Operational Runbook Triggers
- Sudden ASR spike (>2x baseline)
- New jailbreak family with repeat success
- Evidence of cross-tenant leakage or autonomous unsafe tool use
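
The first trigger is easy to automate. This is a small sketch, assuming ASR values are fractions between 0 and 1 and that any success over a zero baseline should also raise the alarm. The function name is hypothetical.

```python
def asr_spike(current_asr, baseline_asr, factor=2.0):
    """True when ASR exceeds `factor` times the baseline (default: >2x)."""
    if baseline_asr == 0:
        # Assumption: any success over a zero baseline counts as a spike.
        return current_asr > 0
    return current_asr > factor * baseline_asr
```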


## 📋 Regulatory Compliance

### United States
10 changes: 10 additions & 0 deletions templates/ai-security-pr-checklist.md
@@ -0,0 +1,10 @@
# AI Security PR Checklist

- [ ] Threat model updated if behavior/capability changed
- [ ] New or modified prompts added to security regression suite
- [ ] Tool authorization boundary validated (least privilege)
- [ ] Prompt injection and jailbreak tests executed for changed flows
- [ ] Data handling reviewed for PII/log retention requirements
- [ ] Output filtering and policy checks validated
- [ ] Monitoring/detection rules updated for new failure modes
- [ ] Residual risks documented in model/system card
34 changes: 34 additions & 0 deletions templates/threat-modeling-workshop.md
@@ -0,0 +1,34 @@
# Threat Modeling Workshop Template (AI Systems)

## Workshop Goals
- Identify highest-risk abuse paths for the AI system.
- Prioritize red-team test scenarios by business impact and exploitability.
- Assign owners and due dates for controls and mitigations.

## Participants
- Product owner
- AI/ML engineer
- Security engineer / red team
- Detection / SOC representative
- Legal / compliance (for high-risk domains)

## Pre-Read Checklist
- Architecture diagram and trust boundaries
- Data flow (inputs, retrieval, tools, outputs)
- List of model capabilities and enabled tools/actions
- Existing guardrails (input/output/content/policy)
- Known incidents or prior findings

## 120-Minute Agenda
1. Scope and assumptions (15 min)
2. System walkthrough (20 min)
3. Threat brainstorming (30 min)
4. Risk scoring and prioritization (30 min)
5. Test plan and owners (20 min)
6. Wrap-up and next steps (5 min)

## Output Artifacts
- Prioritized risk register
- Red-team test plan for next sprint
- Detection/monitoring gaps backlog
- Signed-off risk acceptance for deferred items