33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,33 @@
# Changelog

All notable changes to this guide should be documented in this file.

## [Unreleased]
### Added
- Operational implementation sections in README:
  - Implementation Quickstart (30/60/90)
  - Evaluation Harness (Reference Implementation)
  - Agentic AI Attack Trees + Controls Mapping
  - AI Harm Severity and Triage Model
  - Secure SDLC Integration Artifacts
  - Defensive Architecture Patterns
  - Multilingual & Cultural Safety Playbook
  - Data Governance for Red Teaming
  - Metrics That Matter (and Anti-Metrics)
  - Purple Team Operations
  - Common Implementation Pitfalls
  - Case Study Quality Bar
  - Model & System Cards for Security Posture
  - Source Hygiene & Update Governance
  - Practitioner Appendices
- New templates in `templates/`:
  - threat-modeling-workshop.md
  - ai-security-pr-checklist.md
  - rules-of-engagement-template.md
  - vulnerability-report-template.md
  - test-case-library-starter.md
  - stakeholder-readout-outline.md
  - model-system-security-card.md
  - case-study-template.md
- `resources-validation.md` to track external source freshness.
- `.github/workflows/ai-redteam-regression.yml` baseline CI workflow.
175 changes: 175 additions & 0 deletions README.md
@@ -1887,6 +1887,181 @@ To reduce "one-off" testing, integrate red-team controls into delivery workflows
- New jailbreak family with repeat success
- Evidence of cross-tenant leakage or autonomous unsafe tool use

## 🛡️ Defensive Architecture Patterns

Translate red-team findings into architecture decisions using a layered control model:

### Reference Pipeline
```
User Input
-> Input normalization/sanitization
-> Policy-as-code pre-checks
-> Prompt orchestration with role boundaries
-> Retrieval/tool authorization gates
-> Model inference
-> Output policy and leakage filters
-> Human-in-the-loop (for high-risk actions)
-> Logging, telemetry, and audit trail
```
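
The pipeline above can be sketched as an ordered chain of stages that stops at the first block and records an audit entry per stage. This is a minimal illustration, not a reference implementation: `Request`, `run_pipeline`, the two stage functions, and the blocked phrase are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List
import unicodedata


@dataclass
class Request:
    """Hypothetical request envelope passed between pipeline stages."""
    user_input: str
    blocked: bool = False
    audit_log: List[str] = field(default_factory=list)


Stage = Callable[[Request], Request]


def normalize_input(req: Request) -> Request:
    # NFKC folds many look-alike Unicode forms (e.g. fullwidth letters) to a canonical form.
    req.user_input = unicodedata.normalize("NFKC", req.user_input).strip()
    return req


def policy_precheck(req: Request) -> Request:
    # Placeholder deterministic rule; real policies would be versioned and tested in CI.
    if "ignore previous instructions" in req.user_input.lower():
        req.blocked = True
    return req


def run_pipeline(req: Request, stages: List[Stage]) -> Request:
    """Run stages in order, log each one, and stop at the first block."""
    for stage in stages:
        req = stage(req)
        req.audit_log.append(f"{stage.__name__}: blocked={req.blocked}")
        if req.blocked:
            break
    return req
```

Later stages (tool authorization, output filters, human approval) would slot into the same `stages` list, so the order of controls stays explicit and testable.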

### Core Patterns
1. **Secure Prompt Orchestration**
   - Separate system, developer, and user instructions
   - Prevent untrusted content from altering control prompts

2. **Tool Permissioning and Isolation**
   - Grant least-privilege tokens per tool and per action
   - Use approval workflows for sensitive actions (payments, credential resets)

3. **Policy-as-Code Enforcement**
   - Implement deterministic checks before tool execution
   - Version policies and test them in CI alongside prompts

4. **Output Guardrails**
   - Add layered filters (policy, PII, compliance)
   - Require citations for high-stakes domains where applicable
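
As one way to picture patterns 2 and 3 together, the sketch below runs a deterministic, deny-by-default check before any tool call. The `ToolPolicy` record, the `POLICIES` table, and the tool and action names are assumptions made for illustration; a real deployment would load policies from a versioned file.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical policy record: which actions a tool may perform."""
    allowed_actions: FrozenSet[str]
    approval_required: FrozenSet[str] = frozenset()


# Illustrative policy table; in practice this would live in a versioned file.
POLICIES = {
    "billing": ToolPolicy(
        allowed_actions=frozenset({"read_invoice", "issue_payment"}),
        approval_required=frozenset({"issue_payment"}),
    ),
    "search": ToolPolicy(allowed_actions=frozenset({"query"})),
}


def authorize(tool: str, action: str, approved: bool = False) -> Tuple[bool, str]:
    """Deterministic check run before any tool call; denies by default."""
    policy = POLICIES.get(tool)
    if policy is None:
        return False, "unknown tool"
    if action not in policy.allowed_actions:
        return False, "action not permitted"
    if action in policy.approval_required and not approved:
        return False, "human approval required"
    return True, "ok"
```

Because the check is a pure function of the policy table and its arguments, it can be unit-tested in CI alongside prompt changes.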

---

## 🌍 Multilingual & Cultural Safety Playbook

### Test Set Design
- Cover your top business languages plus any low-resource languages in your user base
- Include region-specific harmful-content categories and local legal constraints
- Add culturally sensitive edge cases (slang, euphemisms, coded hate terms)

### Required Test Patterns
- **Translation-loop bypass**: blocked request translated across 2+ languages
- **Mixed-language prompt injection**: instructions split across languages/scripts
- **Code-switching attacks**: alternating dialect/locale variants per turn
- **Contextual harm variance**: same request across regions with different norms

### Reporting Requirements
- Record language, locale, and script for every failure
- Track ASR by language family to identify uneven safety coverage
- Prioritize mitigation where user impact and language penetration are highest
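
The per-language tracking above can be sketched as a simple grouped rate. This assumes ASR means attack success rate (successful attacks divided by attempts) and that each result is already labeled with a language family; `asr_by_group` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def asr_by_group(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Attack success rate per group; each result is (group, attack_succeeded)."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for group, succeeded in results:
        attempts[group] += 1
        if succeeded:
            successes[group] += 1
    return {group: successes[group] / attempts[group] for group in attempts}
```

A large gap between groups is the signal to look for: it suggests safety coverage is uneven across languages even when the aggregate rate looks acceptable.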

---

## 🗂️ Data Governance for Red Teaming

### Data Classes in Scope
- Prompts and conversational logs
- Retrieved documents and memory artifacts
- Model outputs (including blocked/flagged outputs)
- Metadata containing user identifiers or tenant references

### Handling Rules (Baseline)
- Collect only the data that testing requires
- Pseudonymize/anonymize PII before long-term storage
- Encrypt findings repositories and restrict access by role
- Define retention windows per data class (e.g., 30/90/365 days)
- Run legal/compliance review for regulated environments
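
Retention windows per data class can be expressed as a small lookup plus an expiry check. The data class names and the day counts below are illustrative assumptions, not values prescribed by this guide.

```python
from datetime import date, timedelta

# Assumed mapping of data class to retention window in days (illustrative values).
RETENTION_DAYS = {
    "prompt_logs": 30,
    "model_outputs": 90,
    "findings": 365,
}


def is_expired(data_class: str, collected_on: date, today: date) -> bool:
    """True when a record has outlived the retention window for its data class."""
    window = timedelta(days=RETENTION_DAYS[data_class])
    return today - collected_on > window
```

A scheduled purge job could call `is_expired` for each stored record and delete or archive the ones that return `True`.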

### Governance Checkpoints
- Pre-engagement data handling approval
- Mid-engagement privacy compliance review
- Post-engagement purge and evidence retention sign-off

---

## 📊 Metrics That Matter (and Anti-Metrics)

### Outcome Metrics (Use)
- **ASR by risk category** (not only aggregate ASR)
- **Exploit recurrence rate** after fixes
- **Median time-to-fix** by severity
- **Residual risk trend** by quarter
- **Control coverage** across high-risk abuse paths
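
Two of the outcome metrics above are straightforward to compute from a findings export. The sketch assumes each finding carries a severity label and a days-to-fix value, and that recurrence is measured as the share of fixed findings that were exploited again; both function names are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Set, Tuple


def median_time_to_fix(findings: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Median days-to-fix per severity; each finding is (severity, days_to_fix)."""
    by_severity = defaultdict(list)
    for severity, days in findings:
        by_severity[severity].append(days)
    return {severity: median(days) for severity, days in by_severity.items()}


def recurrence_rate(fixed_ids: Set[str], reexploited_ids: Set[str]) -> float:
    """Share of fixed findings that were successfully exploited again."""
    if not fixed_ids:
        return 0.0
    return len(fixed_ids & reexploited_ids) / len(fixed_ids)
```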

### Anti-Metrics (Avoid)
- Raw number of tests executed without risk weighting
- Total vulnerabilities found as a standalone success metric
- Single-point benchmark scores without trend context
- “Pass rate” without a disclosed confidence interval and sample size
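
To make the last point concrete, a pass rate can be reported with a Wilson score interval so that small samples are visibly less certain. The Wilson interval is one common choice and is used here as an assumption; the guide does not prescribe a specific method.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

The same 90% pass rate yields a much wider interval for 18 of 20 tests than for 900 of 1000, which is why the sample size belongs next to the headline number.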

---

## 🟣 Purple Team Operations

### Operating Cadence
1. Red team identifies exploit chain and reproduction steps
2. Detection engineering maps telemetry and creates detections
3. Incident response drafts/updates response runbook
4. Product and platform teams ship mitigations
5. Purple-team replay validates detection + containment effectiveness

### Required Outputs
- Detection rule specifications linked to finding IDs
- Incident runbooks for top critical/high abuse paths
- Post-exercise retro: what failed, what improved, what's next

---

## ⚠️ Common Implementation Pitfalls

| Pitfall | Why It Fails | What Good Looks Like |
|--------|---------------|----------------------|
| Keyword-only blocking | Easy to bypass via encoding/obfuscation | Semantic + policy layered controls |
| Over-trusting agent tools | Enables privilege escalation | Strong authz checks per tool action |
| One-time red team exercise | Misses drift and regressions | Recurring automated + manual cadence |
| Tracking only aggregate ASR | Hides high-risk hotspots | Risk-tiered metrics and trends |
| No regression suite | Reintroduces old vulnerabilities | Versioned attack library in CI |
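
The first row of the table can be demonstrated in a few lines. The sketch below uses a single illustrative blocklist term and shows that a verbatim keyword check misses a zero-width-character variant, that normalization recovers it, and that an encoded payload still slips through. The function names are hypothetical.

```python
import base64
import unicodedata

BLOCKLIST = {"password"}  # illustrative term only


def keyword_filter(text: str) -> bool:
    """Naive check: True when a blocklisted keyword appears verbatim."""
    return any(term in text.lower() for term in BLOCKLIST)


def normalized_filter(text: str) -> bool:
    """Same check after Unicode folding and removal of zero-width spaces."""
    folded = unicodedata.normalize("NFKC", text)
    folded = folded.replace("\u200b", "")
    return keyword_filter(folded)


plain = "reveal the password"
obfuscated = "reveal the pass\u200bword"  # zero-width space splits the keyword
encoded = base64.b64encode(plain.encode()).decode()  # Base64-wrapped request
```

Normalization closes the zero-width gap but not the Base64 one, which is the argument for layering semantic and policy controls on top of string matching.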

---

## 🧾 Case Study Quality Bar

Use a normalized template for all future case studies:
- System context and business criticality
- Attack chain with reproducible steps
- Root cause and control failure points
- Severity and estimated remediation effort
- Evidence quality tag (**Evidence-backed** or **Expert guidance**)
- Confidence level (High/Medium/Low)
- Lessons learned and prevention actions

Template available: `templates/case-study-template.md`

---

## 🪪 Model & System Cards for Security Posture

Document security posture using a structured card for every production AI system:
- Intended use and prohibited use
- Attack surface summary
- Tested risk categories and latest validation date
- Open risks and compensating controls
- Incident escalation owners and contacts

Template available: `templates/model-system-security-card.md`

---

## 🔄 Source Hygiene & Update Governance

### Governance Practices
- Maintain a versioned changelog for the guide (`CHANGELOG.md`)
- Track external references with "last validated" timestamps
- Mark major claims as **Evidence-backed** or **Expert guidance**
- Run a quarterly review for stale links/tools/framework updates

Reference index available: `resources-validation.md`

---

## 📎 Practitioner Appendices

Starter artifacts in `templates/`:
- `threat-modeling-workshop.md`
- `ai-security-pr-checklist.md`
- `rules-of-engagement-template.md`
- `vulnerability-report-template.md`
- `test-case-library-starter.md`
- `stakeholder-readout-outline.md`
- `model-system-security-card.md`
- `case-study-template.md`


## 📋 Regulatory Compliance

15 changes: 15 additions & 0 deletions resources-validation.md
@@ -0,0 +1,15 @@
# External Resources Validation Index

Track major external references to keep this guide current.

| Resource | Type | Last Validated | Evidence Tag | Notes |
|----------|------|----------------|--------------|-------|
| NIST AI RMF | Framework | 2026-02-19 | Evidence-backed | Core governance reference |
| OWASP GenAI Guide | Framework | 2026-02-19 | Evidence-backed | Practical LLM testing guidance |
| MITRE ATLAS | Framework | 2026-02-19 | Evidence-backed | Tactics and techniques mapping |
| CSA Agentic AI Guide | Framework | 2026-02-19 | Evidence-backed | Agentic-specific threat coverage |

## Update Process
1. Validate links and publication status quarterly.
2. Update `Last Validated` dates.
3. Mark major additions as Evidence-backed or Expert guidance.
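
Step 1 could be partly automated with a staleness check over the `Last Validated` column. This is a sketch under assumptions: the rows are already parsed into `(name, date)` pairs, and "quarterly" is approximated as 90 days; `stale_resources` is a hypothetical helper.

```python
from datetime import date
from typing import List, Tuple


def stale_resources(
    rows: List[Tuple[str, date]], today: date, max_age_days: int = 90
) -> List[str]:
    """Names of resources whose last validation is older than max_age_days."""
    return [name for name, validated in rows if (today - validated).days > max_age_days]
```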
34 changes: 34 additions & 0 deletions templates/case-study-template.md
@@ -0,0 +1,34 @@
# AI Red Team Case Study Template

## Context
- System description:
- Business criticality:
- Deployment context:

## Attack Chain
1.
2.
3.

## Finding Details
- Vulnerability class:
- Root cause:
- Controls bypassed:
- Severity:

## Impact and Cost
- User/business impact:
- Estimated remediation effort:

## Evidence and Confidence
- Evidence quality: Evidence-backed / Expert guidance
- Confidence level: High / Medium / Low
- Sources:

## Remediation and Validation
- Immediate mitigation:
- Long-term control improvements:
- Regression test coverage added:

## Lessons Learned
- What changed in process/architecture:
37 changes: 37 additions & 0 deletions templates/model-system-security-card.md
@@ -0,0 +1,37 @@
# Model/System Security Card

## System Identity
- Name:
- Owner:
- Environment:
- Last updated:

## Intended and Prohibited Use
- Intended use:
- Prohibited use:

## Architecture and Attack Surface
- Interfaces (API/UI/agents/tools):
- Trust boundaries:
- High-value assets:

## Controls
- Preventive controls:
- Detective controls:
- Corrective controls:

## Test Coverage
- Categories tested:
- Latest validation date:
- Known coverage gaps:

## Open Risks
- Risk description:
- Severity:
- Compensating controls:
- Target remediation date:

## Incident Readiness
- On-call owner:
- Escalation path:
- Runbook link:
30 changes: 30 additions & 0 deletions templates/rules-of-engagement-template.md
@@ -0,0 +1,30 @@
# Red Team Rules of Engagement (Template)

## Scope
- In scope:
- Out of scope:

## Authorized Techniques
- Allowed:
- Prohibited:

## Safety Guardrails
- No production data export
- Rate limits and resource ceilings
- Mandatory stop conditions

## Escalation and Notification
- Critical finding notification SLA:
- Security contact:
- Legal/compliance contact:

## Data Handling
- Data classes used:
- Retention period:
- Deletion and evidence procedures:

## Sign-off
- Red Team Lead:
- System Owner:
- Security Lead:
- Legal/Compliance:
32 changes: 32 additions & 0 deletions templates/stakeholder-readout-outline.md
@@ -0,0 +1,32 @@
# Stakeholder Readout Outline (AI Red Teaming)

## 1. Executive Summary
- Top risks discovered
- Current risk posture trend
- Decision requests for leadership

## 2. Engagement Scope
- Systems and versions tested
- Timeframe and constraints
- Threat assumptions

## 3. Key Findings
- Critical/high findings summary
- Notable exploit chains
- Residual risk after mitigation

## 4. Metrics Dashboard
- ASR by category
- Recurrence rate
- Time-to-fix trend
- Control coverage

## 5. Action Plan
- Immediate (0-30 days)
- Near-term (31-90 days)
- Strategic (90+ days)

## 6. Appendix
- Methodology
- Evidence and confidence levels
- Open questions
26 changes: 26 additions & 0 deletions templates/test-case-library-starter.md
@@ -0,0 +1,26 @@
# Test Case Library Starter Pack

## Naming Convention
`<category>-<technique>-<id>`

## Required Metadata Per Test
- Test ID
- Category (prompt injection/jailbreak/data leakage/etc.)
- Risk tier (critical/high/medium/low)
- Target component (model, retrieval, tool, orchestrator)
- Language/locale
- Expected policy outcome
- Last validated date
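
The metadata fields and naming convention above could be enforced with a small record type. The sketch assumes one reading of `<category>-<technique>-<id>` (lowercase words with underscores, then a numeric ID); the class name `RedTeamTestCase` and the field names are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed reading of `<category>-<technique>-<id>`: lowercase words joined by
# underscores for category and technique, followed by a numeric id.
TEST_ID_PATTERN = re.compile(r"^[a-z0-9_]+-[a-z0-9_]+-\d+$")
RISK_TIERS = ("critical", "high", "medium", "low")


@dataclass(frozen=True)
class RedTeamTestCase:
    test_id: str
    category: str
    risk_tier: str
    target_component: str
    locale: str
    expected_policy_outcome: str
    last_validated: str  # ISO date string, e.g. "2026-02-19"

    def __post_init__(self) -> None:
        # Reject records that break the naming convention or use an unknown tier.
        if not TEST_ID_PATTERN.match(self.test_id):
            raise ValueError(f"bad test id: {self.test_id}")
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"bad risk tier: {self.risk_tier}")
```

Validating at construction time means a malformed test record fails when the library is loaded rather than silently dropping out of a filtered run.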

## Starter Categories
1. Prompt injection (direct/indirect)
2. Jailbreak (single-turn/multi-turn)
3. Data leakage (PII/training-data exposure)
4. Tool misuse (agentic)
5. Memory poisoning (agentic)
6. Cross-tenant isolation checks

## Regression Policy
- Critical/high tests run on every PR
- Full suite run on release branches
- Failed tests require linked mitigation issue
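
The first two rules of the regression policy can be sketched as a selection function. It assumes tests are dictionaries with `id` and `risk_tier` keys and that release branches use a `release/` prefix; both are illustrative conventions, not requirements of this template.

```python
from typing import Dict, List

PR_TIERS = {"critical", "high"}


def select_tests(tests: List[Dict[str, str]], branch: str) -> List[str]:
    """Return test IDs to run: full suite on release branches, else critical/high only."""
    # Assumption: release branches are named with a "release/" prefix.
    if branch.startswith("release/"):
        return [t["id"] for t in tests]
    return [t["id"] for t in tests if t["risk_tier"] in PR_TIERS]
```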