diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..cbe8322 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,33 @@ +# Changelog + +All notable changes to this guide should be documented in this file. + +## [Unreleased] +### Added +- Operational implementation sections in README: + - Implementation Quickstart (30/60/90) + - Evaluation Harness (Reference Implementation) + - Agentic AI Attack Trees + Controls Mapping + - AI Harm Severity and Triage Model + - Secure SDLC Integration Artifacts + - Defensive Architecture Patterns + - Multilingual & Cultural Safety Playbook + - Data Governance for Red Teaming + - Metrics That Matter (and Anti-Metrics) + - Purple Team Operations + - Common Implementation Pitfalls + - Case Study Quality Bar + - Model & System Cards for Security Posture + - Source Hygiene & Update Governance + - Practitioner Appendices +- New templates in `templates/`: + - threat-modeling-workshop.md + - ai-security-pr-checklist.md + - rules-of-engagement-template.md + - vulnerability-report-template.md + - test-case-library-starter.md + - stakeholder-readout-outline.md + - model-system-security-card.md + - case-study-template.md +- `resources-validation.md` to track external source freshness. +- `.github/workflows/ai-redteam-regression.yml` baseline CI workflow. diff --git a/README.md b/README.md index bf5631f..5256edc 100644 --- a/README.md +++ b/README.md @@ -1887,6 +1887,181 @@ To reduce "one-off" testing, integrate red-team controls into delivery workflows - New jailbreak family with repeat success - Evidence of cross-tenant leakage or autonomous unsafe tool use +## πŸ›‘οΈ Defensive Architecture Patterns + +Translate red-team findings into architecture decisions using a layered control model: + +### Reference Pipeline +``` +User Input + -> Input normalization/sanitization + -> Policy-as-code pre-checks + -> Prompt orchestration with role boundaries + -> Retrieval/tool authorization gates + -> Model inference + -> Output policy and leakage filters + -> Human-in-the-loop (for high-risk actions) + -> Logging, telemetry, and audit trail +``` + +### Core Patterns +1. **Secure Prompt Orchestration** + - Separate system, developer, and user instructions + - Prevent untrusted content from altering control prompts + +2. **Tool Permissioning and Isolation** + - Grant least-privilege tokens per tool and per action + - Use approval workflows for sensitive actions (payments, credential resets) + +3. **Policy-as-Code Enforcement** + - Implement deterministic checks before tool execution + - Version policies and test them in CI alongside prompts + +4. **Output Guardrails** + - Add layered filters (policy, PII, compliance) + - Require citations for high-stakes domains where applicable + +--- + +## 🌍 Multilingual & Cultural Safety Playbook + +### Test Set Design +- Cover top business languages + low-resource languages in your user base +- Include region-specific harmful-content categories and local legal constraints +- Add culturally sensitive edge cases (slang, euphemisms, coded hate terms) + +### Required Test Patterns +- **Translation-loop bypass**: blocked request translated across 2+ languages +- **Mixed-language prompt injection**: instructions split across languages/scripts +- **Code-switching attacks**: alternating dialect/locale variants per turn +- **Contextual harm variance**: same request across regions with different norms + +### Reporting Requirements +- Record language, locale, and script for every failure +- Track ASR by language family to identify uneven safety coverage +- Prioritize mitigation where user impact and language penetration are highest + +--- + +## πŸ—‚οΈ Data Governance for Red Teaming + +### Data Classes in Scope +- Prompts and conversational logs +- Retrieved documents and memory artifacts +- Model outputs (including blocked/flagged outputs) +- Metadata containing user identifiers or tenant references + +### Handling Rules (Baseline) +- Minimize data collection to testing necessity +- Pseudonymize/anonymize PII before long-term storage +- Encrypt findings repositories and restrict access by role +- Define retention windows per data class (e.g., 30/90/365 days) +- Run legal/compliance review for regulated environments + +### Governance Checkpoints +- Pre-engagement data handling approval +- Mid-engagement privacy compliance review +- Post-engagement purge and evidence retention sign-off + +--- + +## πŸ“Š Metrics That Matter (and Anti-Metrics) + +### Outcome Metrics (Use) +- **ASR by risk category** (not only aggregate ASR) +- **Exploit recurrence rate** after fixes +- **Median time-to-fix** by severity +- **Residual risk trend** by quarter +- **Control coverage** across high-risk abuse paths + +### Anti-Metrics (Avoid) +- Raw number of tests executed without risk weighting +- Total vulnerabilities found as a standalone success metric +- Single-point benchmark scores without trend context +- β€œPass rate” without confidence interval/sample-size disclosure + +--- + +## 🟣 Purple Team Operations + +### Operating Cadence +1. Red team identifies exploit chain and reproduction steps +2. Detection engineering maps telemetry and creates detections +3. Incident response drafts/updates response runbook +4. Product and platform teams ship mitigations +5. Purple-team replay validates detection + containment effectiveness + +### Required Outputs +- Detection rule specifications linked to finding IDs +- Incident runbooks for top critical/high abuse paths +- Post-exercise retro: what failed, what improved, what's next + +--- + +## ⚠️ Common Implementation Pitfalls + +| Pitfall | Why It Fails | What Good Looks Like | +|--------|---------------|----------------------| +| Keyword-only blocking | Easy to bypass via encoding/obfuscation | Semantic + policy layered controls | +| Over-trusting agent tools | Enables privilege escalation | Strong authz checks per tool action | +| One-time red team exercise | Misses drift and regressions | Recurring automated + manual cadence | +| Tracking only aggregate ASR | Hides high-risk hotspots | Risk-tiered metrics and trends | +| No regression suite | Reintroduces old vulnerabilities | Versioned attack library in CI | + +--- + +## 🧾 Case Study Quality Bar + +Use a normalized template for all future case studies: +- System context and business criticality +- Attack chain with reproducible steps +- Root cause and control failure points +- Severity and estimated remediation effort +- Evidence quality tag (**Evidence-backed** or **Expert guidance**) +- Confidence level (High/Medium/Low) +- Lessons learned and prevention actions + +Template available: `templates/case-study-template.md` + +--- + +## πŸͺͺ Model & System Cards for Security Posture + +Document security posture using a structured card for every production AI system: +- Intended use and prohibited use +- Attack surface summary +- Tested risk categories and latest validation date +- Open risks and compensating controls +- Incident escalation owners and contacts + +Template available: `templates/model-system-security-card.md` + +--- + +## πŸ”„ Source Hygiene & Update Governance + +### Governance Practices +- Maintain a versioned changelog for the guide (`CHANGELOG.md`) +- Track external references with "last validated" timestamps +- Mark major claims as **Evidence-backed** or **Expert guidance** +- Run a quarterly review for stale links/tools/framework updates + +Reference index available: `resources-validation.md` + +--- + +## πŸ“Ž Practitioner Appendices + +Starter artifacts in `templates/`: +- `threat-modeling-workshop.md` +- `ai-security-pr-checklist.md` +- `rules-of-engagement-template.md` +- `vulnerability-report-template.md` +- `test-case-library-starter.md` +- `stakeholder-readout-outline.md` +- `model-system-security-card.md` +- `case-study-template.md` + ## πŸ“‹ Regulatory Compliance diff --git a/resources-validation.md b/resources-validation.md new file mode 100644 index 0000000..a2fd632 --- /dev/null +++ b/resources-validation.md @@ -0,0 +1,15 @@ +# External Resources Validation Index + +Track major external references to keep this guide current. + +| Resource | Type | Last Validated | Evidence Tag | Notes | +|----------|------|----------------|--------------|-------| +| NIST AI RMF | Framework | 2026-02-19 | Evidence-backed | Core governance reference | +| OWASP GenAI Guide | Framework | 2026-02-19 | Evidence-backed | Practical LLM testing guidance | +| MITRE ATLAS | Framework | 2026-02-19 | Evidence-backed | Tactics and techniques mapping | +| CSA Agentic AI Guide | Framework | 2026-02-19 | Evidence-backed | Agentic-specific threat coverage | + +## Update Process +1. Validate links and publication status quarterly. +2. Update `Last Validated` dates. +3. Mark major additions as Evidence-backed or Expert guidance. diff --git a/templates/case-study-template.md b/templates/case-study-template.md new file mode 100644 index 0000000..a315ed1 --- /dev/null +++ b/templates/case-study-template.md @@ -0,0 +1,34 @@ +# AI Red Team Case Study Template + +## Context +- System description: +- Business criticality: +- Deployment context: + +## Attack Chain +1. +2. +3. + +## Finding Details +- Vulnerability class: +- Root cause: +- Controls bypassed: +- Severity: + +## Impact and Cost +- User/business impact: +- Estimated remediation effort: + +## Evidence and Confidence +- Evidence quality: Evidence-backed / Expert guidance +- Confidence level: High / Medium / Low +- Sources: + +## Remediation and Validation +- Immediate mitigation: +- Long-term control improvements: +- Regression test coverage added: + +## Lessons Learned +- What changed in process/architecture: diff --git a/templates/model-system-security-card.md b/templates/model-system-security-card.md new file mode 100644 index 0000000..a4ce753 --- /dev/null +++ b/templates/model-system-security-card.md @@ -0,0 +1,37 @@ +# Model/System Security Card + +## System Identity +- Name: +- Owner: +- Environment: +- Last updated: + +## Intended and Prohibited Use +- Intended use: +- Prohibited use: + +## Architecture and Attack Surface +- Interfaces (API/UI/agents/tools): +- Trust boundaries: +- High-value assets: + +## Controls +- Preventive controls: +- Detective controls: +- Corrective controls: + +## Test Coverage +- Categories tested: +- Latest validation date: +- Known coverage gaps: + +## Open Risks +- Risk description: +- Severity: +- Compensating controls: +- Target remediation date: + +## Incident Readiness +- On-call owner: +- Escalation path: +- Runbook link: diff --git a/templates/rules-of-engagement-template.md b/templates/rules-of-engagement-template.md new file mode 100644 index 0000000..e87f079 --- /dev/null +++ b/templates/rules-of-engagement-template.md @@ -0,0 +1,30 @@ +# Red Team Rules of Engagement (Template) + +## Scope +- In scope: +- Out of scope: + +## Authorized Techniques +- Allowed: +- Prohibited: + +## Safety Guardrails +- No production data export +- Rate limits and resource ceilings +- Mandatory stop conditions + +## Escalation and Notification +- Critical finding notification SLA: +- Security contact: +- Legal/compliance contact: + +## Data Handling +- Data classes used: +- Retention period: +- Deletion and evidence procedures: + +## Sign-off +- Red Team Lead: +- System Owner: +- Security Lead: +- Legal/Compliance: diff --git a/templates/stakeholder-readout-outline.md b/templates/stakeholder-readout-outline.md new file mode 100644 index 0000000..d7a7a3e --- /dev/null +++ b/templates/stakeholder-readout-outline.md @@ -0,0 +1,32 @@ +# Stakeholder Readout Outline (AI Red Teaming) + +## 1. Executive Summary +- Top risks discovered +- Current risk posture trend +- Decision requests for leadership + +## 2. Engagement Scope +- Systems and versions tested +- Timeframe and constraints +- Threat assumptions + +## 3. Key Findings +- Critical/high findings summary +- Notable exploit chains +- Residual risk after mitigation + +## 4. Metrics Dashboard +- ASR by category +- Recurrence rate +- Time-to-fix trend +- Control coverage + +## 5. Action Plan +- Immediate (0-30 days) +- Near-term (31-90 days) +- Strategic (90+ days) + +## 6. Appendix +- Methodology +- Evidence and confidence levels +- Open questions diff --git a/templates/test-case-library-starter.md b/templates/test-case-library-starter.md new file mode 100644 index 0000000..19e55bc --- /dev/null +++ b/templates/test-case-library-starter.md @@ -0,0 +1,26 @@ +# Test Case Library Starter Pack + +## Naming Convention +`--` + +## Required Metadata Per Test +- Test ID +- Category (prompt injection/jailbreak/data leakage/etc.) +- Risk tier (critical/high/medium/low) +- Target component (model, retrieval, tool, orchestrator) +- Language/locale +- Expected policy outcome +- Last validated date + +## Starter Categories +1. Prompt injection (direct/indirect) +2. Jailbreak (single-turn/multi-turn) +3. Data leakage (PII/training-data exposure) +4. Tool misuse (agentic) +5. Memory poisoning (agentic) +6. Cross-tenant isolation checks + +## Regression Policy +- Critical/high tests run on every PR +- Full suite run on release branches +- Failed tests require linked mitigation issue diff --git a/templates/vulnerability-report-template.md b/templates/vulnerability-report-template.md new file mode 100644 index 0000000..058048e --- /dev/null +++ b/templates/vulnerability-report-template.md @@ -0,0 +1,38 @@ +# AI Vulnerability Report (Template) + +## Finding Metadata +- Finding ID: +- Date discovered: +- Reporter: +- Affected system/version: + +## Severity and Risk +- Severity (Critical/High/Medium/Low): +- Exploitability: +- User impact: +- Autonomy factor: +- Blast radius: +- Recoverability: + +## Reproduction +- Preconditions: +- Step-by-step reproduction: +- Proof of concept: + +## Impact +- Security/privacy/safety impact: +- Business impact: + +## Root Cause +- Control(s) bypassed: +- Why mitigation failed: + +## Remediation +- Immediate containment: +- Long-term fix: +- Owner: +- Target date: + +## Validation +- Regression test case ID: +- Validation status/date: