From b1476087cd33aa77a42748f50ca76bb227a239ab Mon Sep 17 00:00:00 2001 From: Tarique smith Date: Thu, 19 Feb 2026 16:58:45 -0500 Subject: [PATCH] Implement remaining guide recommendations with governance and templates --- .github/workflows/ai-redteam-regression.yml | 34 ++ CHANGELOG.md | 33 ++ GUIDE_ENHANCEMENT_RECOMMENDATIONS.md | 94 ++++++ README.md | 337 ++++++++++++++++++++ resources-validation.md | 15 + templates/ai-security-pr-checklist.md | 10 + templates/case-study-template.md | 34 ++ templates/model-system-security-card.md | 37 +++ templates/rules-of-engagement-template.md | 30 ++ templates/stakeholder-readout-outline.md | 32 ++ templates/test-case-library-starter.md | 26 ++ templates/threat-modeling-workshop.md | 34 ++ templates/vulnerability-report-template.md | 38 +++ 13 files changed, 754 insertions(+) create mode 100644 .github/workflows/ai-redteam-regression.yml create mode 100644 CHANGELOG.md create mode 100644 GUIDE_ENHANCEMENT_RECOMMENDATIONS.md create mode 100644 resources-validation.md create mode 100644 templates/ai-security-pr-checklist.md create mode 100644 templates/case-study-template.md create mode 100644 templates/model-system-security-card.md create mode 100644 templates/rules-of-engagement-template.md create mode 100644 templates/stakeholder-readout-outline.md create mode 100644 templates/test-case-library-starter.md create mode 100644 templates/threat-modeling-workshop.md create mode 100644 templates/vulnerability-report-template.md diff --git a/.github/workflows/ai-redteam-regression.yml b/.github/workflows/ai-redteam-regression.yml new file mode 100644 index 0000000..3dece2f --- /dev/null +++ b/.github/workflows/ai-redteam-regression.yml 
@@ -0,0 +1,34 @@ +name: AI Red Team Regression + +on: + pull_request: + push: + branches: [ main ] + +jobs: + redteam-regression: + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.11' + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install garak + + - name: Run baseline security scan (example) + run: | + mkdir -p reports + python -m garak --model_type openai --model_name gpt-4o-mini --report_prefix reports/garak || true + + - name: Upload reports + uses: actions/upload-artifact@v4 + with: + name: ai-redteam-reports + path: reports/ diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..cbe8322 --- /dev/null +++ b/CHANGELOG.md 
@@ -0,0 +1,33 @@ +# Changelog + +All notable changes to this guide should be documented in this file. + +## [Unreleased] +### Added +- Operational implementation sections in README: + - Implementation Quickstart (30/60/90) + - Evaluation Harness (Reference Implementation) + - Agentic AI Attack Trees + Controls Mapping + - AI Harm Severity and Triage Model + - Secure SDLC Integration Artifacts + - Defensive Architecture Patterns + - Multilingual & Cultural Safety Playbook + - Data Governance for Red Teaming + - Metrics That Matter (and Anti-Metrics) + - Purple Team Operations + - Common Implementation Pitfalls + - Case Study Quality Bar + - Model & System Cards for Security Posture + - Source Hygiene & Update Governance + - Practitioner Appendices +- New templates in `templates/`: + - threat-modeling-workshop.md + - ai-security-pr-checklist.md + - rules-of-engagement-template.md + - vulnerability-report-template.md + - test-case-library-starter.md + - stakeholder-readout-outline.md + - model-system-security-card.md + - case-study-template.md +- `resources-validation.md` to track external source freshness. +- `.github/workflows/ai-redteam-regression.yml` baseline CI workflow. diff --git a/GUIDE_ENHANCEMENT_RECOMMENDATIONS.md b/GUIDE_ENHANCEMENT_RECOMMENDATIONS.md new file mode 100644 index 0000000..3b74f98 --- /dev/null +++ b/GUIDE_ENHANCEMENT_RECOMMENDATIONS.md 
@@ -0,0 +1,94 @@ +# AI Red Teaming Guide: Recommended Enhancements + +This document summarizes high-impact additions that would make the guide more actionable, easier to operationalize, and more maintainable over time. + +## 1) Add an "Executive Quickstart" (30/60/90-day plan) +- Include role-specific quickstarts for startup, mid-size, and enterprise teams. +- Provide first-week tasks, first red-team exercise template, and basic metrics dashboard. +- Benefit: helps readers move from theory to execution faster. + +## 2) Add a full "Threat Modeling Workshop" template +- Add a facilitator script, pre-read checklist, and output artifacts (risk register, prioritized test plan). +- Include sample architecture diagrams and data-flow examples for LLM apps, RAG, and agents. +- Benefit: standardizes planning quality across teams. + +## 3) Expand "Agentic AI" section with attack trees and controls mapping +- Add concrete attack trees for tool-use abuse, memory poisoning, and inter-agent privilege escalation. +- Map each attack path to preventive, detective, and corrective controls. +- Benefit: closes the gap between conceptual threats and implementation. + +## 4) Add "Evaluation Harness" reference implementation +- Provide a minimal reproducible folder structure for prompt sets, expected policy outcomes, and scoring scripts. +- Include examples for ASR, false positive/negative rates, and regression tracking. +- Benefit: enables repeatable benchmarking and CI integration. + +## 5) Add severity + triage model tailored to AI harms +- Extend CVSS-like scoring with AI-specific dimensions (scale, autonomy, recoverability, user impact). +- Provide triage SLAs and remediation ownership guidance. +- Benefit: improves prioritization consistency and executive reporting. + +## 6) Add "Defensive Architecture Patterns" section +- Include secure prompt orchestration, policy-as-code checks, tool permissioning, sandboxing, and output guardrails. 
+- Add reference diagrams showing where to enforce controls in request/response pipelines. +- Benefit: gives builders prescriptive designs, not only attack descriptions. + +## 7) Add "Multilingual & Cultural Safety" testing playbook +- Provide test set design guidance for low-resource languages and region-specific harm categories. +- Include translation-loop and mixed-language bypass tests. +- Benefit: strengthens global deployment readiness. + +## 8) Add "Data Governance for Red Teaming" guidance +- Define safe handling for prompts, logs, PII, and model outputs during testing. +- Include retention rules, anonymization procedures, and legal review checkpoints. +- Benefit: reduces compliance risk while testing aggressively. + +## 9) Add "Metrics that matter" section with anti-metrics +- Keep ASR but add risk-reduction metrics: exploit recurrence, time-to-fix, residual risk trend, control coverage. +- Add anti-metrics to avoid (e.g., raw test-count vanity metrics). +- Benefit: shifts focus from activity to risk reduction. + +## 10) Add a "Purple Team Operations" chapter +- Include collaboration workflows between red team, detection engineering, and incident response. +- Provide playbooks for converting red-team findings into detections and runbooks. +- Benefit: better organizational learning and faster hardening. + +## 11) Add "Case study quality bar" and normalized template +- Standardize every case study with context, exploit chain, root cause, controls bypassed, cost to remediate, and lessons. +- Add a confidence level and evidence source for each claim. +- Benefit: improves credibility and cross-case comparability. + +## 12) Add "Common implementation pitfalls" section +- Examples: over-reliance on keyword blocking, missing tool authorization boundaries, lack of regression suites. +- Include β€œwhat good looks like” alternatives. +- Benefit: helps practitioners avoid known traps. 
+ +## 13) Add "Secure SDLC integration" artifacts +- Provide PR checklist, release gate criteria, and production monitoring runbook for AI-specific security. +- Include sample GitHub Actions pipelines for red-team regression tests. +- Benefit: embeds red teaming into delivery workflows. + +## 14) Add "Model and system cards" for security posture +- Provide templates for documenting attack surface, tested risks, residual risks, and operational guardrails. +- Benefit: improves transparency for internal governance and audits. + +## 15) Add source hygiene and update governance +- Introduce a versioned changelog and "last validated" date per external tool/resource. +- Mark claims as "evidence-backed" vs "expert guidance". +- Benefit: keeps a long guide trustworthy as the ecosystem changes quickly. + +## 16) Add practitioner appendices +- Red-team rules of engagement template (editable). +- Vulnerability report template. +- Test-case library starter pack. +- Stakeholder readout slide outline. +- Benefit: reduces startup friction and increases consistency. + +## Suggested prioritization (highest first) +1. Executive Quickstart +2. Evaluation Harness reference implementation +3. Agentic AI attack trees + controls mapping +4. Severity/triage model for AI harms +5. Secure SDLC integration artifacts +6. Data governance for red teaming +7. Multilingual & cultural safety playbook +8. Purple team operations diff --git a/README.md b/README.md index 9b514b5..1baef54 100644 --- a/README.md +++ b/README.md 
@@ -32,6 +32,21 @@ - [Real-World Case Studies](#real-world-case-studies) - [Building Your Red Team](#building-your-red-team) - [Best Practices](#best-practices) +- [Implementation Quickstart (30/60/90)](#implementation-quickstart-306090) +- [Evaluation Harness (Reference Implementation)](#evaluation-harness-reference-implementation) +- [Agentic AI Attack Trees + Controls Mapping](#agentic-ai-attack-trees--controls-mapping) +- [AI Harm Severity and Triage Model](#ai-harm-severity-and-triage-model) +- [Secure SDLC Integration Artifacts](#secure-sdlc-integration-artifacts) +- [Defensive Architecture Patterns](#defensive-architecture-patterns) +- [Multilingual & Cultural Safety Playbook](#multilingual--cultural-safety-playbook) +- [Data Governance for Red Teaming](#data-governance-for-red-teaming) +- [Metrics That Matter (and Anti-Metrics)](#metrics-that-matter-and-anti-metrics) +- [Purple Team Operations](#purple-team-operations) +- [Common Implementation Pitfalls](#common-implementation-pitfalls) +- [Case Study Quality Bar](#case-study-quality-bar) +- [Model & System Cards for Security Posture](#model--system-cards-for-security-posture) +- [Source Hygiene & Update Governance](#source-hygiene--update-governance) +- [Practitioner Appendices](#practitioner-appendices) - [Regulatory Compliance](#regulatory-compliance) - [Resources and References](#resources-and-references) 
@@ -1738,6 +1753,328 @@ Red team members should feel comfortable: --- + +## πŸš€ Implementation Quickstart (30/60/90) + +Use this phased plan to turn guidance into an operating program. + +### First 30 Days (Foundation) +- Define system scope, stakeholders, and crown-jewel assets +- Run a 2-hour threat modeling workshop (use `templates/threat-modeling-workshop.md`) +- Create an initial attack library with at least: + - 25 prompt injection tests + - 25 jailbreak tests + - 10 data leakage tests +- Establish baseline metrics: ASR, critical/high count, time-to-triage + +### Days 31-60 (Operationalization) +- Implement weekly automated red-team regression in CI +- Add manual deep-dive sessions for top 3 business-critical scenarios +- Define triage SLA by severity (Critical/High/Medium/Low) +- Stand up a shared red-team findings board with remediation owners + +### Days 61-90 (Scale) +- Add multilingual and multi-turn attack suites +- Add agentic AI abuse tests (tool misuse, memory poisoning, permissions) +- Launch monthly purple-team exercise with detection and IR teams +- Publish quarterly security posture report with residual risk trends + +--- + +## πŸ§ͺ Evaluation Harness (Reference Implementation) + +A lightweight structure for repeatable red-teaming and regression tracking: + +``` +security-evals/ +β”œβ”€β”€ prompts/ +β”‚ β”œβ”€β”€ prompt_injection.csv +β”‚ β”œβ”€β”€ jailbreaks.csv +β”‚ └── data_leakage.csv +β”œβ”€β”€ policies/ +β”‚ └── expected_outcomes.yaml +β”œβ”€β”€ scorers/ +β”‚ β”œβ”€β”€ policy_violation.py +β”‚ └── leakage_detector.py +β”œβ”€β”€ reports/ +β”‚ β”œβ”€β”€ latest.json +β”‚ └── trend.csv +└── run_eval.py +``` + +### Minimum Scoring Set +- **ASR** by attack category +- **False positives/negatives** for moderation and detection controls +- **Exploit recurrence rate** after mitigation +- **Time-to-fix** and **time-to-verify** + +### Release Gates (Suggested) +- Block release if: + - Any **Critical** issue is open + - ASR for high-risk category > 5% 
+ - Regression introduces > 20% ASR increase in any tracked class + +--- + +## πŸ•ΈοΈ Agentic AI Attack Trees + Controls Mapping + +Use attack trees to connect offensive testing paths to defensive controls. + +### Attack Tree A: Tool Misuse +1. Inject hidden instruction into user-supplied content +2. Agent adopts malicious instruction priority +3. Agent invokes high-privilege tool +4. Agent executes unsafe action + +**Controls:** +- Preventive: tool allowlists, scoped API tokens, policy checks pre-execution +- Detective: anomalous tool-call monitoring, high-risk action alerts +- Corrective: transaction rollback, credential rotation, incident playbook + +### Attack Tree B: Memory Poisoning +1. Adversary plants false memory artifact +2. Agent persists poisoned state +3. Subsequent sessions trust manipulated context +4. Agent behavior drifts into unsafe decisions + +**Controls:** +- Preventive: memory write policies, source trust labels, TTL for memory items +- Detective: memory integrity diffs, unusual memory mutation alerts +- Corrective: memory quarantine/reset, retrospective impact analysis + +### Attack Tree C: Inter-Agent Privilege Escalation +1. Compromise low-privilege agent with prompt injection +2. Lateral instruction passing to orchestrator +3. Orchestrator executes action outside original permission boundary +4. 
Expanded access leads to data exfiltration or sabotage + +**Controls:** +- Preventive: identity-bound inter-agent authz, least-privilege role boundaries +- Detective: cross-agent call graph anomaly detection +- Corrective: isolate compromised agent, revoke delegated capabilities + +--- + +## πŸ“ˆ AI Harm Severity and Triage Model + +Use CVSS as a base, then add AI-specific modifiers: + +| Dimension | Description | Scale | +|-----------|-------------|-------| +| **Exploitability** | How easy the issue is to reproduce | Low/Med/High | +| **User Impact** | Potential harm to users or protected groups | Low/Med/High/Critical | +| **Autonomy Factor** | Can agents execute actions without human confirmation? | None/Partial/Full | +| **Blast Radius** | Single user, tenant, or cross-tenant/system-wide | Narrow/Broad/Systemic | +| **Recoverability** | Time/effort to safely restore expected behavior | Easy/Moderate/Hard | + +### Triage SLA (Suggested) +- **Critical**: acknowledge immediately, mitigate within 24 hours +- **High**: acknowledge within 4 hours, mitigate within 7 days +- **Medium**: mitigate within 30 days +- **Low**: backlog with risk acceptance + review date + +--- + +## 🧩 Secure SDLC Integration Artifacts + +To reduce "one-off" testing, integrate red-team controls into delivery workflows. 
+ +### PR Security Checklist (AI Systems) +- [ ] Threat model updated for new capabilities/tools +- [ ] New prompts/flows added to evaluation harness +- [ ] High-risk tool actions require explicit authorization checks +- [ ] Logging and privacy controls validated +- [ ] Residual risks documented in system card + +### Release Readiness Criteria +- No open Critical findings +- All High findings have approved mitigation or documented exception +- Regression suite passes for required attack categories +- Monitoring/detection rules deployed for new features + +### Operational Runbook Triggers +- Sudden ASR spike (>2x baseline) +- New jailbreak family with repeat success +- Evidence of cross-tenant leakage or autonomous unsafe tool use + + + +## πŸ›‘οΈ Defensive Architecture Patterns + +Translate red-team findings into architecture decisions using a layered control model: + +### Reference Pipeline +``` +User Input + -> Input normalization/sanitization + -> Policy-as-code pre-checks + -> Prompt orchestration with role boundaries + -> Retrieval/tool authorization gates + -> Model inference + -> Output policy and leakage filters + -> Human-in-the-loop (for high-risk actions) + -> Logging, telemetry, and audit trail +``` + +### Core Patterns +1. **Secure Prompt Orchestration** + - Separate system, developer, and user instructions + - Prevent untrusted content from altering control prompts + +2. **Tool Permissioning and Isolation** + - Grant least-privilege tokens per tool and per action + - Use approval workflows for sensitive actions (payments, credential resets) + +3. **Policy-as-Code Enforcement** + - Implement deterministic checks before tool execution + - Version policies and test them in CI alongside prompts + +4. 
**Output Guardrails** + - Add layered filters (policy, PII, compliance) + - Require citations for high-stakes domains where applicable + +--- + +## 🌍 Multilingual & Cultural Safety Playbook + +### Test Set Design +- Cover top business languages + low-resource languages in your user base +- Include region-specific harmful-content categories and local legal constraints +- Add culturally sensitive edge cases (slang, euphemisms, coded hate terms) + +### Required Test Patterns +- **Translation-loop bypass**: blocked request translated across 2+ languages +- **Mixed-language prompt injection**: instructions split across languages/scripts +- **Code-switching attacks**: alternating dialect/locale variants per turn +- **Contextual harm variance**: same request across regions with different norms + +### Reporting Requirements +- Record language, locale, and script for every failure +- Track ASR by language family to identify uneven safety coverage +- Prioritize mitigation where user impact and language penetration are highest + +--- + +## πŸ—‚οΈ Data Governance for Red Teaming + +### Data Classes in Scope +- Prompts and conversational logs +- Retrieved documents and memory artifacts +- Model outputs (including blocked/flagged outputs) +- Metadata containing user identifiers or tenant references + +### Handling Rules (Baseline) +- Minimize data collection to testing necessity +- Pseudonymize/anonymize PII before long-term storage +- Encrypt findings repositories and restrict access by role +- Define retention windows per data class (e.g., 30/90/365 days) +- Run legal/compliance review for regulated environments + +### Governance Checkpoints +- Pre-engagement data handling approval +- Mid-engagement privacy compliance review +- Post-engagement purge and evidence retention sign-off + +--- + +## πŸ“Š Metrics That Matter (and Anti-Metrics) + +### Outcome Metrics (Use) +- **ASR by risk category** (not only aggregate ASR) +- **Exploit recurrence rate** after fixes +- **Median 
time-to-fix** by severity +- **Residual risk trend** by quarter +- **Control coverage** across high-risk abuse paths + +### Anti-Metrics (Avoid) +- Raw number of tests executed without risk weighting +- Total vulnerabilities found as a standalone success metric +- Single-point benchmark scores without trend context +- β€œPass rate” without confidence interval/sample-size disclosure + +--- + +## 🟣 Purple Team Operations + +### Operating Cadence +1. Red team identifies exploit chain and reproduction steps +2. Detection engineering maps telemetry and creates detections +3. Incident response drafts/updates response runbook +4. Product and platform teams ship mitigations +5. Purple-team replay validates detection + containment effectiveness + +### Required Outputs +- Detection rule specifications linked to finding IDs +- Incident runbooks for top critical/high abuse paths +- Post-exercise retro: what failed, what improved, what's next + +--- + +## ⚠️ Common Implementation Pitfalls + +| Pitfall | Why It Fails | What Good Looks Like | +|--------|---------------|----------------------| +| Keyword-only blocking | Easy to bypass via encoding/obfuscation | Semantic + policy layered controls | +| Over-trusting agent tools | Enables privilege escalation | Strong authz checks per tool action | +| One-time red team exercise | Misses drift and regressions | Recurring automated + manual cadence | +| Tracking only aggregate ASR | Hides high-risk hotspots | Risk-tiered metrics and trends | +| No regression suite | Reintroduces old vulnerabilities | Versioned attack library in CI | + +--- + +## 🧾 Case Study Quality Bar + +Use a normalized template for all future case studies: +- System context and business criticality +- Attack chain with reproducible steps +- Root cause and control failure points +- Severity and estimated remediation effort +- Evidence quality tag (**Evidence-backed** or **Expert guidance**) +- Confidence level (High/Medium/Low) +- Lessons learned and prevention 
actions + +Template available: `templates/case-study-template.md` + +--- + +## πŸͺͺ Model & System Cards for Security Posture + +Document security posture using a structured card for every production AI system: +- Intended use and prohibited use +- Attack surface summary +- Tested risk categories and latest validation date +- Open risks and compensating controls +- Incident escalation owners and contacts + +Template available: `templates/model-system-security-card.md` + +--- + +## πŸ”„ Source Hygiene & Update Governance + +### Governance Practices +- Maintain a versioned changelog for the guide (`CHANGELOG.md`) +- Track external references with "last validated" timestamps +- Mark major claims as **Evidence-backed** or **Expert guidance** +- Run a quarterly review for stale links/tools/framework updates + +Reference index available: `resources-validation.md` + +--- + +## πŸ“Ž Practitioner Appendices + +Starter artifacts in `templates/`: +- `threat-modeling-workshop.md` +- `ai-security-pr-checklist.md` +- `rules-of-engagement-template.md` +- `vulnerability-report-template.md` +- `test-case-library-starter.md` +- `stakeholder-readout-outline.md` +- `model-system-security-card.md` +- `case-study-template.md` + + ## πŸ“‹ Regulatory Compliance ### United States diff --git a/resources-validation.md b/resources-validation.md new file mode 100644 index 0000000..a2fd632 --- /dev/null +++ b/resources-validation.md 
@@ -0,0 +1,15 @@ +# External Resources Validation Index + +Track major external references to keep this guide current. + +| Resource | Type | Last Validated | Evidence Tag | Notes | +|----------|------|----------------|--------------|-------| +| NIST AI RMF | Framework | 2026-02-19 | Evidence-backed | Core governance reference | +| OWASP GenAI Guide | Framework | 2026-02-19 | Evidence-backed | Practical LLM testing guidance | +| MITRE ATLAS | Framework | 2026-02-19 | Evidence-backed | Tactics and techniques mapping | +| CSA Agentic AI Guide | Framework | 2026-02-19 | Evidence-backed | Agentic-specific threat coverage | + +## Update Process +1. Validate links and publication status quarterly. +2. Update `Last Validated` dates. +3. Mark major additions as Evidence-backed or Expert guidance. diff --git a/templates/ai-security-pr-checklist.md b/templates/ai-security-pr-checklist.md new file mode 100644 index 0000000..bdf68ea --- /dev/null +++ b/templates/ai-security-pr-checklist.md @@ -0,0 +1,10 @@ +# AI Security PR Checklist + +- [ ] Threat model updated if behavior/capability changed +- [ ] New or modified prompts added to security regression suite +- [ ] Tool authorization boundary validated (least privilege) +- [ ] Prompt injection and jailbreak tests executed for changed flows +- [ ] Data handling reviewed for PII/log retention requirements +- [ ] Output filtering and policy checks validated +- [ ] Monitoring/detection rules updated for new failure modes +- [ ] Residual risks documented in model/system card diff --git a/templates/case-study-template.md b/templates/case-study-template.md new file mode 100644 index 0000000..a315ed1 --- /dev/null +++ b/templates/case-study-template.md 
@@ -0,0 +1,34 @@ +# AI Red Team Case Study Template + +## Context +- System description: +- Business criticality: +- Deployment context: + +## Attack Chain +1. +2. +3. + +## Finding Details +- Vulnerability class: +- Root cause: +- Controls bypassed: +- Severity: + +## Impact and Cost +- User/business impact: +- Estimated remediation effort: + +## Evidence and Confidence +- Evidence quality: Evidence-backed / Expert guidance +- Confidence level: High / Medium / Low +- Sources: + +## Remediation and Validation +- Immediate mitigation: +- Long-term control improvements: +- Regression test coverage added: + +## Lessons Learned +- What changed in process/architecture: diff --git a/templates/model-system-security-card.md b/templates/model-system-security-card.md new file mode 100644 index 0000000..a4ce753 --- /dev/null +++ b/templates/model-system-security-card.md @@ -0,0 +1,37 @@ +# Model/System Security Card + +## System Identity +- Name: +- Owner: +- Environment: +- Last updated: + +## Intended and Prohibited Use +- Intended use: +- Prohibited use: + +## Architecture and Attack Surface +- Interfaces (API/UI/agents/tools): +- Trust boundaries: +- High-value assets: + +## Controls +- Preventive controls: +- Detective controls: +- Corrective controls: + +## Test Coverage +- Categories tested: +- Latest validation date: +- Known coverage gaps: + +## Open Risks +- Risk description: +- Severity: +- Compensating controls: +- Target remediation date: + +## Incident Readiness +- On-call owner: +- Escalation path: +- Runbook link: diff --git a/templates/rules-of-engagement-template.md b/templates/rules-of-engagement-template.md new file mode 100644 index 0000000..e87f079 --- /dev/null +++ b/templates/rules-of-engagement-template.md 
@@ -0,0 +1,30 @@ +# Red Team Rules of Engagement (Template) + +## Scope +- In scope: +- Out of scope: + +## Authorized Techniques +- Allowed: +- Prohibited: + +## Safety Guardrails +- No production data export +- Rate limits and resource ceilings +- Mandatory stop conditions + +## Escalation and Notification +- Critical finding notification SLA: +- Security contact: +- Legal/compliance contact: + +## Data Handling +- Data classes used: +- Retention period: +- Deletion and evidence procedures: + +## Sign-off +- Red Team Lead: +- System Owner: +- Security Lead: +- Legal/Compliance: diff --git a/templates/stakeholder-readout-outline.md b/templates/stakeholder-readout-outline.md new file mode 100644 index 0000000..d7a7a3e --- /dev/null +++ b/templates/stakeholder-readout-outline.md @@ -0,0 +1,32 @@ +# Stakeholder Readout Outline (AI Red Teaming) + +## 1. Executive Summary +- Top risks discovered +- Current risk posture trend +- Decision requests for leadership + +## 2. Engagement Scope +- Systems and versions tested +- Timeframe and constraints +- Threat assumptions + +## 3. Key Findings +- Critical/high findings summary +- Notable exploit chains +- Residual risk after mitigation + +## 4. Metrics Dashboard +- ASR by category +- Recurrence rate +- Time-to-fix trend +- Control coverage + +## 5. Action Plan +- Immediate (0-30 days) +- Near-term (31-90 days) +- Strategic (90+ days) + +## 6. Appendix +- Methodology +- Evidence and confidence levels +- Open questions diff --git a/templates/test-case-library-starter.md b/templates/test-case-library-starter.md new file mode 100644 index 0000000..19e55bc --- /dev/null +++ b/templates/test-case-library-starter.md 
@@ -0,0 +1,26 @@ +# Test Case Library Starter Pack + +## Naming Convention +`--` + +## Required Metadata Per Test +- Test ID +- Category (prompt injection/jailbreak/data leakage/etc.) +- Risk tier (critical/high/medium/low) +- Target component (model, retrieval, tool, orchestrator) +- Language/locale +- Expected policy outcome +- Last validated date + +## Starter Categories +1. Prompt injection (direct/indirect) +2. Jailbreak (single-turn/multi-turn) +3. Data leakage (PII/training-data exposure) +4. Tool misuse (agentic) +5. Memory poisoning (agentic) +6. Cross-tenant isolation checks + +## Regression Policy +- Critical/high tests run on every PR +- Full suite run on release branches +- Failed tests require linked mitigation issue diff --git a/templates/threat-modeling-workshop.md b/templates/threat-modeling-workshop.md new file mode 100644 index 0000000..7e80e90 --- /dev/null +++ b/templates/threat-modeling-workshop.md 
@@ -0,0 +1,34 @@ +# Threat Modeling Workshop Template (AI Systems) + +## Workshop Goals +- Identify highest-risk abuse paths for the AI system. +- Prioritize red-team test scenarios by business impact and exploitability. +- Assign owners and due dates for controls and mitigations. + +## Participants +- Product owner +- AI/ML engineer +- Security engineer / red team +- Detection / SOC representative +- Legal / compliance (for high-risk domains) + +## Pre-Read Checklist +- Architecture diagram and trust boundaries +- Data flow (inputs, retrieval, tools, outputs) +- List of model capabilities and enabled tools/actions +- Existing guardrails (input/output/content/policy) +- Known incidents or prior findings + +## 120-Minute Agenda +1. Scope and assumptions (15 min) +2. System walkthrough (20 min) +3. Threat brainstorming (30 min) +4. Risk scoring and prioritization (30 min) +5. Test plan and owners (20 min) +6. Wrap-up and next steps (5 min) + +## Output Artifacts +- Prioritized risk register +- Red-team test plan for next sprint +- Detection/monitoring gaps backlog +- Signed-off risk acceptance for deferred items diff --git a/templates/vulnerability-report-template.md b/templates/vulnerability-report-template.md new file mode 100644 index 0000000..058048e --- /dev/null +++ b/templates/vulnerability-report-template.md 
@@ -0,0 +1,38 @@ +# AI Vulnerability Report (Template) + +## Finding Metadata +- Finding ID: +- Date discovered: +- Reporter: +- Affected system/version: + +## Severity and Risk +- Severity (Critical/High/Medium/Low): +- Exploitability: +- User impact: +- Autonomy factor: +- Blast radius: +- Recoverability: + +## Reproduction +- Preconditions: +- Step-by-step reproduction: +- Proof of concept: + +## Impact +- Security/privacy/safety impact: +- Business impact: + +## Root Cause +- Control(s) bypassed: +- Why mitigation failed: + +## Remediation +- Immediate containment: +- Long-term fix: +- Owner: +- Target date: + +## Validation +- Regression test case ID: +- Validation status/date:
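Review bot: in `.github/workflows/ai-redteam-regression.yml` the scan step ends with `|| true`, so the job can never fail and the workflow is not a regression gate: a run that finds new jailbreaks, or that crashes outright, still goes green. The step also never hands `OPENAI_API_KEY` to garak (there is no `env:` block), so with `--model_type openai` it will error on every run and the `|| true` will hide it; on `pull_request` from forks the secret would be unavailable anyway. Suggest dropping `|| true`, adding `env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}`, adding a top-level `permissions: contents: read`, pinning `actions/checkout`, `actions/setup-python` and `actions/upload-artifact` to commit SHAs instead of `@v4`/`@v5` tags, and adding a step that parses `reports/` and fails the job when the README's release gates (any Critical open, high-risk ASR > 5%) are breached. The Quickstart promises "weekly automated red-team regression in CI", but `on:` has only `pull_request` and `push`; add a `schedule:` trigger. Garak reports also contain raw model outputs for attack prompts, so set `retention-days` on the artifact and treat it as sensitive data under the Data Governance rules this same patch adds.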
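Review bot: `GUIDE_ENHANCEMENT_RECOMMENDATIONS.md` is committed as an open wish list while the commit subject says the remaining recommendations are implemented, so the next contributor will read it as unfinished work. Either annotate each of the 16 items with its status and the README section or template that satisfies it, or fold the list into `CHANGELOG.md` and drop the file. Item 4 promises "scoring scripts" and item 13 "sample GitHub Actions pipelines for red-team regression tests"; this patch ships a folder layout and a workflow skeleton with no scorer, so those two should be marked partial rather than done.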
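Review bot: the Release Gates in the README's Evaluation Harness section are not precise enough to implement as a CI check: "ASR for high-risk category > 5%" and "Regression introduces > 20% ASR increase in any tracked class" do not say whether 20% is relative or percentage points, nor the minimum number of trials per category, and the Metrics section a few screens later lists "Pass rate without confidence interval/sample-size disclosure" as an anti-metric. State a minimum sample size and whether the delta is absolute or relative, otherwise the gate will flap on small suites. Two structural nits in the same hunk: the Secure SDLC section ends at "autonomous unsafe tool use" without the `---` separator every other new section uses before `## Defensive Architecture Patterns`, and the `---` that previously preceded `## Regulatory Compliance` now sits before the Quickstart, so that existing section lost its separator. Also the PR Security Checklist is duplicated inline with five items while `templates/ai-security-pr-checklist.md` has eight; keep one source of truth and link the README to the template so the two cannot drift.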
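Review bot: in `resources-validation.md` every row has `Last Validated` set to 2026-02-19, the commit date, and the table has no URL or version column, so the quarterly "validate links and publication status" step has nothing concrete to check against. Add a link and the published version or date for each reference (which revision of NIST AI RMF, which release of the OWASP guide), and reserve the Evidence-backed/Expert guidance tag for claims made in the guide, which is how the README's Source Hygiene section defines it; tagging a framework itself as "Evidence-backed" says nothing.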
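Review bot: the Naming Convention line in `templates/test-case-library-starter.md` renders as just `--`; the placeholder names have been lost, most likely because they were written in angle brackets and swallowed as HTML tags when the Markdown was rendered. Put the pattern in a fenced block or escape the brackets so the three components are visible. The Regression Policy also says "Critical/high tests run on every PR", but the only CI in this patch runs a generic garak scan rather than this library; point the policy at the harness (`security-evals/run_eval.py` per the README layout) that will actually execute these test IDs.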
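Review bot: `templates/rules-of-engagement-template.md` has no testing window (start/end dates, allowed hours) and no field for third-party services in scope; both matter here because the workflow in this same patch sends attack prompts to a hosted model provider, and the RoE should record that the provider's usage policy was reviewed and who approved that. In the Safety Guardrails section, "Rate limits and resource ceilings" and "Mandatory stop conditions" are bare statements with no blanks to fill in, unlike every other section; give them trailing colons so the values get written down.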
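Review bot: the Reproduction section of `templates/vulnerability-report-template.md` should capture what actually makes an LLM finding reproducible: exact model identifier and snapshot, system prompt version, sampling parameters (temperature, top_p, seed where supported) and the observed success rate as k of N attempts. Under nondeterministic decoding a single successful transcript under "Proof of concept" is not a reproducible PoC, and "Affected system/version" in the metadata does not pin the model snapshot. The "Severity and Risk" block mirrors the README's five AI modifiers, which is good; add a line for the base CVSS vector too, since the README says CVSS is the base the modifiers extend.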