From d1c7e4cda5a91b544c51ef3dfdcb960bb5ba39ea Mon Sep 17 00:00:00 2001 From: Tarique smith Date: Thu, 19 Feb 2026 16:19:42 -0500 Subject: [PATCH] Implement initial operational red-teaming recommendations --- .github/workflows/ai-redteam-regression.yml | 34 +++++ GUIDE_ENHANCEMENT_RECOMMENDATIONS.md | 94 ++++++++++++ README.md | 150 ++++++++++++++++++++ templates/ai-security-pr-checklist.md | 10 ++ templates/threat-modeling-workshop.md | 34 +++++ 5 files changed, 322 insertions(+) create mode 100644 .github/workflows/ai-redteam-regression.yml create mode 100644 GUIDE_ENHANCEMENT_RECOMMENDATIONS.md create mode 100644 templates/ai-security-pr-checklist.md create mode 100644 templates/threat-modeling-workshop.md diff --git a/.github/workflows/ai-redteam-regression.yml b/.github/workflows/ai-redteam-regression.yml new file mode 100644 index 0000000..3dece2f --- /dev/null +++ b/.github/workflows/ai-redteam-regression.yml @@ -0,0 +1,34 @@ +name: AI Red Team Regression + +on: + pull_request: + push: + branches: [ main ] + +jobs: + redteam-regression: + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.11' + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install garak + + - name: Run baseline security scan (example) + run: | + mkdir -p reports + python -m garak --model_type openai --model_name gpt-4o-mini --report_prefix reports/garak || true + + - name: Upload reports + uses: actions/upload-artifact@v4 + with: + name: ai-redteam-reports + path: reports/ diff --git a/GUIDE_ENHANCEMENT_RECOMMENDATIONS.md b/GUIDE_ENHANCEMENT_RECOMMENDATIONS.md new file mode 100644 index 0000000..3b74f98 --- /dev/null +++ b/GUIDE_ENHANCEMENT_RECOMMENDATIONS.md @@ -0,0 +1,94 @@ +# AI Red Teaming Guide: Recommended Enhancements + +This document summarizes high-impact additions that would make the guide more actionable, easier to operationalize, and more maintainable over time. + +## 1) Add an "Executive Quickstart" (30/60/90-day plan) +- Include role-specific quickstarts for startup, mid-size, and enterprise teams. +- Provide first-week tasks, first red-team exercise template, and basic metrics dashboard. +- Benefit: helps readers move from theory to execution faster. + +## 2) Add a full "Threat Modeling Workshop" template +- Add a facilitator script, pre-read checklist, and output artifacts (risk register, prioritized test plan). +- Include sample architecture diagrams and data-flow examples for LLM apps, RAG, and agents. +- Benefit: standardizes planning quality across teams. + +## 3) Expand "Agentic AI" section with attack trees and controls mapping +- Add concrete attack trees for tool-use abuse, memory poisoning, and inter-agent privilege escalation. +- Map each attack path to preventive, detective, and corrective controls. +- Benefit: closes the gap between conceptual threats and implementation. + +## 4) Add "Evaluation Harness" reference implementation +- Provide a minimal reproducible folder structure for prompt sets, expected policy outcomes, and scoring scripts. +- Include examples for ASR, false positive/negative rates, and regression tracking. +- Benefit: enables repeatable benchmarking and CI integration. + +## 5) Add severity + triage model tailored to AI harms +- Extend CVSS-like scoring with AI-specific dimensions (scale, autonomy, recoverability, user impact). +- Provide triage SLAs and remediation ownership guidance. +- Benefit: improves prioritization consistency and executive reporting. + +## 6) Add "Defensive Architecture Patterns" section +- Include secure prompt orchestration, policy-as-code checks, tool permissioning, sandboxing, and output guardrails. +- Add reference diagrams showing where to enforce controls in request/response pipelines. +- Benefit: gives builders prescriptive designs, not only attack descriptions. + +## 7) Add "Multilingual & Cultural Safety" testing playbook +- Provide test set design guidance for low-resource languages and region-specific harm categories. +- Include translation-loop and mixed-language bypass tests. +- Benefit: strengthens global deployment readiness. + +## 8) Add "Data Governance for Red Teaming" guidance +- Define safe handling for prompts, logs, PII, and model outputs during testing. +- Include retention rules, anonymization procedures, and legal review checkpoints. +- Benefit: reduces compliance risk while testing aggressively. + +## 9) Add "Metrics that matter" section with anti-metrics +- Keep ASR but add risk-reduction metrics: exploit recurrence, time-to-fix, residual risk trend, control coverage. +- Add anti-metrics to avoid (e.g., raw test-count vanity metrics). +- Benefit: shifts focus from activity to risk reduction. + +## 10) Add a "Purple Team Operations" chapter +- Include collaboration workflows between red team, detection engineering, and incident response. +- Provide playbooks for converting red-team findings into detections and runbooks. +- Benefit: better organizational learning and faster hardening. + +## 11) Add "Case study quality bar" and normalized template +- Standardize every case study with context, exploit chain, root cause, controls bypassed, cost to remediate, and lessons. +- Add a confidence level and evidence source for each claim. +- Benefit: improves credibility and cross-case comparability. + +## 12) Add "Common implementation pitfalls" section +- Examples: over-reliance on keyword blocking, missing tool authorization boundaries, lack of regression suites. +- Include β€œwhat good looks like” alternatives. +- Benefit: helps practitioners avoid known traps. + +## 13) Add "Secure SDLC integration" artifacts +- Provide PR checklist, release gate criteria, and production monitoring runbook for AI-specific security. +- Include sample GitHub Actions pipelines for red-team regression tests. +- Benefit: embeds red teaming into delivery workflows. + +## 14) Add "Model and system cards" for security posture +- Provide templates for documenting attack surface, tested risks, residual risks, and operational guardrails. +- Benefit: improves transparency for internal governance and audits. + +## 15) Add source hygiene and update governance +- Introduce a versioned changelog and "last validated" date per external tool/resource. +- Mark claims as "evidence-backed" vs "expert guidance". +- Benefit: keeps a long guide trustworthy as the ecosystem changes quickly. + +## 16) Add practitioner appendices +- Red-team rules of engagement template (editable). +- Vulnerability report template. +- Test-case library starter pack. +- Stakeholder readout slide outline. +- Benefit: reduces startup friction and increases consistency. + +## Suggested prioritization (highest first) +1. Executive Quickstart +2. Evaluation Harness reference implementation +3. Agentic AI attack trees + controls mapping +4. Severity/triage model for AI harms +5. Secure SDLC integration artifacts +6. Data governance for red teaming +7. Multilingual & cultural safety playbook +8. Purple team operations diff --git a/README.md b/README.md index 9b514b5..bf5631f 100644 --- a/README.md +++ b/README.md @@ -32,6 +32,11 @@ - [Real-World Case Studies](#real-world-case-studies) - [Building Your Red Team](#building-your-red-team) - [Best Practices](#best-practices) +- [Implementation Quickstart (30/60/90)](#implementation-quickstart-306090) +- [Evaluation Harness (Reference Implementation)](#evaluation-harness-reference-implementation) +- [Agentic AI Attack Trees + Controls Mapping](#agentic-ai-attack-trees--controls-mapping) +- [AI Harm Severity and Triage Model](#ai-harm-severity-and-triage-model) +- [Secure SDLC Integration Artifacts](#secure-sdlc-integration-artifacts) - [Regulatory Compliance](#regulatory-compliance) - [Resources and References](#resources-and-references) @@ -1738,6 +1743,151 @@ Red team members should feel comfortable: --- + +## πŸš€ Implementation Quickstart (30/60/90) + +Use this phased plan to turn guidance into an operating program. + +### First 30 Days (Foundation) +- Define system scope, stakeholders, and crown-jewel assets +- Run a 2-hour threat modeling workshop (use `templates/threat-modeling-workshop.md`) +- Create an initial attack library with at least: + - 25 prompt injection tests + - 25 jailbreak tests + - 10 data leakage tests +- Establish baseline metrics: ASR, critical/high count, time-to-triage + +### Days 31-60 (Operationalization) +- Implement weekly automated red-team regression in CI +- Add manual deep-dive sessions for top 3 business-critical scenarios +- Define triage SLA by severity (Critical/High/Medium/Low) +- Stand up a shared red-team findings board with remediation owners + +### Days 61-90 (Scale) +- Add multilingual and multi-turn attack suites +- Add agentic AI abuse tests (tool misuse, memory poisoning, permissions) +- Launch monthly purple-team exercise with detection and IR teams +- Publish quarterly security posture report with residual risk trends + +--- + +## πŸ§ͺ Evaluation Harness (Reference Implementation) + +A lightweight structure for repeatable red-teaming and regression tracking: + +``` +security-evals/ +β”œβ”€β”€ prompts/ +β”‚ β”œβ”€β”€ prompt_injection.csv +β”‚ β”œβ”€β”€ jailbreaks.csv +β”‚ └── data_leakage.csv +β”œβ”€β”€ policies/ +β”‚ └── expected_outcomes.yaml +β”œβ”€β”€ scorers/ +β”‚ β”œβ”€β”€ policy_violation.py +β”‚ └── leakage_detector.py +β”œβ”€β”€ reports/ +β”‚ β”œβ”€β”€ latest.json +β”‚ └── trend.csv +└── run_eval.py +``` + +### Minimum Scoring Set +- **ASR** by attack category +- **False positives/negatives** for moderation and detection controls +- **Exploit recurrence rate** after mitigation +- **Time-to-fix** and **time-to-verify** + +### Release Gates (Suggested) +- Block release if: + - Any **Critical** issue is open + - ASR for high-risk category > 5% + - Regression introduces > 20% ASR increase in any tracked class + +--- + +## πŸ•ΈοΈ Agentic AI Attack Trees + Controls Mapping + +Use attack trees to connect offensive testing paths to defensive controls. + +### Attack Tree A: Tool Misuse +1. Inject hidden instruction into user-supplied content +2. Agent adopts malicious instruction priority +3. Agent invokes high-privilege tool +4. Agent executes unsafe action + +**Controls:** +- Preventive: tool allowlists, scoped API tokens, policy checks pre-execution +- Detective: anomalous tool-call monitoring, high-risk action alerts +- Corrective: transaction rollback, credential rotation, incident playbook + +### Attack Tree B: Memory Poisoning +1. Adversary plants false memory artifact +2. Agent persists poisoned state +3. Subsequent sessions trust manipulated context +4. Agent behavior drifts into unsafe decisions + +**Controls:** +- Preventive: memory write policies, source trust labels, TTL for memory items +- Detective: memory integrity diffs, unusual memory mutation alerts +- Corrective: memory quarantine/reset, retrospective impact analysis + +### Attack Tree C: Inter-Agent Privilege Escalation +1. Compromise low-privilege agent with prompt injection +2. Lateral instruction passing to orchestrator +3. Orchestrator executes action outside original permission boundary +4. Expanded access leads to data exfiltration or sabotage + +**Controls:** +- Preventive: identity-bound inter-agent authz, least-privilege role boundaries +- Detective: cross-agent call graph anomaly detection +- Corrective: isolate compromised agent, revoke delegated capabilities + +--- + +## πŸ“ˆ AI Harm Severity and Triage Model + +Use CVSS as a base, then add AI-specific modifiers: + +| Dimension | Description | Scale | +|-----------|-------------|-------| +| **Exploitability** | How easy the issue is to reproduce | Low/Med/High | +| **User Impact** | Potential harm to users or protected groups | Low/Med/High/Critical | +| **Autonomy Factor** | Can agents execute actions without human confirmation? | None/Partial/Full | +| **Blast Radius** | Single user, tenant, or cross-tenant/system-wide | Narrow/Broad/Systemic | +| **Recoverability** | Time/effort to safely restore expected behavior | Easy/Moderate/Hard | + +### Triage SLA (Suggested) +- **Critical**: acknowledge immediately, mitigate within 24 hours +- **High**: acknowledge within 4 hours, mitigate within 7 days +- **Medium**: mitigate within 30 days +- **Low**: backlog with risk acceptance + review date + +--- + +## 🧩 Secure SDLC Integration Artifacts + +To reduce "one-off" testing, integrate red-team controls into delivery workflows. + +### PR Security Checklist (AI Systems) +- [ ] Threat model updated for new capabilities/tools +- [ ] New prompts/flows added to evaluation harness +- [ ] High-risk tool actions require explicit authorization checks +- [ ] Logging and privacy controls validated +- [ ] Residual risks documented in system card + +### Release Readiness Criteria +- No open Critical findings +- All High findings have approved mitigation or documented exception +- Regression suite passes for required attack categories +- Monitoring/detection rules deployed for new features + +### Operational Runbook Triggers +- Sudden ASR spike (>2x baseline) +- New jailbreak family with repeat success +- Evidence of cross-tenant leakage or autonomous unsafe tool use + + ## πŸ“‹ Regulatory Compliance ### United States diff --git a/templates/ai-security-pr-checklist.md b/templates/ai-security-pr-checklist.md new file mode 100644 index 0000000..bdf68ea --- /dev/null +++ b/templates/ai-security-pr-checklist.md @@ -0,0 +1,10 @@ +# AI Security PR Checklist + +- [ ] Threat model updated if behavior/capability changed +- [ ] New or modified prompts added to security regression suite +- [ ] Tool authorization boundary validated (least privilege) +- [ ] Prompt injection and jailbreak tests executed for changed flows +- [ ] Data handling reviewed for PII/log retention requirements +- [ ] Output filtering and policy checks validated +- [ ] Monitoring/detection rules updated for new failure modes +- [ ] Residual risks documented in model/system card diff --git a/templates/threat-modeling-workshop.md b/templates/threat-modeling-workshop.md new file mode 100644 index 0000000..7e80e90 --- /dev/null +++ b/templates/threat-modeling-workshop.md @@ -0,0 +1,34 @@ +# Threat Modeling Workshop Template (AI Systems) + +## Workshop Goals +- Identify highest-risk abuse paths for the AI system. +- Prioritize red-team test scenarios by business impact and exploitability. +- Assign owners and due dates for controls and mitigations. + +## Participants +- Product owner +- AI/ML engineer +- Security engineer / red team +- Detection / SOC representative +- Legal / compliance (for high-risk domains) + +## Pre-Read Checklist +- Architecture diagram and trust boundaries +- Data flow (inputs, retrieval, tools, outputs) +- List of model capabilities and enabled tools/actions +- Existing guardrails (input/output/content/policy) +- Known incidents or prior findings + +## 120-Minute Agenda +1. Scope and assumptions (15 min) +2. System walkthrough (20 min) +3. Threat brainstorming (30 min) +4. Risk scoring and prioritization (30 min) +5. Test plan and owners (20 min) +6. Wrap-up and next steps (5 min) + +## Output Artifacts +- Prioritized risk register +- Red-team test plan for next sprint +- Detection/monitoring gaps backlog +- Signed-off risk acceptance for deferred items