From 9625a9741950e25a0f350e6be19bce14a7b5ec86 Mon Sep 17 00:00:00 2001
From: abbousaad
Date: Sun, 3 May 2026 10:01:03 +0100
Subject: [PATCH 1/3] docs: add navigation links for multi-agent appendix

---
 standard/Getting_Started.md | 1 +
 standard/README.md          | 1 +
 2 files changed, 2 insertions(+)

diff --git a/standard/Getting_Started.md b/standard/Getting_Started.md
index 9ce5f6c..ef365b5 100644
--- a/standard/Getting_Started.md
+++ b/standard/Getting_Started.md
@@ -70,6 +70,7 @@ Depending on your role:
 | [Customer Acceptance Testing](appendix/Customer_Acceptance_Testing.md) | Informative | Hands-on verification tests |
 | [Incident Response Integration](appendix/Incident_Response_Integration.md) | Informative | Operational integration guidance |
 | [Cross-Domain Integration](appendix/Cross_Domain_Integration.md) | Informative | Cross-domain trigger mappings and dependency analysis |
+| [Multi-Agent Coordination](appendix/Multi_Agent_Coordination.md) | Informative | Guidance for multi-agent deployment coordination and review |
 | [Testing Phase Mapping](appendix/Testing_Phase_Mapping.md) | Informative | Requirements mapped to pentesting lifecycle phases |
 | [Evidence Request Checklist](appendix/Evidence_Request_Checklist.md) | Informative | Lightweight checklist of artifacts customers and reviewers can request |
 | [Human Review Record Template](appendix/Human_Review_Record_Template.md) | Informative | Illustrative template for documenting reviewer approval of critical findings |
diff --git a/standard/README.md b/standard/README.md
index 739ba26..118de67 100644
--- a/standard/README.md
+++ b/standard/README.md
@@ -29,6 +29,7 @@ The [appendices](./appendix/) provide cross-cutting resources: checklists, compl
 | [Checklists](./appendix/Checklists.md) | Per-tier compliance checklists |
 | [Compliance Matrix](./appendix/Compliance_Matrix.md) | Mappings to NIST CSF 2.0, ISO 27001:2022, NIST AI RMF, SOC 2, PCI DSS, GDPR |
 | [Cross-Domain Integration](./appendix/Cross_Domain_Integration.md) | How events in one domain trigger requirements in others |
+| [Multi-Agent Coordination](./appendix/Multi_Agent_Coordination.md) | Guidance for multi-agent deployment coordination, safe defaults, and reviewer evidence |
 | [Testing Phase Mapping](./appendix/Testing_Phase_Mapping.md) | Which requirements apply at each pentesting phase |
 | [Customer Acceptance Testing](./appendix/Customer_Acceptance_Testing.md) | Optional hands-on verification procedures for customers |
 | [Vendor Evaluation Guide](./appendix/Vendor_Evaluation_Guide.md) | Guide for evaluating autonomous pentest platform operators |

From f53735fa9f052af74e8096c9512dc912274cde76 Mon Sep 17 00:00:00 2001
From: abbousaad
Date: Mon, 4 May 2026 11:56:39 +0100
Subject: [PATCH 2/3] appendix: add advisory guidance for agent safety gaps

---
 README.md                                    |  2 +-
 index.md                                     |  2 +-
 standard/2_Safety_Controls/README.md         |  2 +-
 standard/6_Manipulation_Resistance/README.md |  2 +
 standard/Frontispiece.md                     |  2 +-
 standard/Getting_Started.md                  |  2 +-
 standard/Introduction.md                     |  2 +-
 standard/README.md                           |  2 +-
 standard/appendix/Advisory_Requirements.md   | 41 +++++++++++++++++++-
 standard/appendix/Glossary.md                |  2 +-
 standard/appendix/Vendor_Evaluation_Guide.md |  2 +-
 11 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 7524c66..708e9f2 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ APTS is not a testing methodology. It complements PTES, OWASP WSTG, and OSSTMM b
 - **Tier 2 (Verified)**: 85 additional (157 cumulative). Full transparency, tamper-proof audit trails, and independently verifiable findings.
 - **Tier 3 (Comprehensive)**: 16 additional (173 cumulative). Highest assurance for critical infrastructure and L4 autonomous operations.

-Sixteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.
+Eighteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.

 APTS has no certification body, no mandatory third-party audit, and no fee. Platforms are assessed against the requirements and conformance is documented. The standard does not prescribe who performs the assessment; internal self-assessment, independent internal review, and external third-party assessment are all valid approaches, and the choice is left to the reader.

diff --git a/index.md b/index.md
index 5176bab..ca8d883 100644
--- a/index.md
+++ b/index.md
@@ -45,7 +45,7 @@ APTS is not a testing methodology. It complements PTES, OWASP WSTG, and OSSTMM b
 - **Tier 2 (Verified)**: 85 additional (157 cumulative). Full transparency, tamper-proof audit trails, and independently verifiable findings.
 - **Tier 3 (Comprehensive)**: 16 additional (173 cumulative). Highest assurance for critical infrastructure and L4 autonomous operations.

-Sixteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.
+Eighteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.

 APTS has no certification body, no mandatory third-party audit, and no fee. Platforms are assessed against the requirements and conformance is documented. The standard does not prescribe who performs the assessment; internal self-assessment, independent internal review, and external third-party assessment are all valid approaches, and the choice is left to the reader.

diff --git a/standard/2_Safety_Controls/README.md b/standard/2_Safety_Controls/README.md
index ae0676c..6c65abb 100644
--- a/standard/2_Safety_Controls/README.md
+++ b/standard/2_Safety_Controls/README.md
@@ -52,7 +52,7 @@ The 20 requirements in this domain fall into seven thematic groups:

 A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SC requirement plus every Tier 2 SC requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119.

-Two appendix-only advisory practices for this domain (APTS-SC-A01 Platform Health Monitoring and Anomaly Detection and APTS-SC-A02 Context Window Safety and Constraint Preservation) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.
+Three appendix-only advisory practices for this domain (APTS-SC-A01 Platform Health Monitoring and Anomaly Detection, APTS-SC-A02 Context Window Safety and Constraint Preservation, and APTS-SC-A03 Tool Invocation Parameter and Chaining Governance) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.

 Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation.

diff --git a/standard/6_Manipulation_Resistance/README.md b/standard/6_Manipulation_Resistance/README.md
index 76851e0..14c1044 100644
--- a/standard/6_Manipulation_Resistance/README.md
+++ b/standard/6_Manipulation_Resistance/README.md
@@ -62,6 +62,8 @@ The 23 requirements in this domain fall into seven thematic groups:

 A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 MR requirement plus every Tier 2 MR requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119.

+Three advisory practices relevant to this domain (APTS-MR-A01 Goal Misgeneralization and Emergent Misalignment Evaluation Suite, APTS-MR-A02 Sandbagging Detection and Behavioral Consistency Validation, and APTS-MR-A03 Multi-Turn Adversarial Conversation Resilience) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.
+
 Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation.

 ---
diff --git a/standard/Frontispiece.md b/standard/Frontispiece.md
index a4a3510..26c7cfa 100644
--- a/standard/Frontispiece.md
+++ b/standard/Frontispiece.md
@@ -72,4 +72,4 @@ Licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).

 | Version | Date | Notes |
 |---------|------|-------|
-| 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 16 advisory practices in the appendix. |
+| 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 18 advisory practices in the appendix. |
diff --git a/standard/Getting_Started.md b/standard/Getting_Started.md
index ef365b5..0f36497 100644
--- a/standard/Getting_Started.md
+++ b/standard/Getting_Started.md
@@ -87,7 +87,7 @@ Depending on your role:
 ## Common Questions

 **Q: Do I need to implement all 173 requirements?**
-No. Start with Tier 1 (72 requirements). Tier 2 and Tier 3 add requirements progressively for cumulative totals of 157 and 173. An additional 16 advisory practices live in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern; advisory practices are not required for conformance at any tier. See [Introduction: Compliance Tiers](Introduction.md#compliance-tiers) for details.
+No. Start with Tier 1 (72 requirements). Tier 2 and Tier 3 add requirements progressively for cumulative totals of 157 and 173. An additional 18 advisory practices live in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern; advisory practices are not required for conformance at any tier. See [Introduction: Compliance Tiers](Introduction.md#compliance-tiers) for details.

 **Q: What if my platform meets most but not all Tier 1 requirements?**
 APTS does not award partial credit. A platform must meet 100% of requirements for its claimed tier. Address gaps before claiming a tier.
diff --git a/standard/Introduction.md b/standard/Introduction.md
index aad47f1..a25d591 100644
--- a/standard/Introduction.md
+++ b/standard/Introduction.md
@@ -44,7 +44,7 @@ APTS does not prescribe who performs the assessment. The choice of internal self
 | 7 | Third-Party & Supply Chain Trust | TP | 22 | AI providers, cloud dependencies, data handling, foundation model disclosure |
 | 8 | Reporting | RP | 15 | Finding validation, confidence scoring, coverage disclosure |

-**Total: 173 tier-required requirements** (Tier 1 + Tier 2 + Tier 3) across the eight domains. An additional **16 advisory practices** live exclusively in the [Advisory Requirements](appendix/Advisory_Requirements.md) appendix using the `APTS--A0x` identifier pattern; advisory practices are not counted toward any tier and do not affect conformance.
+**Total: 173 tier-required requirements** (Tier 1 + Tier 2 + Tier 3) across the eight domains. An additional **18 advisory practices** live exclusively in the [Advisory Requirements](appendix/Advisory_Requirements.md) appendix using the `APTS--A0x` identifier pattern; advisory practices are not counted toward any tier and do not affect conformance.

 ---

diff --git a/standard/README.md b/standard/README.md
index 118de67..6a1614a 100644
--- a/standard/README.md
+++ b/standard/README.md
@@ -1,6 +1,6 @@
 # OWASP Autonomous Penetration Testing Standard

-This is the full OWASP Autonomous Penetration Testing Standard. It defines 173 tier-required requirements across 8 domains (plus 16 advisory practices in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md)) that autonomous penetration testing platforms must meet to operate safely, transparently, and within defined boundaries, whether delivered by vendors, operated as a service, or built in-house by enterprise security teams.
+This is the full OWASP Autonomous Penetration Testing Standard. It defines 173 tier-required requirements across 8 domains (plus 18 advisory practices in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md)) that autonomous penetration testing platforms must meet to operate safely, transparently, and within defined boundaries, whether delivered by vendors, operated as a service, or built in-house by enterprise security teams.

 ## Getting Started

diff --git a/standard/appendix/Advisory_Requirements.md b/standard/appendix/Advisory_Requirements.md
index 428487d..0f5aaed 100644
--- a/standard/appendix/Advisory_Requirements.md
+++ b/standard/appendix/Advisory_Requirements.md
@@ -121,6 +121,46 @@ When the platform summarizes, truncates, or otherwise compacts the agent's conte

 ---

+### APTS-SC-A03: Tool Invocation Parameter and Chaining Governance (Advisory)
+
+**Applicability:** This practice applies to platforms that let an agent or model-driven orchestration layer invoke allowlisted tools with runtime-supplied parameters.
+
+**Rationale:** APTS-SC-020 already does the important architectural work. The allowlist lives outside the model. Entries include allowed parameters or parameter bounds. That is the floor. The unresolved problem appears one layer lower, where a tool is nominally approved but the parameter mix changes what the tool can do in practice. A rate-control tool with a zero-delay setting can still stress a target. A browser connector may be approved for scoped validation, then become an egress path once paired with an export utility. Observations from agentic systems suggest that this is where "allowed" starts to blur into "unsafe." SC-020 defines the enforcement point. This advisory deals with what that enforcement layer should evaluate before dispatch.
+
+**Value:** Parameter-aware enforcement closes a gap that is easy to miss in design review and easy to trigger at runtime. Each call can look acceptable on its own. The sequence is the problem. Mature platforms catch that before the target does.
+
+**Practice Description:**
+
+For each allowlisted tool, document a parameter schema that covers types, bounds, enumerated values, and disallowed combinations for safety-critical fields. Those fields usually include target identifiers, rate limits, payload sizes, recursion depth, credential references, export destinations, authentication modes, and execution windows. Validate them in the external enforcement layer before dispatch, not inside the model prompt and not only in the tool wrapper. Then go one step further. Some requests are syntactically valid and still wrong in context. Check that the referenced target is still in scope, that the credential reference is authorized for that tool and that engagement, and that the requested rate or payload remains inside the approved operating envelope.
+
+Platforms operating with multi-step agents should also define a bounded set of disallowed tool chains. This is not an attempt to solve general action planning through brittle signatures. It is narrower than that. Block the sequences that are already understood to produce disallowed outcomes: reconnaissance that flows directly into unapproved credential use, widened enumeration that turns into unauthorized pivoting, or evidence export routed to a destination outside the approved delivery path. Log parameter rejections and chain-policy rejections with the tool identifier, attempted parameters, and the reason for denial.
+
+**Recommendation:** Start small. Cover the tools that can change scope, traffic intensity, credential usage, or data egress. Build a short catalog of unsafe parameter combinations and unsafe tool chains, run it through change control with the allowlist, and expand from operational evidence rather than taxonomy for its own sake.
+
+**Related normative requirements:** APTS-SC-004, APTS-SC-020, APTS-SE-006, APTS-SE-023, APTS-MR-023.
+
+---
+
+### APTS-MR-A03: Multi-Turn Adversarial Conversation Resilience (Advisory)
+
+**Applicability:** This practice applies to platforms that use LLM-based or agentic runtimes with conversational state spanning multiple turns, tool calls, or planning steps.
+
+**Rationale:** APTS-MR-018 protects the architectural boundary between trusted instructions and untrusted target-derived data. APTS-MR-023 assumes the runtime may still misbehave and requires containment. Those controls remain necessary. They do not fully cover a pattern that shows up repeatedly in agentic systems: no single turn looks decisive, yet the accumulated dialogue erodes the runtime's refusal logic until it attempts something it should never have attempted. Split-message assembly does this. So do paraphrase drift, encoding chains, and target content that slowly reframes itself as operator intent. The distinction from MR-023 matters. MR-023 is containment after deviation. This advisory is about exercising the conversation path that leads to deviation in the first place.
+
+**Value:** Single-turn screening misses a surprising amount of bad behavior because the dangerous instruction is often distributed across harmless-looking fragments. Multi-turn testing exposes that drift. It gives the operator a better signal about whether refusal behavior, scope discipline, and tool-use restrictions survive a long session instead of only surviving a clean lab prompt.
+
+**Practice Description:**
+
+Maintain a compact corpus of multi-turn adversarial sequences that exercises cumulative manipulation patterns relevant to autonomous pentesting. Include staged jailbreak attempts, split-message assembly, paraphrase and synonym variation, encoding or representation shifts, and attempts to reframe untrusted target content as operator intent. Then compare outcomes. If the platform rejects a direct request to widen scope or export sensitive data, it should reject the obfuscated and distributed versions of the same request as well.
+
+Where the platform carries conversational or planning state across long sessions, treat that state as part of the attack surface. Segment or reset state at clear engagement and approval boundaries. Do not let adversarial target content sit in privileged planning context longer than necessary. When the platform summarizes or compacts history, verify that compaction does not turn a previously rejected path into an acceptable one by stripping away context or weakening the refusal signal. Log the test results and classify failures by what actually eroded: refusal drift, scope drift, tool-use drift, or unsafe state carryover.
+
+**Recommendation:** Start with a small recurring corpus focused on the failures that matter operationally: scope override, instruction bypass, credential exfiltration attempts, and unsafe tool requests distributed across several turns. Use the results to tune guardrails and session-boundary handling. A benchmark score by itself is not the point.
+
+**Related normative requirements:** APTS-MR-001, APTS-MR-018, APTS-MR-023, APTS-SC-020.
+
+---
+
 ### APTS-HO-A01: Out-of-Band Kill Switch via Independent Network (Advisory)

 **Rationale:** Core kill switch functionality is covered by APTS-HO-008. The requirement for kill switch activation via physically independent communication channels (cellular, management network, physical button) assumes deployment scenarios where the primary network may be compromised or unavailable.
@@ -293,4 +333,3 @@ Implement an automated verification mechanism that screens each reported finding
 | Tier 3 (Comprehensive) | Maximum assurance for high-risk operations | Recommended where operationally feasible |

 Advisory practices are independent of the tier system. A platform may claim Tier 3 conformance without implementing any advisory practices, and a Tier 2 platform may implement advisory practices that are relevant to its deployment context.
-
diff --git a/standard/appendix/Glossary.md b/standard/appendix/Glossary.md
index e87cde5..06e27b2 100644
--- a/standard/appendix/Glossary.md
+++ b/standard/appendix/Glossary.md
@@ -79,7 +79,7 @@ Notation for specifying IP address ranges using a base address and prefix length
 Alternative security measures that mitigate vulnerability when the primary control is missing. Example: Two-factor authentication compensates for weak passwords.

 **Compliance Tier**
-One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform must meet 100% of requirements assigned to its claimed tier (both MUST and SHOULD). An additional 16 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier.
+One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform must meet 100% of requirements assigned to its claimed tier (both MUST and SHOULD). An additional 18 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier.

 **Confidence Score**
 A numeric value on a 0-100% scale indicating the platform's certainty in a scope boundary determination, target legitimacy assessment, asset classification, or finding validity. Scores below 75% for scope-related decisions trigger mandatory human escalation. See APTS-HO-013, APTS-RP-003.
diff --git a/standard/appendix/Vendor_Evaluation_Guide.md b/standard/appendix/Vendor_Evaluation_Guide.md
index f85cae4..5da39e6 100644
--- a/standard/appendix/Vendor_Evaluation_Guide.md
+++ b/standard/appendix/Vendor_Evaluation_Guide.md
@@ -14,7 +14,7 @@ Decide your minimum compliance tier based on your risk tolerance:

 - **Tier 2 (Verified):** 157 cumulative requirements (72 + 85). The platform is fully transparent about what it did and why, protects your data with tamper-proof audit trails, handles incidents with formal response procedures, and provides independently verifiable findings. **Choose Tier 2 when:** you are testing production environments, operating in regulated industries, or need full accountability for audit or compliance purposes. This is the recommended minimum for most production deployments.

-- **Tier 3 (Comprehensive):** 173 cumulative requirements (157 + 16). The platform meets the highest assurance bar for critical infrastructure, fully autonomous (L4) operations, and the strictest regulatory requirements. **Choose Tier 3 when:** you are deploying fully autonomous testing against critical infrastructure, financial systems, or healthcare environments with minimal human oversight. An additional 16 advisory practices in the [Advisory Requirements appendix](Advisory_Requirements.md) are recommended for highest-assurance engagements but are not counted toward any tier.
+- **Tier 3 (Comprehensive):** 173 cumulative requirements (157 + 16). The platform meets the highest assurance bar for critical infrastructure, fully autonomous (L4) operations, and the strictest regulatory requirements. **Choose Tier 3 when:** you are deploying fully autonomous testing against critical infrastructure, financial systems, or healthcare environments with minimal human oversight. An additional 18 advisory practices in the [Advisory Requirements appendix](Advisory_Requirements.md) are recommended for highest-assurance engagements but are not counted toward any tier.

 > **Minimum tier guidance:** Tier 1 is appropriate for supervised testing of non-critical systems in non-regulated environments. Organizations in financial services, healthcare, critical infrastructure, or any regulated industry SHOULD require Tier 2 as a minimum. Tier 3 is recommended for critical infrastructure, fully autonomous (L4) operations, and environments with the strictest regulatory requirements.


From 121c54e239d456e9f5ac64b892f5c05e71b3089f Mon Sep 17 00:00:00 2001
From: abbousaad
Date: Tue, 5 May 2026 09:17:51 +0100
Subject: [PATCH 3/3] docs: remove unrelated appendix links from issue 35 PR

---
 standard/Getting_Started.md | 1 -
 standard/README.md          | 1 -
 2 files changed, 2 deletions(-)

diff --git a/standard/Getting_Started.md b/standard/Getting_Started.md
index 0f36497..259e0ba 100644
--- a/standard/Getting_Started.md
+++ b/standard/Getting_Started.md
@@ -70,7 +70,6 @@ Depending on your role:
 | [Customer Acceptance Testing](appendix/Customer_Acceptance_Testing.md) | Informative | Hands-on verification tests |
 | [Incident Response Integration](appendix/Incident_Response_Integration.md) | Informative | Operational integration guidance |
 | [Cross-Domain Integration](appendix/Cross_Domain_Integration.md) | Informative | Cross-domain trigger mappings and dependency analysis |
-| [Multi-Agent Coordination](appendix/Multi_Agent_Coordination.md) | Informative | Guidance for multi-agent deployment coordination and review |
 | [Testing Phase Mapping](appendix/Testing_Phase_Mapping.md) | Informative | Requirements mapped to pentesting lifecycle phases |
 | [Evidence Request Checklist](appendix/Evidence_Request_Checklist.md) | Informative | Lightweight checklist of artifacts customers and reviewers can request |
 | [Human Review Record Template](appendix/Human_Review_Record_Template.md) | Informative | Illustrative template for documenting reviewer approval of critical findings |
diff --git a/standard/README.md b/standard/README.md
index 6a1614a..96d8025 100644
--- a/standard/README.md
+++ b/standard/README.md
@@ -29,7 +29,6 @@ The [appendices](./appendix/) provide cross-cutting resources: checklists, compl
 | [Checklists](./appendix/Checklists.md) | Per-tier compliance checklists |
 | [Compliance Matrix](./appendix/Compliance_Matrix.md) | Mappings to NIST CSF 2.0, ISO 27001:2022, NIST AI RMF, SOC 2, PCI DSS, GDPR |
 | [Cross-Domain Integration](./appendix/Cross_Domain_Integration.md) | How events in one domain trigger requirements in others |
-| [Multi-Agent Coordination](./appendix/Multi_Agent_Coordination.md) | Guidance for multi-agent deployment coordination, safe defaults, and reviewer evidence |
 | [Testing Phase Mapping](./appendix/Testing_Phase_Mapping.md) | Which requirements apply at each pentesting phase |
 | [Customer Acceptance Testing](./appendix/Customer_Acceptance_Testing.md) | Optional hands-on verification procedures for customers |
 | [Vendor Evaluation Guide](./appendix/Vendor_Evaluation_Guide.md) | Guide for evaluating autonomous pentest platform operators |