diff --git a/docs/specifications/dcm-admin-api-spec.md b/docs/specifications/dcm-admin-api-spec.md new file mode 100644 index 0000000..db0b9cf --- /dev/null +++ b/docs/specifications/dcm-admin-api-spec.md @@ -0,0 +1,984 @@ +# DCM Admin API Specification + +**Document Status:** 📋 Draft - Ready for Implementation Feedback +**Document Type:** API Narrative Specification + + +> **📋 Draft** +> +> This specification has been promoted from Work in Progress to Draft status. Complete Admin API covering all platform admin operations with request/response examples. It is ready for implementation feedback but has not yet been formally reviewed for final release. +> +> This specification defines the DCM Admin API, the platform administration interface. Published to share design direction and invite feedback. Do not build production integrations against this specification until it has been formally reviewed for final release. + +**Version:** 0.1.0-draft +**Status:** Draft - Ready for implementation feedback +**Document Type:** Technical Specification +**Related Documents:** [Foundational Abstractions](https://github.com/croadfeldt/udlm/blob/main/foundations/foundations.md) | [Consumer API Specification](consumer-api-spec.md) | [DCM Operator Interface Specification](dcm-operator-interface-spec.md) | [Control Plane Components](../../architecture/control-plane/components.md) | [Accreditation and Authorization Matrix](https://github.com/croadfeldt/udlm/blob/main/governance/accreditation-and-authorization-matrix.md) + +--- + +## Abstract + +The Admin API is the platform administration interface for DCM. It is served through the same Ingress API as the Consumer API and Provider API but is restricted to actors with `platform_admin` or `tenant_admin` roles.
It covers operations that consumers cannot perform: Tenant lifecycle management, provider registration review, accreditation approval, quota administration, discovery management, orphan resolution, recovery decision escalation, and bootstrap operations. + +--- + + +> **AEP Alignment:** This specification follows [AEP](https://aep.dev) conventions. +> Custom methods use colon syntax (`POST /admin/providers/{uuid}:approve`). +> Async operations return an `Operation` resource (AEP-136 LRO). +> List pagination uses `page_size` and `page_token` parameters. +> See the normative OpenAPI specification: `schemas/openapi/dcm-admin-api.yaml` + +## 1. Authentication and Authorization + +All Admin API endpoints require Bearer token authentication (same as Consumer API). Role requirements are declared per endpoint: + +| Role | Scope | +|------|-------| +| `platform_admin` | All Admin API operations across all Tenants | +| `tenant_admin` | Tenant-scoped Admin API operations for their own Tenant only | + +Base URL: `/api/v1/admin/` + +> **Versioning:** See [API Versioning Strategy](../../architecture/control-plane/api-versioning.md). Breaking changes increment the major version. The Admin API follows the same deprecation lifecycle as the Consumer API, with profile-governed support windows. + +Step-up MFA is required for destructive operations (Tenant decommission, accreditation revocation, bootstrap credential rotation) regardless of session MFA status. + +--- + +### 1.1 Rate Limiting + +Admin API endpoints have separate rate limits from the Consumer API, applied per authenticated admin actor: + +| Profile | Requests/minute | Burst | +|---------|----------------|-------| +| All profiles | 120 | 40 | + +Rate-limited responses include the `Retry-After`, `X-RateLimit-Limit`, and `X-RateLimit-Remaining` headers. + +### 1.2 Request and Correlation IDs + +All responses include `X-DCM-Request-ID` and `X-DCM-Correlation-ID` headers (same model as Consumer API).
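
The limits in Section 1.1 can be respected on the client side. The following is a minimal sketch, not part of the normative API. It assumes that "Burst" is a token-bucket capacity, that "Requests/minute" is the sustained refill rate, and that `Retry-After` carries delta-seconds. The specification does not define the server's algorithm, and the names `TokenBucket` and `retry_delay_seconds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Client-side pacing sketch (assumed model: burst = bucket capacity)."""
    rate_per_minute: float = 120.0  # "Requests/minute" from Section 1.1
    burst: float = 40.0             # "Burst" from Section 1.1
    tokens: float = 40.0            # start with a full bucket
    last: float = 0.0               # time of the last refill, in seconds

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        refill = elapsed * self.rate_per_minute / 60.0
        self.tokens = min(self.burst, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def retry_delay_seconds(headers: dict, default: float = 1.0) -> float:
    """Read Retry-After as delta-seconds; fall back to a default otherwise."""
    value = headers.get("Retry-After", "")
    try:
        return max(0.0, float(value))
    except ValueError:
        return default
```

A caller would check `try_acquire(time.monotonic())` before each request and sleep for `retry_delay_seconds(response_headers)` whenever a rate-limited response still arrives.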
+ +### 1.3 Response Envelopes + +List responses use `{"items": [...], "total": N, "next_cursor": "..."}`. Single resources returned directly. Errors use `{"error": "...", "message": "...", "request_id": "..."}`. + +--- + +## 2. Tenant Management + +### 2.1 List Tenants + +``` +GET /api/v1/admin/tenants +Role: platform_admin + +Query params: status=, page, page_size + +Response 200: +{ + "tenants": [ + { + "tenant_uuid": "", + "handle": "payments-team", + "display_name": "Payments Platform", + "status": "active", + "deployment_posture": "prod", + "compliance_domains": ["hipaa"], + "recovery_profile": "notify-and-wait", + "entity_count": 142, + "created_at": "" + } + ], + "total": 12 +} +``` + +### 2.2 Create Tenant + +``` +POST /api/v1/admin/tenants +Role: platform_admin + +{ + "handle": "new-team", + "display_name": "New Team", + "deployment_posture": "standard", + "compliance_domains": [], + "recovery_profile_override": null, + "initial_admin_actor_uuid": "" +} + +Response 201 Created: +{ + "tenant_uuid": "", + "status": "active" +} +``` + +### 2.3 Suspend / Reinstate Tenant + +``` +POST /api/v1/admin/tenants/{tenant_uuid}:suspend +POST /api/v1/admin/tenants/{tenant_uuid}:reinstate +Role: platform_admin + +{ + "reason": "", + "notify_tenant_admin": true +} +``` + +### 2.4 Decommission Tenant + +``` +DELETE /api/v1/admin/tenants/{tenant_uuid} +Role: platform_admin +Requires: step-up MFA + +{ + "reason": "", + "force": false, # true: decommission even if active entities remain + "notify_tenant_admin": true +} + +Response 409 Conflict (if active entities and force=false): +{ + "error": "tenant_has_active_entities", + "active_entity_count": 47, + "resolution": "Decommission all entities first, or use force=true" +} +``` + +--- + +## 3. 
Provider Management + +### 3.1 List Registered Providers + +``` +GET /api/v1/admin/providers +Role: platform_admin + +Query params: type=, status= + +Response 200: +{ + "providers": [ + { + "provider_uuid": "", + "handle": "eu-west-prod-1", + "provider_type": "service", + "status": "active", + "health": "healthy", + "accreditation_count": 2, + "max_data_classification": "phi" + } + ] +} +``` + +### 3.2 Review Provider Registration + +New provider registrations in `proposed` status require platform admin review: + +``` +GET /api/v1/admin/providers/pending +Role: platform_admin + +POST /api/v1/admin/providers/{provider_uuid}:approve +POST /api/v1/admin/providers/{provider_uuid}:reject +{ + "reason": "" +} +``` + +### 3.3 Suspend Provider + +``` +POST /api/v1/admin/providers/{provider_uuid}:suspend +Role: platform_admin + +{ + "reason": "", + "affect_existing_entities": "notify_only | block_new_requests | migrate" +} +``` + +--- + +## 4. Accreditation Management + +### 4.1 List Accreditations + +``` +GET /api/v1/admin/accreditations +Role: platform_admin + +Query params: subject_type, framework, status= + +Response 200: +{ + "accreditations": [ + { + "accreditation_uuid": "", + "subject_uuid": "", + "subject_type": "service_provider", + "framework": "hipaa", + "accreditation_type": "baa", + "status": "active", + "expires_at": "", + "days_until_expiry": 89 + } + ] +} +``` + +### 4.2 Approve Accreditation + +``` +POST /api/v1/admin/accreditations/{accreditation_uuid}:approve +Role: platform_admin +Requires: step-up MFA + +{ + "review_notes": "", + "certificate_verified": true +} +``` + +### 4.3 Revoke Accreditation + +``` +DELETE /api/v1/admin/accreditations/{accreditation_uuid} +Role: platform_admin +Requires: step-up MFA + +{ + "revocation_reason": "", + "affected_entity_action": "notify_only | block_new_requests | migrate_entities" +} +``` + +--- + +## 5. 
Discovery Management + +### 5.1 Trigger Discovery + +``` +POST /api/v1/admin/discovery:trigger +Role: platform_admin | tenant_admin + +{ + "scope": "entity | resource_type | provider | tenant", + "entity_uuid": "", + "resource_type": "Compute.VirtualMachine", + "provider_uuid": "", + "tenant_uuid": "", + "reason": "incident investigation", + "priority": "high | standard | background" +} + +Response 202 Accepted: +{ + "discovery_job_uuid": "", + "status": "queued", + "priority": "high", + "estimated_start": "" +} +``` + +### 5.2 Discovery Job Status + +``` +GET /api/v1/admin/discovery/jobs/{discovery_job_uuid} + +Response 200: +{ + "discovery_job_uuid": "", + "status": "running | completed | failed", + "entities_discovered": 47, + "new_entities_found": 2, + "started_at": "", + "completed_at": "", + "orphan_candidates_found": 1 +} +``` + +--- + +## 6. Orphan Management + +### 6.1 List Orphan Candidates + +``` +GET /api/v1/admin/orphans +Role: platform_admin + +Query params: provider_uuid, status= + +Response 200: +{ + "orphan_candidates": [ + { + "orphan_candidate_uuid": "", + "provider_uuid": "", + "provider_entity_id": "vm-0a1b2c3d", + "suspected_request_uuid": "", + "resource_type": "Compute.VirtualMachine", + "discovered_at": "", + "status": "under_review" + } + ] +} +``` + +### 6.2 Resolve Orphan Candidate + +``` +POST /api/v1/admin/orphans/{orphan_candidate_uuid}/resolve +Role: platform_admin + +{ + "resolution": "manual_decommission | adopt_into_dcm | mark_false_positive", + "reason": "", + "target_tenant_uuid": "" # required if resolution=adopt_into_dcm +} +``` + +--- + +## 7. 
Recovery Decision Management + +Platform admins can resolve pending recovery decisions for any entity: + +``` +GET /api/v1/admin/recovery-decisions/pending +Role: platform_admin + +Response 200: +{ + "pending_decisions": [ + { + "recovery_decision_uuid": "", + "entity_uuid": "", + "trigger": "DISPATCH_TIMEOUT", + "entity_state": "TIMEOUT_PENDING", + "deadline": "", + "tenant_uuid": "" + } + ] +} + +POST /api/v1/admin/recovery-decisions/{recovery_decision_uuid} +Role: platform_admin + +{ + "action": "DRIFT_RECONCILE | DISCARD_AND_REQUEUE | DISCARD_NO_REQUEUE", + "reason": "" +} +``` + +--- + +## 8. Quota Management + +### 8.1 View Tenant Quotas + +``` +GET /api/v1/admin/tenants/{tenant_uuid}/quotas +Role: platform_admin | tenant_admin + +Response 200: +{ + "quotas": [ + { + "resource_type": "Compute.VirtualMachine", + "limit": 100, + "current_usage": 47, + "policy_uuid": "" + } + ] +} +``` + +### 8.2 Update Quota + +``` +PUT /api/v1/admin/tenants/{tenant_uuid}/quotas/{resource_type} +Role: platform_admin + +{ + "new_limit": 150, + "reason": "Q2 capacity increase approved by FinOps" +} +``` + +--- + +## 9. Search Index Management + +``` +POST /api/v1/admin/search-index:rebuild +Role: platform_admin + +{ + "scope": "full | tenant | resource_type", + "tenant_uuid": "", + "reason": "Recovery after index corruption" +} + +Response 202 Accepted: +{ + "rebuild_job_uuid": "", + "estimated_duration": "PT2H", + "degraded_during_rebuild": true +} + +GET /api/v1/admin/search-index/status + +Response 200: +{ + "status": "healthy | degraded | rebuilding | unavailable", + "staleness_seconds": 42, + "last_full_rebuild": "", + "entity_count": 8421 +} +``` + +--- + +## 10. 
Bootstrap Operations + +### 10.1 Rotate Bootstrap Admin Credential + +``` +POST /api/v1/admin/bootstrap:rotate-credential +Role: platform_admin +Requires: step-up MFA (hardware_token_mfa for fsi/sovereign) + +{ + "new_credential_ref": "", + "reason": "Initial bootstrap credential rotation" +} +``` + +### 10.2 Deployment Health + +``` +GET /api/v1/admin/health + +Response 200: +{ + "overall": "healthy | degraded | critical", + "components": [ + { "component": "request_orchestrator", "status": "healthy" }, + { "component": "policy_engine", "status": "healthy" }, + { "component": "placement_engine", "status": "healthy" }, + { "component": "lifecycle_constraint_enforcer", "status": "healthy" }, + { "component": "discovery_scheduler", "status": "healthy" }, + { "component": "notification_router", "status": "healthy" }, + { "component": "cost_analysis", "status": "healthy" }, + { "component": "search_index", "status": "degraded", "staleness_seconds": 180 }, + { "component": "intent_store", "status": "healthy" }, + { "component": "requested_store", "status": "healthy" }, + { "component": "realized_store", "status": "healthy" } + ], + "active_profile": { + "deployment_posture": "prod", + "compliance_domains": ["hipaa"], + "recovery_posture": "notify-and-wait", + "zero_trust_posture": "full" + } +} +``` + +--- + +## 13. DCM Self-Health Endpoints + +DCM exposes three health endpoints, each with a distinct purpose: + +```http +# Liveness: is the process alive? (Kubernetes liveness probe) +GET /livez +# No auth required. Max response time: PT5S. +# Returns 200 OK with {"status":"ok"} if alive. +# Returns 503 if process is deadlocked or unresponsive. + +# Readiness: is DCM ready to serve traffic? (Kubernetes readiness probe) +GET /readyz +# No auth required. Max response time: PT10S. +# Returns 200 OK with {"status":"ready"} if all required stores are reachable. +# Returns 503 with {"status":"not_ready","reasons":["store_unreachable"]} otherwise.
+ +# Operational health: rich health for operators and monitoring systems +GET /api/v1/admin/health +Authorization: Bearer + +Response 200: +{ + "dcm_version": "", + "profile": "prod", + "status": "healthy", // healthy | degraded | critical + "components": { + "request_orchestrator": { "status": "healthy" }, + "policy_engine": { "status": "healthy" }, + "placement_engine": { "status": "healthy" }, + "service_provider": { "status": "degraded", "reason": "rotation_pending" } + }, + "stores": { + "intent_store": { "status": "healthy", "latency_p99_ms": 12 }, + "requested_store": { "status": "healthy", "latency_p99_ms": 8 }, + "realized_store": { "status": "healthy", "latency_p99_ms": 9 } + }, + "providers": { + "total": 4, + "healthy": 3, + "degraded": 1, + "unhealthy": 0 + } +} + +# Prometheus metrics +GET /metrics +# Unauthenticated (secured by network policy in production). +# Returns Prometheus text format metrics. +``` + +> **Full model:** See [DCM Self-Health](../../architecture/control-plane/self-health.md), HLT-001–HLT-006. + + +## 12. Session Management (Admin) + +Platform admins can force-revoke sessions for any actor. This is used on actor compromise, policy violation, or deprovisioning.
+ +```http +# Force-revoke all sessions for an actor +POST /api/v1/admin/actors/{actor_uuid}:revoke-sessions +Authorization: Bearer + +{ + "reason": "security_event", // REQUIRED + "notify_actor": true // send notification event +} + +Response 202 Accepted: +{ + "sessions_revoked": 3, + "actor_uuid": "", + "revocation_propagated_at": "" +} +``` + +```http +# List active sessions for any actor (admin view) +GET /api/v1/admin/actors/{actor_uuid}/sessions +Authorization: Bearer + +Response 200: +{ + "items": [ + { + "session_uuid": "", + "created_at": "", + "expires_at": "", + "auth_method": "ldap", + "mfa_verified": true, + "status": "active" + } + ], + "total": 1 +} +``` + +**Error codes specific to session management:** + +| Error Code | HTTP | When | +|-----------|------|------| +| `actor_not_found` | 404 | Actor UUID not found | +| `no_active_sessions` | 404 | Actor has no active sessions | + +> **Full model:** See [Session Token Revocation](../../architecture/control-plane/session-revocation.md), AUTH-016–AUTH-022. + + +## 11. 
Error Model + +All Admin API errors use the same envelope as the Consumer API: + +```json +{ + "error": "", // machine-readable snake_case code + "message": "", // human-readable description + "request_id": "", // matches X-DCM-Request-ID header + "details": {} // optional: field-level details +} +``` + +**Admin-specific error codes:** + +| Error Code | HTTP Status | When | +|-----------|-------------|------| +| `insufficient_admin_role` | 403 | Actor lacks required admin role | +| `tenant_not_found` | 404 | Tenant UUID not found | +| `provider_not_found` | 404 | Provider UUID not found | +| `approval_already_voted` | 409 | Actor has already voted on this approval | +| `approval_window_expired` | 410 | Approval window has passed | +| `degradation_already_accepted` | 409 | Degradation item already accepted | +| `tier_registry_blocked` | 409 | Registry change has unresolved blocking items | +| `quota_below_current_usage` | 422 | New quota would be below current consumption | + +All error responses include `X-DCM-Request-ID` and `X-DCM-Correlation-ID` headers. + + +## Scoring Model Administration + +> Approval routing thresholds use named-tier dynamic format. See [Authority Tier Model](https://github.com/croadfeldt/udlm/blob/main/governance/authority-tier-model.md) for the complete specification. 
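
As an illustration of named-tier routing thresholds, the sketch below maps a risk score to a tier name. It rests on assumptions that this section does not state normatively: scores below `auto_approve_below` route to `auto`, and otherwise the first tier whose `max_score` is at least the score wins. The function name `route_score` is hypothetical, and the threshold values mirror the `standard` profile example in this specification.

```python
def route_score(score, auto_approve_below, approval_routing):
    """Return the tier name for a risk score (assumed semantics, see lead-in)."""
    if score < auto_approve_below:
        return "auto"
    # approval_routing is assumed to be ordered by ascending max_score.
    for entry in approval_routing:
        if score <= entry["max_score"]:
            return entry["tier"]
    raise ValueError(f"score {score} exceeds every max_score")


# Threshold values copied from the `standard` profile example.
STANDARD_ROUTING = [
    {"tier": "reviewed", "max_score": 59},
    {"tier": "verified", "max_score": 79},
    {"tier": "authorized", "max_score": 100},
]

print(route_score(47, 25, STANDARD_ROUTING))
```

Under these assumptions a score of 47 routes to `reviewed`, which matches the `routing_decision` shown for `risk_score: 47` in the Score Audit Trail example.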
+ +### Get Scoring Thresholds for Profile + +``` +GET /api/v1/admin/profiles/{profile_name}/scoring + +Response 200: +{ + "profile": "standard", + "scoring_thresholds": { + "auto_approve_below": 25, + "approval_routing": [ + { "tier": "reviewed", "max_score": 59 }, + { "tier": "verified", "max_score": 79 }, + { "tier": "authorized", "max_score": 100 } + ] + }, + "signal_weights": { + "operational_gatekeeper": 0.45, + "completeness": 0.15, + "actor_risk_history": 0.20, + "quota_pressure": 0.10, + "provider_risk": 0.10 + }, + "policy_enforcement_overrides": [] +} +``` + +### Update Scoring Thresholds + +``` +PATCH /api/v1/admin/profiles/{profile_name}/scoring +{ + "scoring_thresholds": { + "auto_approve_below": 20, + "approval_routing": [ + { "tier": "reviewed", "max_score": 59 }, + { "tier": "verified", "max_score": 79 }, + { "tier": "authorized", "max_score": 100 } + ] + } +} + +Response 200: { "profile": "standard", "updated_at": "", "effective_immediately": true } +Response 422: { "error": "threshold_invalid", "reason": "auto_approve_below exceeds maximum of 50 (SMX-008)" } +``` + +### Add Policy Enforcement Override + +``` +POST /api/v1/admin/profiles/{profile_name}/scoring/overrides +{ + "policy_handle": "platform/gatekeeper/cpu-size-limit", + "override_enforcement_class": "compliance", + "rationale": "Prod profile: CPU limit is a hard constraint", + "applies_to_resource_types": ["Compute.VirtualMachine"] +} + +Response 201 Created: +{ "override_uuid": "", "policy_handle": "...", "effective_immediately": true } +``` + +### Actor Risk History + +``` +GET /api/v1/admin/actors/{actor_uuid}/risk-history + +Response 200: +{ + "actor_uuid": "", + "current_score": 30, + "events": [ + { + "event_type": "validation_failure", + "occurred_at": "", + "request_uuid": "", + "base_contribution": 5, + "decayed_contribution": 3.2, + "days_ago": 4 + } + ], + "decay_lambda": 0.1, + "score_half_life_days": 7 +} + +POST /api/v1/admin/actors/{actor_uuid}/risk-history:reset +{ + 
"reason": "Actor confirmed as trusted automation account", + "audit_note": "Reviewed and approved by platform admin" +} +``` + +### Score Audit Trail + +``` +GET /api/v1/admin/scoring/audit + +Query parameters: + from= + to= + routing_decision= + risk_score_above= + actor_uuid= + resource_type= + +Response 200: +{ + "score_records": [ + { + "score_record_uuid": "", + "request_uuid": "", + "risk_score": 47, + "routing_decision": "reviewed", + "signal_breakdown": { ... }, + "evaluated_at": "" + } + ] +} +``` + + +--- + +## Approval Management + +DCM provides approval gates for requests, policy contributions, provider registrations, and federation contributions. The Admin API is the integration point for recording decisions β€” it is designed to be called by both human reviewers in the DCM UI and by external systems (ServiceNow, Jira, Slack bots, workflow automation). + +### List Pending Approvals + +``` +GET /api/v1/admin/approvals/pending + +Query parameters: + approval_type= + tier= + reviewer_uuid= # approvals where this actor is an eligible reviewer + +Response 200: +{ + "pending_approvals": [ + { + "approval_uuid": "", + "approval_type": "policy_contribution", + "tier": "authorized", + "subject_uuid": "", + "subject_handle": "tenant/payments/gatekeeper/cost-ceiling", + "required_dcmgroup_uuid": "", # for authorized tier + "quorum_required": 3, + "votes_recorded": 1, + "submitted_at": "", + "window_expires_at": "", + "submitted_by": { "uuid": "", "display_name": "Bob Smith" } + } + ] +} +``` + +### Record an Approval Decision + +``` +POST /api/v1/admin/approvals/{approval_uuid}:vote + +{ + "decision": "approve | reject", + "reason": "", + "recorded_via": "dcm_admin_ui | servicenow | jira | slack_bot | api_direct | other", + "external_reference": "" +} + +Response 200: +{ + "approval_uuid": "", + "voter_uuid": "", + "decision": "approve", + "votes_recorded": 2, + "quorum_required": 3, + "quorum_reached": false, + "pipeline_status": "pending_authorized" +} + +# 
When quorum is reached or reviewed/verified satisfied: +{ + "approval_uuid": "", + "voter_uuid": "", + "decision": "approve", + "votes_recorded": 3, + "quorum_required": 3, + "quorum_reached": true, + "pipeline_status": "activating" +} + +Response 403: actor is not a member of the required authority group (authorized tier) or not in reviewer role +Response 409: actor has already voted on this approval (verified and authorized tiers enforce distinct voters) +Response 410: approval window has expired +``` + +### Get Approval Detail + +``` +GET /api/v1/admin/approvals/{approval_uuid} + +Response 200: +{ + "approval_uuid": "", + "approval_type": "authorized", + "subject_uuid": "", + "tier": "authorized", + "required_dcmgroup_uuid": "", + "quorum_required": 3, + "window_expires_at": "", + "votes": [ + { + "voter_uuid": "", + "voter_display_name": "Alice Chen", + "decision": "approve", + "recorded_at": "", + "recorded_via": "servicenow", + "external_reference": "CHG0012345" + } + ], + "status": "pending_authorized", + "quorum_reached": false +} +``` + + +--- + +## Authority Tier Registry Management + +> **Implementation note:** The tier registry change impact detection pipeline is specified in [Authority Tier Model](https://github.com/croadfeldt/udlm/blob/main/governance/authority-tier-model.md) Section 7. The endpoints below are the Admin API surface for proposing, reviewing, and activating tier registry changes. The detection mechanism (tier impact diff computation, affected item query, degradation gate) is an implementation responsibility. 
+ +### Propose a Tier Registry Change + +``` +POST /api/v1/admin/tier-registry/changes + +{ + "proposed_tiers": [ + { "name": "auto", "insert_after": null, "decision_gravity": "none" }, + { "name": "reviewed", "insert_after": "auto", "decision_gravity": "routine" }, + { "name": "verified", "insert_after": "reviewed", "decision_gravity": "elevated" }, + { "name": "compliance_reviewed", "insert_after": "verified", "decision_gravity": "elevated" }, + { "name": "authorized", "insert_after": "compliance_reviewed", "decision_gravity": "critical" } + ], + "reason": "Adding compliance_reviewed tier for PCI-DSS regulated actions" +} + +Response 202 Accepted: +{ + "registry_change_uuid": "", + "status": "impact_assessment_pending", + "estimated_ready_at": "" +} +``` + +### Get Tier Registry Impact Report + +``` +GET /api/v1/admin/tier-registry/changes/{change_uuid}/impact + +Response 200: +{ + "registry_change_uuid": "", + "status": "impact_assessed | pending_degradation_review | ready_to_activate | blocked", + "summary": { + "degradations": 0, + "upgrades": 3, + "new_tiers": 1, + "broken_references": 0, + "profile_gaps": 2 + }, + "degradations": [], + "upgrades": [ ... 
], + "profile_gaps": [ + { + "profile": "standard", + "missing_tiers": ["compliance_reviewed"], + "gap_effect": "Requests scoring in the compliance_reviewed range will route to verified tier until threshold list is updated" + } + ], + "blocking_items": [] +} +``` + +### Accept a Security Degradation + +``` +POST /api/v1/admin/tier-registry/changes/{change_uuid}:accept-degradation + +{ + "affected_item_uuid": "", + "affected_item_type": "provider_registration_requirement", + "acceptance_reason": "", + "accepted_by": "" +} + +Response 200: +{ + "acceptance_uuid": "", + "degradation_accepted": true, + "remaining_degradations": 0, + "change_status": "ready_to_activate" +} + +Response 403: actor does not hold verified or authorized tier reviewer role +Response 409: degradation already accepted +``` + +### Activate a Tier Registry Change + +``` +POST /api/v1/admin/tier-registry/changes/{change_uuid}:activate + +Response 200: +{ + "registry_change_uuid": "", + "activated_at": "", + "new_registry_version": "1.1.0", + "impact_report_uuid": "" +} + +Response 409: change has unresolved blocking items (broken_references or unaccepted degradations) +``` + +### List Historical Registry Changes + +``` +GET /api/v1/admin/tier-registry/changes?status=activated&page_size=20 + +Response 200: +{ + "changes": [ + { + "registry_change_uuid": "", + "status": "activated", + "activated_at": "", + "proposed_by": { "uuid": "", "display_name": "Alice Chen" }, + "summary": { "degradations": 0, "upgrades": 2, "new_tiers": 1 }, + "impact_report_uuid": "" + } + ] +} +``` + diff --git a/docs/specifications/dcm-registration-spec.md b/docs/specifications/dcm-registration-spec.md new file mode 100644 index 0000000..e69c029 --- /dev/null +++ b/docs/specifications/dcm-registration-spec.md @@ -0,0 +1,906 @@ +# DCM Registration Specification + +**Document Status:** 📋 Draft - Ready for Implementation Feedback +**Document Type:** Registration Specification + + +> **AEP Alignment:** Registration API 
endpoints follow [AEP](https://aep.dev) conventions: custom methods use colon syntax (`POST /admin/registrations/{uuid}:approve`). `resource_type` in provider capabilities accepts an FQN string or a Registry UUID. See `schemas/openapi/dcm-admin-api.yaml` for the normative specification. + + +> **📋 Draft** +> +> This specification has been promoted from Work in Progress to Draft status. All questions resolved. Complete registration pipeline for all 11 provider types with full capability declaration schemas and federation trust model. It is ready for implementation feedback but has not yet been formally reviewed for final release. +> +> This specification defines the unified registration flow for all DCM provider types. Published to share design direction and invite feedback. + +**Version:** 0.1.0-draft +**Status:** Draft - Ready for implementation feedback +**Document Type:** Technical Specification +**Related Documents:** [Foundational Abstractions](https://github.com/croadfeldt/udlm/blob/main/foundations/foundations.md) | [Control Plane Components](../../architecture/control-plane/components.md) | [Governance Matrix](https://github.com/croadfeldt/udlm/blob/main/governance/governance-matrix.md) | [Accreditation and Authorization Matrix](https://github.com/croadfeldt/udlm/blob/main/governance/accreditation-and-authorization-matrix.md) | [Policy Profiles](../../architecture/governance-enforcement/policy-profiles.md) | [DCM Operator Interface Specification](dcm-operator-interface-spec.md) + +--- + +## Abstract + +This specification defines the unified registration flow by which all DCM provider types establish a trusted, governed relationship with a DCM deployment.
It covers: the Provider Type Registry, the registration token model, the approval method configuration, the step-by-step registration pipeline, trust establishment, the per-type capability declaration schemas, the ongoing lifecycle after activation, federated trust configuration, and profile-bound registration policy defaults. + +--- + +## 1. Provider Type Registry + +The Provider Type Registry is the authoritative list of provider types that a DCM deployment will accept registrations for. It follows the same three-tier registry model as the Resource Type Registry. + +### 1.1 Registry Tiers + +| Tier | Maintained By | Examples | +|------|--------------|---------| +| **Core** | DCM Project | The eleven built-in provider types | +| **Verified Community** | Named community maintainers | Domain-specific provider types | +| **Organization** | Deploying organization | Custom/proprietary integrations | + +### 1.2 Provider Type Registry Entry + +```yaml +provider_type_registry_entry: + artifact_metadata: + uuid: + handle: "provider-types/service-provider" + version: "1.0.0" + status: active + tier: core + + provider_type_id: service_provider + display_name: "Service Provider" + description: "Realizes infrastructure resources for DCM" + + # What this provider type is permitted to do + permissions: + may_receive_assembled_payload: true + may_write_realized_state: true + may_write_discovered_state: true + may_receive_scoped_credentials: true + may_receive_phi_by_default: false # requires HIPAA accreditation + may_receive_sovereign_data: false # hard limit; never overridden + + # Approval method defaults (profile may override; see Section 3) + default_approval_method: reviewed # auto | reviewed | verified | authorized + + # Minimum trust level granted after approval + default_trust_level: standard # minimal | standard | elevated | high + + # Which deployment profiles permit this provider type + enabled_in_profiles: [minimal, dev, standard, prod, fsi, sovereign] + + # 
Capability declaration schema reference + capability_schema_ref: "schemas/service-provider-capabilities-v1.0.0" + + # Health check requirements + health_check: + endpoint_required: true + minimum_check_interval: PT1M + failure_threshold: 3 # failures before degraded status +``` + +### 1.3 The Eleven Core Provider Types + +| # | provider_type_id | Default Approval | Enabled In | +|---|-----------------|-----------------|------------| +| 1 | `service_provider` | reviewed | all profiles | +| 2 | `information_provider` | reviewed | all profiles | +| 3 | `composite service` | verified | standard+ | +| 4 | `(prescribed infrastructure)` | verified | all profiles | +| 5 | `(optional infrastructure)` | reviewed | dev+ (external endpoints: standard+) | +| 6 | `external_policy_evaluation` (Mode 1-2) | reviewed | all profiles | +| 7 | `external_policy_evaluation` (Mode 3-4) | verified | standard+ | +| 8 | `service_provider` | verified | standard+ | +| 9 | `auth_provider` | verified | all profiles | +| 10 | `service_provider` | reviewed | all profiles | + +Note: Mode 3-4 External Policy Evaluators are treated as a separate registry entry from Mode 1-2 due to the elevated trust requirements. + +--- + +## 2. Registration Token Model + +Registration tokens are pre-issued by platform admins to authorize specific registrations without requiring full manual review at submission time.
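
The token lifecycle in this section (short-lived, single-use, stored only as a hash) might be sketched as follows. This is an illustration under stated assumptions, not the DCM implementation: the hash algorithm (SHA-256), the token length, and the names `issue_token` and `redeem_token` are hypothetical. The 72-hour lifetime reflects the `PT72H` default given in this section.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone


def issue_token(now, ttl=timedelta(hours=72)):
    """Return (token_value, stored_record); the value itself is never stored."""
    value = secrets.token_urlsafe(32)
    record = {
        "token_hash": hashlib.sha256(value.encode()).hexdigest(),
        "expires_at": now + ttl,
        "used": False,
    }
    return value, record


def redeem_token(presented, record, now):
    """Accept a token once, before expiry, comparing hashes in constant time."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    ok = (
        hmac.compare_digest(digest, record["token_hash"])
        and not record["used"]
        and now < record["expires_at"]
    )
    if ok:
        record["used"] = True  # single_use: invalidate after first use
    return ok


issued_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
value, record = issue_token(issued_at)
print(redeem_token(value, record, issued_at + timedelta(hours=1)))
```

Because only the hash is kept, a lost token cannot be recovered and would have to be reissued.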
+ +### 2.1 Token Structure + +```yaml +registration_token: + token_uuid: + token_value: + issued_by: + issued_at: + expires_at: # short-lived; default PT72H + single_use: true # token invalidated after first use + + scope: + provider_type_id: service_provider # which provider type this authorizes + provider_handle_pattern: "eu-west-*" # optional: restrict to matching handles + sovereignty_zone: eu-west-sovereign # optional: restrict to this zone + grants_auto_approval: true # whether token enables auto-approval + # grants_auto_approval: false = token still required but human review still needed + # (useful for tracking/auditing expected registrations without bypassing review) + + max_trust_level_granted: standard # token cannot grant higher than this +``` + +### 2.2 Token Issuance + +``` +POST /api/v1/admin/registration-tokens +Role: platform_admin + +{ + "provider_type_id": "service_provider", + "expires_in": "PT72H", + "scope": { + "provider_handle_pattern": "eu-west-*", + "sovereignty_zone": "eu-west-sovereign", + "grants_auto_approval": true + }, + "purpose": "EU-WEST production compute provider onboarding" +} + +Response 201 Created: +{ + "token_uuid": "", + "token_value": "", + "expires_at": "", + "scope": { ... } +} +``` + +Token values are presented exactly once, at creation. They are never retrievable again (stored as a hash). Platform admins must transmit the token securely to the provider operator. + +--- + +## 3. Approval Method Configuration + +> **Authority Tier Model:** Approval methods (`reviewed`, `verified`, `authorized`) are defined in the [Authority Tier Model](https://github.com/croadfeldt/udlm/blob/main/governance/authority-tier-model.md) as a named, ordered list. Organizations may insert custom tiers. The effective method resolution (Section 3.2) uses tier names; DCM resolves numeric weight from the ordered list at evaluation time (ATM-001).
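
The note above says that numeric weight is resolved from the ordered tier list at evaluation time. A minimal sketch of that idea follows, assuming the weight is simply the zero-based position in the list; the Authority Tier Model may define weights differently, and the names `tier_weight` and `most_restrictive` are hypothetical.

```python
DEFAULT_TIERS = ("auto", "reviewed", "verified", "authorized")


def tier_weight(tier, ordered_tiers=DEFAULT_TIERS):
    """Assumed weight: position in the ordered list (later = more restrictive)."""
    return ordered_tiers.index(tier)  # raises ValueError for unknown tiers


def most_restrictive(*tiers, ordered_tiers=DEFAULT_TIERS):
    """Pick the tier with the highest weight among the candidates."""
    return max(tiers, key=lambda t: tier_weight(t, ordered_tiers))


# An organization-defined list with one custom tier inserted after `verified`.
CUSTOM_TIERS = ("auto", "reviewed", "verified", "compliance_reviewed", "authorized")

print(most_restrictive("reviewed", "verified"))
print(most_restrictive("verified", "compliance_reviewed", ordered_tiers=CUSTOM_TIERS))
```

Resolving weights by name at evaluation time means that inserting a custom tier shifts positions without requiring callers to change any stored numeric values.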
+ +### 3.1 The Four Approval Methods + +| Method | Description | Approval path | +|--------|-------------|--------------| +| `auto` | DCM validates automatically; activates without human review | All validation checks pass → active | +| `reviewed` | One platform admin must explicitly approve | Submitted → validated → pending_approval → one admin approves → active | +| `verified` | Two platform admins must independently approve | Submitted → validated → pending_approval → two admins approve → active | +| `authorized` | N members of a declared DCMGroup must record decisions via the Admin API; quorum tracked by DCM; deliberation process is the organization's responsibility | Submitted → validated → pending_approval → DCMGroup members record votes via Admin API (or external systems calling API) → quorum → active | + +### 3.2 Effective Approval Method Resolution + +The effective approval method for a specific registration is the most restrictive result of: + +``` +effective_method = most_restrictive( + provider_type_registry.default_approval_method, + active_profile.registration_policy.min_approval_method, + registration_token.grants_auto_approval ? 
relax_to_auto : no_change +) +``` + +Resolution rules: +- Profile minimum overrides provider type default (always upward; profiles can only tighten) +- A valid registration token can relax the effective method to `auto` ONLY if the profile's `allow_token_auto_approval` is true +- `authorized` cannot be relaxed by any token + +### 3.3 Profile Registration Policy Defaults + +```yaml +profile_registration_policy: + minimal: + min_approval_method: reviewed + allow_token_auto_approval: true # token can enable auto for any type + require_sovereignty_declaration: false + require_health_check_before_approval: false + + dev: + min_approval_method: reviewed + allow_token_auto_approval: true + require_sovereignty_declaration: false + require_health_check_before_approval: true + + standard: + min_approval_method: reviewed + allow_token_auto_approval: true # tokens can auto-approve non-elevated types + token_auto_approval_max_trust: standard # tokens cannot auto-approve elevated types + require_sovereignty_declaration: true + require_health_check_before_approval: true + + prod: + min_approval_method: reviewed + high_trust_types_require: verified # storage, auth, policy-mode3-4, credential + allow_token_auto_approval: false # no auto-approval in prod + require_sovereignty_declaration: true + require_accreditation_submission: true # must submit at least self_declared + require_health_check_before_approval: true + approval_timeout: P7D # auto-reject if not approved within 7 days + + fsi: + min_approval_method: verified # everything requires dual approval + allow_token_auto_approval: false + require_sovereignty_declaration: true + require_accreditation_submission: true + minimum_accreditation_type: third_party # self_declared not accepted + require_health_check_before_approval: true + require_governance_matrix_check: true # governance matrix evaluated at registration + approval_timeout: P14D + + sovereign: + min_approval_method: authorized # everything requires authorized approval 
+ allow_token_auto_approval: false + require_sovereignty_declaration: true + require_accreditation_submission: true + minimum_accreditation_type: regulatory_certification + require_hardware_attestation: true + require_governance_matrix_check: true + authorized_group_handle: "platform/registration-authorized" + approval_timeout: P30D +``` + +--- + +## 4. Registration Pipeline + +### 4.1 Lifecycle States + +``` +SUBMITTED → VALIDATING → PENDING_APPROVAL → ACTIVE + ↘ REJECTED (validation failure) + ↘ REJECTED (approval denied) + +Additional states: +ACTIVE → SUSPENDED (platform admin action or health failure) +ACTIVE → DEREGISTERING → DEREGISTERED (graceful removal) +ACTIVE → FORCED_DEREGISTERED (immediate removal) +``` + +### 4.2 Step 1: Submission + +The provider submits a registration payload to DCM: + +``` +POST /api/v1/provider/register +Content-Type: application/json +X-DCM-Registration-Token: # optional; enables auto-approval if valid + +{ + "provider_type_id": "service_provider", + "handle": "eu-west-prod-1", + "display_name": "EU West Production Compute Provider", + "version": "2.1.0", + + # Mutual TLS certificate presented at connection level + # DCM extracts the certificate fingerprint from the TLS handshake + + "sovereignty_declaration": { ... }, + "accreditations": [ ... ], + "capabilities": { ... }, # per-type capability declaration + "health_endpoint": "https://provider.example.com/health", + "delivery_endpoint": "https://provider.example.com/dispatch" +} + +Response 202 Accepted: +{ + "registration_uuid": "", + "status": "VALIDATING", + "token_recognized": true, + "auto_approval_eligible": true, + "estimated_activation": "" +} +``` + +### 4.3 Step 2: Validation (automated) + +DCM runs automated validation checks. 
All must pass before advancing to PENDING_APPROVAL: + +``` +Validation checks: + V1: Provider type permitted in active profile + → Check Provider Type Registry: enabled_in_profiles includes active posture + → FAIL: REJECTED with reason "provider_type_not_enabled_in_profile" + + V2: Governance Matrix pre-check + → Evaluate matrix: is a provider of this type, in this zone, with these + accreditations, permitted to register? + → FAIL: REJECTED with reason "governance_matrix_denied" + rule_uuid + + V3: Registration token validation (if provided) + → Token exists and not expired + → Token matches provider_type_id and handle pattern + → Token not already used + → FAIL: Token invalid; fall back to non-token approval method + + V4: Certificate validation + → mTLS certificate presented and valid + → Certificate chain acceptable (registered CA or pinned self-signed) + → Certificate not in revocation list + → FAIL: REJECTED with reason "certificate_invalid" + + V5: Sovereignty declaration completeness + → Required fields present (if profile requires declaration) + → Jurisdiction codes valid + → FAIL: REJECTED with reason "sovereignty_declaration_incomplete" + + V6: Capability declaration consistency + → Declared capabilities consistent with provider type + → No contradictory declarations + → FAIL: REJECTED with reason "capability_declaration_invalid" + + V7: Health endpoint reachability + → DCM contacts health_endpoint + → Provider responds with valid health payload + → FAIL: status → PENDING_APPROVAL with warning (profile may require passing) + + V8: Accreditation submission check + → If profile requires accreditation submission: at least one accreditation present + → Accreditation type meets profile minimum + → FAIL: REJECTED with reason "accreditation_insufficient" +``` + +### 4.4 Step 3: Approval + +Approval flow depends on effective_approval_method: + +**auto:** Registration immediately advances to ACTIVE after 
validation passes. + +**reviewed:** +``` +Registration enters PENDING_APPROVAL +Platform admin notification dispatched (urgency: medium) +Platform admin reviews in Admin API or Flow GUI: + GET /api/v1/admin/registrations/pending + POST /api/v1/admin/registrations/{registration_uuid}:approve + POST /api/v1/admin/registrations/{registration_uuid}:reject +On approval: → ACTIVE +On rejection: → REJECTED with required reason field +On timeout (approval_timeout): → REJECTED with reason "approval_timeout" +``` + +**verified:** +``` +Registration enters PENDING_APPROVAL +Two independent platform admins must approve +First approval: recorded; notification sent to other admins for second approval +Second approval by different actor: → ACTIVE +Same actor cannot approve twice +On timeout: → REJECTED +``` + +**authorized:** +``` +Registration enters PENDING_APPROVAL +Authority group notified (all members) +Members vote via Admin API within declared quorum window +Quorum reached: → ACTIVE +Quorum not reached within approval_timeout: → REJECTED +``` + +### 4.5 Step 4: Activation + +On ACTIVE status: +- Provider enters the DCM provider registry +- Governance matrix rules are re-evaluated with this provider now active +- Capacity monitoring begins (if Service or Information Provider) +- Health check polling begins +- Certificate rotation schedule established +- Activation audit record written: PROVIDER_ACTIVATED +- Notification: platform admin + Tenant admins (if Tenant-scoped provider) + +--- + +## 5. 
Per-Type Capability Declaration Schemas + +### 5.1 Service Provider Capabilities + +```yaml +service_provider_capabilities: + resource_types: + - resource_type_fqn: Compute.VirtualMachine + resource_type_spec_version: "2.1.0" + catalog_item_uuid: + availability_zones: [eu-west-1a, eu-west-1b] + max_instances: 1000 + + capacity_model: + reporting_method: reserve_query | static_declaration | both + reserve_query_endpoint: /reserve + reserve_query_timeout: PT10S + static_capacity: + Compute.VirtualMachine: 500 + + cancellation: + supports_cancellation: true + cancellation_supported_during: [DISPATCHED, PROVISIONING] + partial_rollback_possible: true + + discovery: + supports_discovery: true + discovery_endpoint: /discover + discovery_method: api_query | passive_event | hybrid + supports_incremental_discovery: true + + monitoring: + # Prometheus metrics endpoint: required for 1.0 readiness + metrics_endpoint: /metrics # must return Prometheus text format + metrics_port: 8080 # or same as operator endpoint + + # Required metric families (must be present at activation): + required_metrics: + - dcm_provider_dispatches_total # {resource_type, outcome} + - dcm_provider_dispatch_duration_seconds # {resource_type, quantile} + - dcm_provider_realizations_total # {resource_type, status} + - dcm_provider_health_status # 1=healthy, 0=unhealthy + + # Optional but recommended: + optional_metrics: + - dcm_provider_queue_depth # pending dispatch requests + - dcm_provider_capacity_remaining # {resource_type} + + # AEP.DEV linting: required for 1.0 readiness gate + aep_linting: + passes_aep_linting: true # must pass aep.dev linter before activation + linting_report_ref: # link to linting report + + # Tenant metadata endpoint: required for multi-tenant readiness + tenant_metadata_endpoint: /api/v1/tenants/{tenant_uuid}/metadata + # Returns: usage by tenant, quota consumed, active resources by type + + naturalization: + target_format: openstack_nova | vmware_vsphere | custom + 
custom_schema_ref: + + cost_metadata: + capex_allocation_per_unit: 12.50 + opex_per_unit_per_hour: 0.28 + currency: USD + cost_data_dynamic_source: null | + + data_handling: + max_data_classification_accepted: restricted + phi_capable: false # true requires HIPAA BAA accreditation + pci_capable: false +``` + +### 5.2 Information Provider Capabilities + +```yaml +information_provider_capabilities: + data_domains: + - domain: business_data + data_types: [business_unit, cost_center, product_owner] + authority_level: primary | secondary | supplementary + schema_version: "1.0.0" + query_endpoint: /query + write_back_supported: false + + query_capacity: + max_queries_per_second: 100 + rate_limit_window: 60s + burst_capacity: 200 + + confidence_model: + data_freshness_sla: PT1H + corroboration_sources: [cmdb, hr_system] + + caching: + cacheable: true + cache_ttl: PT15M + cache_invalidation_webhook: /invalidate +``` + +### 5.3 Data Store Capabilities + +```yaml +(prescribed infrastructure)_capabilities: + store_types_supported: + - store_type: gitops + branch_per_request: true + pr_semantics: true + search_index_companion: true + - store_type: write_once_snapshot + entity_uuid_keyed: true + hash_chain_integrity: true + point_in_time_query: true + + consistency: + guarantee: strong | eventual | bounded_staleness + bounded_staleness_max: PT5M + + replication: + geo_replicated: true + replication_regions: [eu-west, eu-north] + synchronous_replication: true + + encryption: + at_rest: AES-256 + hsm_backed: false + key_management: provider_managed | customer_managed | hsm + + retention: + supports_retention_policy: true + minimum_retention: P1Y + maximum_retention: P10Y + tamper_evident: true +``` + +### 5.4 External Policy Evaluator Capabilities + +```yaml +external_policy_evaluation_capabilities: + mode: 1 | 2 | 3 | 4 + policy_types_supported: + - gatekeeper + - validation + - transformation + - recovery + - orchestration_flow + + framework: opa | cedar | custom + 
rego_version: "1.0" # for OPA providers + + # Internal/External specific + remote_endpoint: https://policy.example.com/evaluate + endpoint_sovereignty_zone: eu-west-sovereign + evaluation_latency_p95: PT0.2S # 200 ms as an ISO 8601 duration + supports_bundle_push: true + supports_bundle_pull: true + + shadow_mode_supported: true + test_harness_endpoint: /test +``` + +### 5.5 Auth Provider Capabilities + +```yaml +auth_provider_capabilities: + authentication_modes: + - api_key + - ldap + - oidc + - oidc_mfa + - saml + - mtls + - hardware_token + - hardware_token_mfa + + mfa_methods: + - totp + - push_notification + - hardware_token + + rbac_model: flat | hierarchical | attribute_based + external_idp_integration: true + idp_protocols: [oidc, saml, ldap] + + token_lifetime_config: + default_lifetime: PT1H + min_lifetime: PT5M + max_lifetime: PT8H + step_up_supported: true + + builtin: false # true for DCM's built-in auth provider +``` + +### 5.6 Notification Service Capabilities + +```yaml +service_provider_capabilities: + delivery_channels: + - channel_type: slack + supports_threading: true + supports_urgency_routing: true + config_schema_ref: + - channel_type: pagerduty + supports_escalation: true + config_schema_ref: + - channel_type: webhook + protocols: [https] + auth_modes: [hmac_sha256, mtls, bearer] + config_schema_ref: + - channel_type: email + html_supported: true + + delivery_guarantees: + at_least_once: true + idempotency_key: notification_uuid + max_delivery_latency_seconds: 30 + retry_policy: + max_attempts: 7 + backoff: exponential + on_exhaustion: dead_letter + + sovereignty_aware_delivery: true # checks endpoint jurisdiction before delivery +``` + +### 5.7 Credential Management Service Capabilities + +```yaml +service_provider_capabilities: + credential_types: + - api_key + - x509_certificate + - ssh_key + - service_account_token + - database_password + - hsm_backed_key + + secret_engines: + - vault + - aws_secrets_manager + - azure_key_vault + - gcp_secret_manager + + 
rotation_support: true + hsm_backed: false + fips_140_2_level: 1 | 2 | 3 # for sovereign deployments + dynamic_secrets: true # generate credentials on demand +``` + +### 5.8 Event Routing Service Capabilities + +```yaml +(optional infrastructure)_capabilities: + protocols: [kafka, amqp, mqtt, grpc] + persistence: true + durability: at_least_once | exactly_once + max_throughput_msg_per_sec: 100000 + retention: + message_retention: P7D + retention_configurable: true + external_endpoints: false # true if messages can leave sovereignty boundary + encryption_in_transit: TLS-1.3 + encryption_at_rest: AES-256 +``` + +### 5.9 Composite Service Definition Capabilities + +```yaml +composite service_capabilities: + constituent_provider_types: + - service_provider + - information_provider + + composition_model: sequential | parallel | conditional + partial_delivery_supported: true + compensation_supported: true + + resource_types_composed: + - resource_type_fqn: ApplicationStack.WebApp + constituent_resource_types: + - Compute.VirtualMachine + - Network.IPAddress + - DNS.Record + - Network.LoadBalancer +``` + +--- + +## 6. 
Federated Trust Configuration + +### 6.1 Federation Trust Postures + +| Posture | Description | Operations permitted | +|---------|-------------|---------------------| +| `verified` | Manually verified and approved by local platform admin | Full declared scope per tunnel authorization | +| `vouched` | Introduced through a trusted Hub DCM | Vouching authority's declared scope; cannot exceed voucher's scope | +| `provisional` | Cryptographically verified but not yet manually approved | catalog_query only (if profile permits) | + +### 6.2 Federation Trust Registration Flow + +``` +Remote DCM requests federation peering + │ + ▼ Cryptographic verification (always): + │ mTLS certificate validation + │ Certificate not in revocation list + │ Certificate signed by acceptable CA + + ▼ Governance matrix pre-check: + │ Is federation with this peer's jurisdiction/accreditation permitted? + + ▼ Trust posture determination: + │ Prior record of this remote UUID? → verified or vouched (per prior record) + │ No prior record → provisional + + ▼ Approval flow (per profile): + │ dev: provisional auto-promoted to verified (if governance matrix permits) + │ standard: reviewed for verified promotion; provisional gets limited scope + │ prod: verified for verified promotion; no provisional operations + │ fsi: verified + accreditation check; no provisional + │ sovereign: authorized_approval + hardware attestation; no provisional + + ▼ Scope assignment per trust posture + + ▼ Tunnel established with governance matrix enforcement +``` + +### 6.3 Profile Federation Trust Policy + +```yaml +profile_federation_policy: + minimal: + permitted_trust_postures: [verified, vouched, provisional] + auto_promote_provisional: true + cross_jurisdiction_permitted: true + accreditation_required_for_federation: false + + dev: + permitted_trust_postures: [verified, vouched, provisional] + auto_promote_provisional: true + provisional_permitted_operations: [catalog_query, 
resource_query] + cross_jurisdiction_permitted: true + + standard: + permitted_trust_postures: [verified, vouched] + approval_method_for_verified: reviewed + cross_jurisdiction_permitted: true + accreditation_required_for_federation: false + + prod: + permitted_trust_postures: [verified] + approval_method_for_verified: verified + cross_jurisdiction_permitted: true + accreditation_required_for_federation: false + + fsi: + permitted_trust_postures: [verified] + approval_method_for_verified: verified + cross_jurisdiction_permitted: false + accreditation_required_for_federation: true + minimum_peer_accreditation: third_party + re_verification_interval: PT8H + + sovereign: + permitted_trust_postures: [verified] + approval_method_for_verified: authorized + cross_jurisdiction_permitted: false + accreditation_required_for_federation: true + minimum_peer_accreditation: sovereign_authorization + hardware_attestation_required: true + data_classification_boundary: internal + re_verification_interval: PT4H +``` + +--- + +## 7. 
Ongoing Lifecycle After Activation + +### 7.1 Health Monitoring + +``` +DCM polls provider health endpoint every health_check_interval + │ + ├── Response: healthy → no action; next poll scheduled + ├── Response: degraded → DCM updates capacity rating; reduces routing preference + ├── No response (1 failure) → warning; retry at shorter interval + ├── No response (failure_threshold reached) → provider status → DEGRADED + │ Notification: platform admin (urgency: high) + │ New requests no longer routed to this provider + └── No response (2× failure_threshold) → provider status → UNAVAILABLE + Active entities checked; drift detection triggered + Platform admin notification (urgency: critical) +``` + +### 7.2 Certificate Rotation + +```yaml +certificate_rotation: + rotation_interval: P90D # profile-governed default + transition_window: P7D # old cert valid during transition + pre_rotation_warning: P14D # warn provider 14 days before expiry + +# Rotation flow: +POST /api/v1/provider/certificates:rotate +{ + "new_certificate_pem": "", + "transition_window": "P7D" +} +# DCM accepts both old and new certificates during transition window +# After transition window: old certificate rejected +``` + +### 7.3 Capability Updates + +Providers may update their capability declarations (new resource types, updated capacity models, new accreditations). Capability updates go through a simplified registration amendment flow: + +``` +POST /api/v1/provider/capabilities/update +{ + "amendment_type": "add_resource_type | remove_resource_type | update_capacity | add_accreditation", + "changes": { ... 
} +} + +→ VALIDATING (automated checks only) +→ PENDING_APPROVAL (if amendment_type is add_resource_type or sovereignty change) +→ ACTIVE (capability declarations updated) +``` + +### 7.4 Deregistration + +**Graceful deregistration:** +``` +Provider submits deregistration intent +DCM checks: active entities hosted at this provider +If active entities > 0: + Decision required: migrate_entities | decommission_entities | reject_deregistration +Platform admin approves deregistration plan +Provider enters DEREGISTERING state +Entity migration or decommission completes +Provider status → DEREGISTERED +``` + +**Forced deregistration:** +``` +POST /api/v1/admin/providers/{provider_uuid}/force-deregister +Role: platform_admin +Requires: verified (fsi/sovereign: authorized) + +Immediate effect: + Provider status → FORCED_DEREGISTERED + All active entities → INDETERMINATE_REALIZATION + Governance matrix re-evaluated for all affected entities + Recovery policy fires: DRIFT_RECONCILE or NOTIFY_AND_WAIT per profile +``` + +--- + +### 7.5 Provider 1.0 Readiness Gates + +Before a Service Provider can be activated in `standard`, `prod`, `fsi`, or `sovereign` +profiles, the following readiness gates must pass. 
These align with the DCM roadmap's +1.0 criteria for Service Provider deployment: + +| Gate | Requirement | Profiles Required | +|------|------------|-------------------| +| `GATE-SP-01` | Simple OpenAPI Spec: declared at registration, URL reachable | all | +| `GATE-SP-02` | Healthy API: health endpoint returns `{"status": "healthy"}` at activation | all | +| `GATE-SP-03` | State Management: implements realized_state_push callback | all | +| `GATE-SP-04` | Tenant Metadata: endpoint declared or implemented | standard+ | +| `GATE-SP-05` | Prometheus Metrics: required metric families present at declared endpoint | standard+ | +| `GATE-SP-06` | AEP.DEV Linting: OpenAPI spec passes AEP linter with no errors | standard+ | +| `GATE-SP-07` | Multi-Tenant Ready: accepts tenant_uuid in all dispatch payloads | standard+ | + +DCM evaluates readiness gates automatically during the approval pipeline. A provider +that fails a gate is rejected with a `READINESS_GATE_FAILED` error listing which +gates failed and what is needed to pass. + +**Required metric families (GATE-SP-05):** + +``` +dcm_provider_dispatches_total{resource_type, outcome} +dcm_provider_dispatch_duration_seconds{resource_type, quantile} +dcm_provider_realizations_total{resource_type, status} +dcm_provider_health_status # 1=healthy, 0=unhealthy/degraded +``` + +**AEP linting (GATE-SP-06):** +Run the AEP linter against the provider's OpenAPI spec before registration. +Common failures: slash-verb paths instead of colon syntax, missing page_size +on list endpoints, 202 responses without Operation resource on async operations. +The linting report URL should be included in the monitoring capability declaration. + +--- + + +## 8. 
Error Model + +| Error Code | Meaning | +|-----------|---------| +| `provider_type_not_enabled` | Provider type not permitted in active profile | +| `governance_matrix_denied` | Governance matrix pre-check denied registration | +| `certificate_invalid` | mTLS certificate invalid or not from acceptable CA | +| `token_invalid` | Registration token expired, used, or type mismatch | +| `token_insufficient_scope` | Token present but does not grant required approval level | +| `sovereignty_declaration_incomplete` | Required sovereignty fields missing | +| `accreditation_insufficient` | Active profile requires higher accreditation type | +| `capability_declaration_invalid` | Capability declarations internally inconsistent | +| `approval_timeout` | Registration not approved within approval_timeout period | +| `health_check_failed` | Provider health endpoint unreachable during validation | +| `duplicate_handle` | A provider with this handle already exists in active status | + +--- + +*Document maintained by the DCM Project. For questions or contributions see [GitHub](https://github.com/dcm-project).*