PRD: full-featured deployment api #2

@ctxswitch

Problem Statement

Teams coordinate deployments across issues, pull requests, chat, and manual checklists. That makes it hard for Engineers, Reviewers, and on-call responders to answer what is being deployed, who requested it, whether it was reviewed, what risk was accepted, what rollback plan exists, and what happened as the Deployment moved through its lifecycle.

The Deployment API needs to provide a durable, reviewable control-plane record for Deployments without running real infrastructure changes.

Solution

Build a full-featured HTTP Deployment API backed by SQLite. The API will let Engineers create Deployments, let Reviewers approve or reject them, let operators move approved Deployments through their lifecycle, and let on-call engineers inspect both the current Deployment state and the immutable Deployment event history.

The API will use one canonical Deployment object from initial request through final outcome. It will store current Deployment state for efficient list and fetch operations, and append a Deployment event for every creation, review, start, success, failure, and rollback action.
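The current-state-plus-history design above can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions, not the project's implementation: the table names `deployments` and `deployment_events` come from the Implementation Decisions below, but the columns, the function names (`open_db`, `create_deployment`, `record_action`, `current_status`, `list_events`), the event action strings, and the caller-supplied ID are all hypothetical. The PRD says the API generates Deployment IDs itself.

```python
import sqlite3
from datetime import datetime, timezone

# Table names come from the PRD; the columns are an illustrative subset (assumption).
SCHEMA = """
CREATE TABLE IF NOT EXISTS deployments (
    id         TEXT PRIMARY KEY,
    requester  TEXT NOT NULL,
    status     TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS deployment_events (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    deployment_id TEXT NOT NULL REFERENCES deployments(id),
    action        TEXT NOT NULL,
    actor         TEXT NOT NULL,
    note          TEXT,
    created_at    TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a connection and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def _now():
    return datetime.now(timezone.utc).isoformat()


def _append_event(conn, deployment_id, action, actor, note):
    conn.execute(
        "INSERT INTO deployment_events"
        " (deployment_id, action, actor, note, created_at) VALUES (?, ?, ?, ?, ?)",
        (deployment_id, action, actor, note, _now()),
    )


def create_deployment(conn, deployment_id, requester):
    # "with conn" commits on success and rolls back if any statement raises.
    with conn:
        conn.execute(
            "INSERT INTO deployments (id, requester, status, updated_at)"
            " VALUES (?, ?, ?, ?)",
            (deployment_id, requester, "pending_approval", _now()),
        )
        _append_event(conn, deployment_id, "created", requester, None)


def record_action(conn, deployment_id, new_status, action, actor, note=None):
    # The state update and the event insert share one transaction.
    with conn:
        cur = conn.execute(
            "UPDATE deployments SET status = ?, updated_at = ? WHERE id = ?",
            (new_status, _now(), deployment_id),
        )
        if cur.rowcount != 1:
            raise LookupError(f"deployment {deployment_id!r} not found")
        _append_event(conn, deployment_id, action, actor, note)


def current_status(conn, deployment_id):
    """Read current state directly; no event replay is needed."""
    row = conn.execute(
        "SELECT status FROM deployments WHERE id = ?", (deployment_id,)
    ).fetchone()
    return row[0] if row else None


def list_events(conn, deployment_id):
    """Return (action, actor, note) rows, oldest first."""
    return conn.execute(
        "SELECT action, actor, note FROM deployment_events"
        " WHERE deployment_id = ? ORDER BY id ASC",
        (deployment_id,),
    ).fetchall()
```

Ordinary reads go to `deployments`, while `deployment_events` is only ever inserted into, which keeps the audit trail independent of later status changes.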

User Stories

  1. As an Engineer, I want to create a Deployment with service, environment, version, requester, Work references, and rollback plan, so that the team has a clear record of what is being requested.
  2. As an Engineer, I want production Deployments to require Risk notes, so that production risk is explicit before review.
  3. As an Engineer, I want non-production Deployments to allow but not require Risk notes, so that lower-environment requests are not burdened with unnecessary process.
  4. As an Engineer, I want every Deployment to require a Rollback plan, so that recovery thinking is captured before any environment changes.
  5. As an Engineer, I want each Deployment to include at least one Work reference, so that reviewers can connect the Deployment back to planned work.
  6. As an Engineer, I want the API to generate Deployment IDs, statuses, and timestamps, so that clients do not need to coordinate those values.
  7. As an Engineer, I want invalid creation requests to return validation errors, so that clients can fix bad input without guessing.
  8. As a Reviewer, I want to approve a pending Deployment with my actor identity, so that the review decision is visible.
  9. As a Reviewer, I want to reject a pending Deployment with a required note, so that the rejection reason is recorded.
  10. As a Reviewer, I want the API to prevent requesters from approving their own Deployments, so that approval remains a separate review action.
  11. As a Reviewer, I want a Deployment to be reviewed only while it is pending approval, so that review decisions cannot be changed accidentally after lifecycle progress begins.
  12. As an on-call Engineer, I want to list Deployments, so that I can see current rollout activity and recent outcomes.
  13. As an on-call Engineer, I want to filter Deployments by service, environment, and status, so that I can quickly find the records relevant to an incident or service.
  14. As an on-call Engineer, I want list responses to be paginated, so that large Deployment histories remain manageable.
  15. As an on-call Engineer, I want to fetch one Deployment by ID, so that I can inspect its current state and review metadata.
  16. As an on-call Engineer, I want to fetch Deployment events oldest first, so that I can understand exactly how the Deployment reached its current state.
  17. As an operator, I want to start an approved Deployment, so that the control plane records when rollout activity began.
  18. As an operator, I want the API to reject start attempts for Deployments that are not approved, so that invalid lifecycle transitions are explicit.
  19. As an operator, I want to mark a started Deployment as succeeded, so that the final successful outcome is recorded.
  20. As an operator, I want to mark a started Deployment as failed with a required note, so that failure context is recorded.
  21. As an operator, I want to mark a started Deployment as rolled back with a required note, so that a reversal can be recorded even if the Deployment was never marked succeeded or failed.
  22. As an operator, I want to mark a succeeded Deployment as rolled back, so that post-success reversal is visible.
  23. As an operator, I want to mark a failed Deployment as rolled back, so that recovery after failure is visible.
  24. As an on-call Engineer, I want rejected and rolled-back Deployments to be terminal, so that those states cannot be accidentally mutated.
  25. As an on-call Engineer, I want succeeded and failed Deployments to remain final unless rollback is recorded, so that later reversal is represented explicitly.
  26. As an API client, I want invalid lifecycle transitions to return conflict errors, so that state-machine failures are distinguishable from malformed requests.
  27. As an API client, I want missing or invalid fields to return validation errors, so that user input problems are clear.
  28. As an API client, I want action endpoints to accept an explicit actor because authentication is out of scope, so that audit history can still record who performed an action.
  29. As an API client, I want reject, fail, and rollback actions to require notes, so that important negative or recovery decisions always carry context.
  30. As an API client, I want approve, start, and succeed actions to accept optional notes, so that extra context can be recorded when useful.
  31. As a maintainer, I want Deployment state to survive process restarts, so that the API behaves like a durable control plane.
  32. As a maintainer, I want SQLite storage without external infrastructure dependencies, so that local development and tests stay simple.
  33. As a maintainer, I want current Deployment state and Deployment events written transactionally, so that state and history cannot drift during lifecycle actions.
  34. As a maintainer, I want current Deployment reads to avoid replaying events, so that ordinary list and fetch endpoints stay straightforward.
  35. As a maintainer, I want the event history to be append-only, so that auditability does not depend on mutable notes or overwritten status.
  36. As a maintainer, I want a testable domain lifecycle module, so that state-transition behavior can be verified without HTTP or database setup.
  37. As a maintainer, I want a testable repository boundary, so that SQLite persistence, transactions, and event inserts can be verified directly.
  38. As a maintainer, I want HTTP handlers to depend on stable domain and repository interfaces, so that transport concerns do not leak into lifecycle rules.
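The creation rules in stories 1 through 7 could be checked along these lines. This is a rough sketch, not a committed request schema: the JSON field names (`work_refs`, `rollback_plan`, `risk`), the literal environment value `"production"`, the error messages, and the `ValidationError` and `validate_creation` names are assumptions made for illustration.

```python
class ValidationError(Exception):
    """Raised when a creation payload is missing or has invalid fields."""

    def __init__(self, errors):
        super().__init__("; ".join(errors))
        self.errors = errors


# Field names are hypothetical; the PRD lists the concepts, not the JSON keys.
REQUIRED_TEXT_FIELDS = ("service", "environment", "version", "requester", "rollback_plan")


def _is_blank(value):
    return not isinstance(value, str) or not value.strip()


def validate_creation(payload):
    """Collect every problem so a client can fix the request in one pass."""
    errors = []
    for name in REQUIRED_TEXT_FIELDS:
        if _is_blank(payload.get(name)):
            errors.append(f"{name} is required")

    refs = payload.get("work_refs")
    if not isinstance(refs, list) or len(refs) == 0:
        errors.append("work_refs must contain at least one reference")

    # Risk notes are required for production only; other environments may omit them.
    if payload.get("environment") == "production" and _is_blank(payload.get("risk")):
        errors.append("risk is required for production")

    if errors:
        raise ValidationError(errors)
```

Returning all failures together, rather than stopping at the first, is one way to meet the goal that clients can fix bad input without guessing.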

Implementation Decisions

  • Build a domain lifecycle module that owns Deployment statuses, valid transitions, creation validation, review validation, action-note requirements, and self-approval prevention.
  • Build a SQLite repository module that owns schema initialization, current Deployment persistence, Deployment event persistence, transactional lifecycle writes, filtering, pagination, and fetch-by-ID behavior.
  • Build an HTTP API module that owns JSON request parsing, response rendering, route registration, validation error responses, conflict error responses, and not-found responses.
  • Keep one canonical Deployment object from request through outcome. Do not create a separate request or run object.
  • Store reviewed_by and reviewed_at as neutral Review metadata rather than separate approved and rejected actor fields. The status records whether the Review was an approval or a rejection.
  • Store Work references as a collection so Deployments can point at issues, pull requests, or similar planning artifacts.
  • Require Work references, Rollback plan, service, environment, version, and requester at creation.
  • Require Risk only for production Deployments.
  • Validate Risk and Rollback plan requirements at creation time only.
  • Accept explicit actor fields on action requests because full authentication and authorization are out of scope.
  • Use the lifecycle transitions pending_approval -> approved, pending_approval -> rejected, approved -> started, started -> succeeded, started -> failed, started -> rolled_back, succeeded -> rolled_back, and failed -> rolled_back.
  • Treat rejected and rolled_back as terminal states.
  • Treat succeeded and failed as final unless a rollback is recorded.
  • Append a Deployment event for every creation, approval, rejection, start, success, failure, and rollback.
  • Require notes for rejection, failure, and rollback events.
  • Allow optional notes for approval, start, and success events.
  • Return Deployment events oldest first.
  • Use SQLite for durable storage.
  • Store current state in a deployments table and append-only history in a deployment_events table.
  • Execute lifecycle updates and event inserts in one SQLite transaction.
  • Use current state as the source for normal list and fetch endpoints; use events as audit history rather than requiring event replay for normal reads.
  • Support list filters by service, environment, and status.
  • Support limit-based pagination for list responses.
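The lifecycle rules listed above fit in a small lookup table. The sketch below is one possible shape for the domain lifecycle module, not a settled design: the status names and transitions are taken from the list, while the function name `next_status`, the two exception classes, the order of the checks, and the choice to report self-approval as a validation error are assumptions.

```python
class ValidationError(Exception):
    """Malformed or incomplete action request."""


class ConflictError(Exception):
    """Well-formed request that the current status does not allow."""


# (current status, action) -> next status, as listed in the PRD.
TRANSITIONS = {
    ("pending_approval", "approve"): "approved",
    ("pending_approval", "reject"): "rejected",
    ("approved", "start"): "started",
    ("started", "succeed"): "succeeded",
    ("started", "fail"): "failed",
    ("started", "rollback"): "rolled_back",
    ("succeeded", "rollback"): "rolled_back",
    ("failed", "rollback"): "rolled_back",
}
ACTIONS = {action for _, action in TRANSITIONS}
NOTE_REQUIRED = {"reject", "fail", "rollback"}


def next_status(status, action, actor, requester, note=None):
    """Return the status that follows `action`, or raise on an invalid request."""
    if action not in ACTIONS:
        raise ValidationError(f"unknown action {action!r}")
    if not actor or not actor.strip():
        raise ValidationError("actor is required")
    if action in NOTE_REQUIRED and not (note and note.strip()):
        raise ValidationError(f"{action} requires a note")

    target = TRANSITIONS.get((status, action))
    if target is None:
        # Covers terminal states too: rejected and rolled_back have no entries.
        raise ConflictError(f"cannot {action} a Deployment in status {status!r}")

    # Assumption: self-approval is treated as a validation failure.
    if action == "approve" and actor == requester:
        raise ValidationError("requesters cannot approve their own Deployments")
    return target
```

Because `rejected` and `rolled_back` never appear as a current status in the table, they are terminal without any special-case code.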

Testing Decisions

  • Good tests should assert externally visible behavior: accepted requests, rejected requests, status transitions, response shapes, persisted records, and event history. They should not assert private helper structure.
  • Test the domain lifecycle module with table-driven cases for valid transitions, invalid transitions, terminal-state behavior, rollback-after-success, rollback-after-failure, self-approval prevention, required production Risk, and required Rollback plan.
  • Test the repository module against SQLite so schema behavior, transactions, current-state updates, event inserts, filtering, pagination, and persistence across connection reopen are covered.
  • Test HTTP handlers through request/response behavior so JSON validation, not-found errors, conflict errors, and success responses are covered.
  • Test that every lifecycle action appends exactly one corresponding Deployment event and updates current Deployment state consistently.
  • Test that lifecycle write failures do not leave partial state or orphaned events.
  • Test list filtering and pagination as API behavior, not query implementation.
  • There is little prior test structure in the current scaffold, so new tests should establish the expected pattern for domain, storage, and HTTP coverage.
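The partial-state check above could look something like the following. This is a sketch that assumes a deliberately tiny schema and a hypothetical `apply_action` helper, both invented here for illustration. It forces the event insert to fail by violating a `NOT NULL` constraint, then checks that the status update in the same transaction was rolled back.

```python
import sqlite3


def open_db():
    # Minimal stand-in schema (assumption); the real tables would carry more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE deployments (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE deployment_events (
            id            INTEGER PRIMARY KEY AUTOINCREMENT,
            deployment_id TEXT NOT NULL,
            action        TEXT NOT NULL,
            actor         TEXT NOT NULL
        );
        """
    )
    return conn


def apply_action(conn, deployment_id, new_status, action, actor):
    # Hypothetical repository helper: both writes share one transaction.
    with conn:
        conn.execute(
            "UPDATE deployments SET status = ? WHERE id = ?",
            (new_status, deployment_id),
        )
        conn.execute(
            "INSERT INTO deployment_events (deployment_id, action, actor)"
            " VALUES (?, ?, ?)",
            (deployment_id, action, actor),
        )


def test_failed_event_insert_rolls_back_status_update():
    conn = open_db()
    with conn:
        conn.execute("INSERT INTO deployments (id, status) VALUES ('d1', 'approved')")

    try:
        # actor=None violates NOT NULL, so the event insert fails after the update ran.
        apply_action(conn, "d1", "started", "start", None)
    except sqlite3.IntegrityError:
        pass
    else:
        raise AssertionError("expected the event insert to fail")

    status = conn.execute("SELECT status FROM deployments WHERE id = 'd1'").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM deployment_events").fetchone()[0]
    assert status == "approved"  # the update was rolled back
    assert events == 0           # no orphaned event was left behind
```

The function follows the `test_` naming convention, so a runner such as pytest would collect it, and it can also be called directly.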

Out of Scope

  • Running real deploys.
  • Integrating with Kubernetes, cloud APIs, CI systems, or deployment platforms.
  • Full authentication and authorization.
  • A web UI.
  • Multi-service orchestration.
  • Multi-instance write coordination beyond SQLite's normal behavior.
  • Full event sourcing where normal reads require replaying history.

Further Notes

  • This PRD follows the current Deployment API design context and ADRs.
  • The existing design-context PR is related background for this PRD.
  • Future issue breakdown should use vertical slices that ship behavior end to end, starting with durable creation and fetch before adding review actions, lifecycle actions, list filtering, and event history refinements.
