On-call runbook and POST /internal/incident/… (optional) for StreamPay
## Description
**Author** a practical **runbook** in `docs/runbooks/` covering three scenarios: (1) **stuck** settlement, (2)
suspected double payout or escrow leak, (3) Stellar outage or
Horizon hard-down. Link Grafana queries, log searches, and Soroban RPC
checks. This issue can also add optional read-only internal status endpoints to speed triage.
The runbook must document that no manual money movement happens without the 2-person rule.
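If the 2-person rule is ever enforced in tooling rather than only in the doc, it could be sketched roughly as below. This is a minimal illustration, not StreamPay code: `ManualMoveRequest`, its fields, and the reading of "2-person" as "the requester plus at least one different approver" are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ManualMoveRequest:
    """A proposed manual money movement awaiting sign-off (hypothetical shape)."""

    reference: str       # e.g. an incident or ticket id (assumed field)
    requested_by: str    # the on-call engineer proposing the move
    approvals: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # The requester cannot act as their own second person.
        if approver == self.requested_by:
            raise ValueError("requester cannot approve their own request")
        self.approvals.add(approver)

    def may_execute(self) -> bool:
        # Assumption: two people total, i.e. the requester plus one other approver.
        return len(self.approvals) >= 1
```

A request from `alice` stays blocked until someone other than `alice` calls `approve`. A stricter policy, such as requiring an approver from a specific role, would need an extra check.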
## Requirements and context
- **Step-by-step** for L1; **escalation** for L2 with roles.
- Links to rollback and key rotation runbooks.
- Comms template for “we are having payment delays.”
- PII rules in public customer comms.
- Quarterly table-top exercise checkbox in the doc.
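One way to keep PII out of public comms is to render the template only from an allow-list of non-identifying fields. The sketch below is illustrative: the wording after the first sentence, the field names `started_at` and `next_update_at`, and the function `render_delay_notice` are assumptions, not an agreed template.

```python
from string import Template

# Hypothetical public notice; only the opening sentence comes from this issue.
DELAY_NOTICE = Template(
    "We are having payment delays. "
    "Started: $started_at. Next update: $next_update_at."
)

# Only non-identifying fields may appear in public comms (assumed list).
ALLOWED_FIELDS = {"started_at", "next_update_at"}


def render_delay_notice(**fields: str) -> str:
    """Render the notice, refusing any field outside the allow-list."""
    extra = set(fields) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"fields not allowed in public comms: {sorted(extra)}")
    # Template.substitute raises KeyError if a required field is missing.
    return DELAY_NOTICE.substitute(fields)
```

Passing something like `account_id=...` raises a `ValueError` instead of leaking the value into a customer-facing message.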
## Suggested execution
1. `git checkout -b docs/runbook-stream-settlement`
2. Interview the current on-call; capture the real `grep` commands and queries they use.
3. Open a PR with security and ops as reviewers. No code is required, so the 95% coverage target is N/A for a docs-only change; use a completeness checklist instead.
4. If internal endpoints are added, include small tests; otherwise the change is docs-only.
5. Timeframe: 96h for v1; iterate after the first incident.
- Run the full test suite; add or update tests until the agreed coverage bar is met.
- Cover edge cases listed in this issue; document any intentional exclusions with brief rationale in the PR.
- Include relevant test output (e.g. test runner summary) or a link to a passing CI run in the pull request.
- Add security notes for auth, keys, PII, chain settlement, or money movement (assumptions verified, out-of-scope items).
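For a docs-only PR, the completeness checklist could be checked mechanically with something like the sketch below. The checklist labels mirror the requirements in this issue, but the search phrases and the helper `missing_items` are assumptions; a plain substring match is crude and does not replace reviewer judgment.

```python
# Hypothetical checklist: item label -> phrase expected somewhere in the runbook.
# The phrases are assumptions; adjust them to the wording the runbook really uses.
REQUIRED_ITEMS = {
    "stuck settlement scenario": "stuck settlement",
    "double payout / escrow leak scenario": "double payout",
    "Stellar / Horizon outage scenario": "horizon",
    "Grafana queries": "grafana",
    "Soroban RPC checks": "soroban rpc",
    "two-person rule": "two-person",
    "rollback runbook link": "rollback",
    "key rotation runbook link": "key rotation",
    "comms template": "payment delays",
    "PII rules": "pii",
    "quarterly table-top exercise": "table-top",
}


def missing_items(runbook_text: str) -> list[str]:
    """Return checklist labels whose phrase does not appear in the text."""
    text = runbook_text.lower()  # case-insensitive substring match
    return [label for label, phrase in REQUIRED_ITEMS.items() if phrase not in text]
```

Running `missing_items` over the runbook file's contents and pasting the result into the PR gives reviewers a quick view of what is still absent.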
## Example commit message
`docs(ops): add settlement and Stellar incident runbook with triage and escalation`
## Guidelines
- Target: at least 95% coverage on new or meaningfully changed code (per the repo’s standard tooling).
- Documentation: update contributor-facing or API documentation where a reviewer would be blocked without it.
- Timeframe: 96 hours to ready-for-review (surface blockers early).