[FREEZE-EXCEPTION] feat(crank): canary-cycle daemon (closes 6-gap audit) #441
Merged
services/orchestrator is the demo runner (single-shot runCycle); the
Canary launch (48h cycles + 24h grace) needs a long-running daemon to
fire settle_default on missed grace deadlines. Without continuous
cranking, every other member's score is held hostage to one missing tx.
Closes the 6 gaps from the canary readiness audit:
1. settleDefaults.ts — actually fire settle_default on missed grace
2. pollingLoop.ts — continuous 60s polling loop, never exits
3. healthServer.ts — /health with starting/ok/degraded + HTTP 503
4. rpcHealth.ts — pre-tick getVersion() probe (gates markSuccess)
5. fetchActivePools.ts — typed pool.all() decoder (no memcmp offset)
6. classifyError.ts — INFRA vs LOGIC vs UNKNOWN classification
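The status-to-HTTP-code contract from item 3 can be sketched as below. The mapping itself (degraded → 503, starting/ok → 200) is stated in the PR summary; the type and function names are hypothetical and are not taken from healthServer.ts.

```typescript
// The three health states named in the PR.
type HealthStatus = "starting" | "ok" | "degraded";

// Hypothetical helper: only "degraded" maps to a 5xx, so an uptime
// monitor that keys off HTTP status codes alerts on it.
function httpCodeFor(status: HealthStatus): 200 | 503 {
  return status === "degraded" ? 503 : 200;
}

// Hypothetical helper: pairs the status code with a minimal JSON body.
// The real response may carry more fields; they are not shown in the PR.
function healthResponse(status: HealthStatus): { code: number; body: string } {
  return { code: httpCodeFor(status), body: JSON.stringify({ status }) };
}
```

Keeping "starting" at 200 in this sketch means a freshly booted instance is not reported as down before its first tick.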
Plus:
- Postgres lease (CRANK_LEASE_ENABLED=true) mirrors the indexer's
reconciler_lease pattern for multi-replica Railway deployments
- INFRA_FAILURE vs PAYMENT_MISSED off-chain classification (the on-
chain settle_default has no reason arg; surfaced in structured logs
so the admin score-contestation UI can flip verdicts off-chain)
- 51 vitest cases covering the 4 pure surfaces; settle_default CPI
itself is integration-level (bankrun/litesvm lanes)
- .github/workflows/crank.yml — typecheck + test, advisory,
path-filtered to services/crank/** + sdk/**
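A minimal sketch of the off-chain INFRA_FAILURE vs PAYMENT_MISSED classification, assuming the rule given in the PR summary (INFRA_FAILURE iff rpcDownSince ≤ graceDeadline). The function name, the numeric timestamps, and the use of null for "the RPC was not down" are assumptions made for illustration.

```typescript
type DefaultReason = "INFRA_FAILURE" | "PAYMENT_MISSED";

// Hypothetical signature. Both timestamps are assumed to share one unit
// (for example Unix seconds). rpcDownSince is null when no outage is recorded.
function classifyDefault(
  rpcDownSince: number | null,
  graceDeadline: number,
): DefaultReason {
  // An outage that began at or before the grace deadline is attributed
  // to infrastructure rather than to the member.
  if (rpcDownSince !== null && rpcDownSince <= graceDeadline) {
    return "INFRA_FAILURE";
  }
  return "PAYMENT_MISSED";
}
```

Because the on-chain settle_default takes no reason argument, a result like this would only appear in structured logs for the admin UI to consume.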
Adds the canary-cycle crank daemon row to FREEZE.md's Active exceptions table. Falls under freeze item 2 (audit findings remediation) — closes the 6 gaps from the internal canary-readiness audit.
Summary
New services/crank/: a long-running daemon that keeps RoundFi pools moving forward on-chain. Closes the 6 gaps from the canary readiness audit (May 2026). services/orchestrator stays as the demo runner; this is the production cycle-advancer.
The Canary launch needs 48h cycles + 24h grace, well beyond any human-driven run loop. The on-chain program will not advance past a defaulted member without settle_default being explicitly called; without continuous cranking, every other member's score is held hostage to one missing tx.
Audit gaps addressed
1. settleDefaults.ts
2. pollingLoop.ts
3. healthServer.ts
4. rpcHealth.ts
5. fetchActivePools.ts (gap: memcmp offset desyncs after struct edits)
6. classifyError.ts
Beyond the 6 gaps
- Postgres lease (CRANK_LEASE_ENABLED=true) mirrors the indexer's reconciler_lease pattern (Wave 9.2 / PR #431) for multi-replica Railway deployments. Single-instance dev/devnet leaves it off and avoids the Postgres coupling entirely.
- INFRA_FAILURE vs PAYMENT_MISSED off-chain classification: the on-chain settle_default has no reason arg (would need a core PR + new audit); surfacing it off-chain lets the admin score-contestation UI flip verdicts without a chain change. Classification: INFRA_FAILURE iff rpcDownSince ≤ graceDeadline.
- /health HTTP-code contract matters: UptimeRobot keys off 5xx, not JSON keywords. degraded → 503, starting/ok → 200.
What's NOT in scope
- settle_default CPI itself is integration-level: covered by the bankrun + litesvm lanes, not by this service's vitest suite.
- ... backfill-events); the crank is a single-RPC consumer by design (the lease guards multi-replica gas double-spend, not RPC consensus).
CI
- .github/workflows/crank.yml: typecheck + vitest, advisory, path-filtered to services/crank/** + sdk/**. Stays advisory until the canary launch; flip to required after the first prod cycle.
Test plan
- pnpm --filter @roundfi/crank typecheck: clean
- pnpm --filter @roundfi/crank test: 51/51 across 4 specs (classifyError boundaries, /health transitions + 503 mapping, settle eligibility + INFRA classification, polling-loop lease/RPC/per-pool isolation)
- pnpm typecheck (workspace): clean
- pnpm lint (workspace): clean
- anchor build → set env vars (SOLANA_RPC_URL, ROUNDFI_*_PROGRAM_ID, CRANK_KEYPAIR) → pnpm --filter @roundfi/crank start → curl :3000/health should return {"status":"starting",...} first, then transition to ok after the first tick
- CRANK_LEASE_ENABLED=true + shared DATABASE_URL → only one replica should log tick.complete, the other tick.no_lease
https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
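The lease, RPC-probe, and per-pool isolation behavior that the test plan exercises can be sketched as a single tick. This is an illustration under stated assumptions, not the code in pollingLoop.ts: the TickDeps interface, runTick, and the tick.rpc_unhealthy and pool.error event names are hypothetical. Only tick.complete and tick.no_lease appear in the PR.

```typescript
// Hypothetical dependency bundle; the real service's interfaces are not shown in the PR.
interface TickDeps {
  leaseEnabled: boolean;
  tryAcquireLease: () => Promise<boolean>;   // e.g. a Postgres-backed lease
  probeRpc: () => Promise<boolean>;          // e.g. wraps a getVersion() call
  fetchActivePools: () => Promise<string[]>;
  crankPool: (pool: string) => Promise<void>;
  markSuccess: () => void;                   // feeds the /health state
  log: (event: string, fields?: Record<string, unknown>) => void;
}

async function runTick(deps: TickDeps): Promise<void> {
  // With the lease enabled, only the lease holder cranks.
  if (deps.leaseEnabled && !(await deps.tryAcquireLease())) {
    deps.log("tick.no_lease");
    return;
  }
  // The pre-tick probe gates markSuccess: an unhealthy RPC ends the tick early.
  if (!(await deps.probeRpc())) {
    deps.log("tick.rpc_unhealthy"); // hypothetical event name
    return;
  }
  const pools = await deps.fetchActivePools();
  let failed = 0;
  for (const pool of pools) {
    try {
      await deps.crankPool(pool);
    } catch (err) {
      // Per-pool isolation: one failing pool does not block the others.
      failed += 1;
      deps.log("pool.error", { pool, message: String(err) }); // hypothetical event name
    }
  }
  // Assumption: the tick counts as successful even if individual pools failed.
  deps.markSuccess();
  deps.log("tick.complete", { pools: pools.length, failed });
}
```

Injecting the dependencies keeps the tick testable without a database or an RPC endpoint.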
Generated by Claude Code