
[FREEZE-EXCEPTION] feat(crank): canary-cycle daemon (closes 6-gap audit)#441

Merged
alrimarleskovar merged 2 commits into main from claude/services-crank-canary-cycle-auto on Jun 1, 2026

@alrimarleskovar (Owner)

Summary

New services/crank/: a long-running daemon that keeps RoundFi pools advancing on-chain. It closes the 6 gaps from the canary readiness audit (May 2026). services/orchestrator stays as the demo runner; this is the production cycle-advancer.

The Canary launch needs 48h cycles + 24h grace, well beyond what any human-driven run loop can sustain. The on-chain program will not advance past a defaulted member until settle_default is explicitly called; without continuous cranking, every other member's score is held hostage to one missing tx.
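To make the timing concrete, a sketch of the eligibility check the crank has to run for each member. It assumes Unix-second timestamps, assumes the grace deadline is the cycle end plus the grace window, and uses hypothetical field names (cycleStartTs, contributed, defaulted); the real account layout lives in the on-chain program and is not shown in this PR.

```typescript
// Hypothetical member state; the real accounts are decoded from the
// on-chain program and may name or store these fields differently.
interface MemberCycleState {
  cycleStartTs: number; // Unix seconds when the current cycle opened
  contributed: boolean; // true once the member paid for this cycle
  defaulted: boolean; // true once settle_default has already run
}

const HOUR = 60 * 60;
const CYCLE_SECONDS = 48 * HOUR; // Canary cycle length
const GRACE_SECONDS = 24 * HOUR; // grace window after the cycle closes

// Assumption: grace deadline = cycle end + grace window.
function graceDeadline(m: MemberCycleState): number {
  return m.cycleStartTs + CYCLE_SECONDS + GRACE_SECONDS;
}

// Eligible once the deadline has strictly passed, the member has not paid,
// and nobody has settled them yet (so repeated ticks stay idempotent).
function isSettleEligible(m: MemberCycleState, nowTs: number): boolean {
  return !m.contributed && !m.defaulted && nowTs > graceDeadline(m);
}
```

With cycleStartTs = 0 the deadline lands at 72h (259200s): a tick at exactly 259200 does nothing, and the first tick after it fires settle_default.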

Audit gaps addressed

| Gap | Symptom if unfixed | Where |
|-----|--------------------|-------|
| 1 | Defaulted members stall the pool indefinitely | settleDefaults.ts |
| 2 | Cycles only advance when a human runs a script | pollingLoop.ts |
| 3 | No way for UptimeRobot / Railway to see degradation | healthServer.ts |
| 4 | Flaky RPC → silent "no pools" → /health stays ok | rpcHealth.ts |
| 5 | Hardcoded memcmp offset desyncs after struct edits | fetchActivePools.ts |
| 6 | INFRA blip indistinguishable from on-chain LOGIC error | classifyError.ts |
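Gap 6 comes down to telling a transport failure apart from a program rejection. A minimal sketch of such a classifier: the INFRA / LOGIC / UNKNOWN labels come from this PR, but the error codes and message patterns below are illustrative heuristics, not the contents of classifyError.ts.

```typescript
type ErrorClass = "INFRA" | "LOGIC" | "UNKNOWN";

// Illustrative Node.js system error codes: the request never got a verdict
// from the chain, so a retry on the next tick may well succeed.
const INFRA_CODES = new Set(["ECONNRESET", "ECONNREFUSED", "ETIMEDOUT", "ENOTFOUND", "EAI_AGAIN"]);

// Illustrative transport-level message patterns (fetch failures, timeouts,
// rate limiting and gateway errors from the RPC provider).
const INFRA_PATTERNS: RegExp[] = [/fetch failed/i, /timed? ?out/i, /\b(429|502|503|504)\b/];

// Illustrative program-side rejections: the transaction reached the chain
// and the program refused it, so retrying unchanged will fail again.
const LOGIC_PATTERNS: RegExp[] = [/custom program error/i, /AnchorError/, /Error Code: \w+/];

function classifyError(err: unknown): ErrorClass {
  const message = err instanceof Error ? err.message : String(err);
  const code =
    typeof err === "object" && err !== null && "code" in err
      ? String((err as { code: unknown }).code)
      : "";
  if (INFRA_CODES.has(code)) return "INFRA";
  // Program rejections are checked before the looser INFRA message patterns.
  if (LOGIC_PATTERNS.some((re) => re.test(message))) return "LOGIC";
  if (INFRA_PATTERNS.some((re) => re.test(message))) return "INFRA";
  return "UNKNOWN";
}
```

Keeping UNKNOWN as its own bucket, rather than folding it into INFRA, avoids silently retrying errors nobody has looked at yet.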

Beyond the 6 gaps

  • Postgres lease (opt-in via CRANK_LEASE_ENABLED=true) mirrors the indexer's reconciler_lease pattern (Wave 9.2 / PR #431, "indexer: reconciler lease, leader election across replicas") for multi-replica Railway deployments. Single-instance dev/devnet deployments leave it off and avoid the Postgres coupling entirely.
  • INFRA_FAILURE vs PAYMENT_MISSED classification in structured logs. The on-chain settle_default has no reason arg (adding one would need a core PR + new audit); surfacing the reason off-chain lets the admin score-contestation UI flip verdicts without a chain change. Classification: INFRA_FAILURE iff rpcDownSince ≤ graceDeadline (an RPC outage was already underway when the grace window closed), otherwise PAYMENT_MISSED.
  • The /health HTTP-code contract matters: UptimeRobot keys off 5xx status codes, not JSON keywords, so degraded → 503 and starting/ok → 200.
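The INFRA_FAILURE rule is a single comparison. A sketch, assuming both timestamps are Unix seconds and that rpcDownSince is null while the RPC is healthy; classifyDefault, settleLogLine and the log fields are hypothetical names, not taken from settleDefaults.ts.

```typescript
type DefaultReason = "INFRA_FAILURE" | "PAYMENT_MISSED";

// rpcDownSince: Unix seconds when the current RPC outage began, or null if
// the RPC is healthy. graceDeadline: Unix seconds when grace expired.
function classifyDefault(rpcDownSince: number | null, graceDeadline: number): DefaultReason {
  // INFRA_FAILURE iff the outage started at or before the deadline.
  return rpcDownSince !== null && rpcDownSince <= graceDeadline ? "INFRA_FAILURE" : "PAYMENT_MISSED";
}

// Hypothetical structured log line; the reason never goes on-chain.
function settleLogLine(pool: string, member: string, reason: DefaultReason): string {
  return JSON.stringify({ event: "settle_default", pool, member, reason });
}
```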

What's NOT in scope

  • The on-chain settle_default CPI itself is integration-level: it is covered by the bankrun + litesvm lanes, not by this service's vitest suite.
  • Multi-RPC quorum stays in the indexer (backfill-events); the crank is a single-RPC consumer by design (the lease guards multi-replica gas double-spend, not RPC consensus).

CI

.github/workflows/crank.yml — typecheck + vitest, advisory, path-filtered to services/crank/** + sdk/**. Stays advisory until the canary launch; flip to required after the first prod cycle.

Test plan

  • pnpm --filter @roundfi/crank typecheck — clean
  • pnpm --filter @roundfi/crank test — 51/51 across 4 specs (classifyError boundaries, /health transitions + 503 mapping, settle eligibility + INFRA classification, polling-loop lease/RPC/per-pool isolation)
  • pnpm typecheck (workspace) — clean
  • pnpm lint (workspace) — clean
  • Devnet smoke (reviewer): anchor build → set env vars (SOLANA_RPC_URL, ROUNDFI_*_PROGRAM_ID, CRANK_KEYPAIR) → pnpm --filter @roundfi/crank start → curl :3000/health should first return {"status":"starting",...}, then transition to ok after the first tick
  • Lease smoke (reviewer, optional): run two instances with CRANK_LEASE_ENABLED=true + shared DATABASE_URL → only one should log tick.complete, the other tick.no_lease
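The /health behaviour the devnet smoke checks (starting first, ok after the first tick, 503 once degraded) can be sketched with Node's built-in http module. HealthState, httpStatusFor and startHealthServer are hypothetical names, not the actual healthServer.ts.

```typescript
import { createServer, type Server } from "node:http";

type HealthStatus = "starting" | "ok" | "degraded";

// Hypothetical tracker; the real service may record more fields.
class HealthState {
  status: HealthStatus = "starting";
  lastSuccessAt: number | null = null;
  lastError: string | null = null;

  // Called only after the pre-tick RPC probe and the tick both succeed.
  markSuccess(now: number): void {
    this.status = "ok";
    this.lastSuccessAt = now;
    this.lastError = null;
  }

  markFailure(reason: string): void {
    this.status = "degraded";
    this.lastError = reason;
  }
}

// UptimeRobot alerts on 5xx, so only "degraded" maps to a failing code.
function httpStatusFor(status: HealthStatus): number {
  return status === "degraded" ? 503 : 200;
}

function startHealthServer(state: HealthState, port: number): Server {
  const server = createServer((req, res) => {
    if (req.url !== "/health") {
      res.writeHead(404);
      res.end();
      return;
    }
    res.writeHead(httpStatusFor(state.status), { "content-type": "application/json" });
    res.end(JSON.stringify({ status: state.status, lastSuccessAt: state.lastSuccessAt, lastError: state.lastError }));
  });
  server.listen(port);
  return server;
}
```

Returning 200 while starting keeps a fresh deploy from paging before its first tick completes.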

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm


Generated by Claude Code

services/orchestrator is the demo runner (single-shot runCycle); the
Canary launch (48h cycles + 24h grace) needs a long-running daemon to
fire settle_default on missed grace deadlines. Without continuous
cranking, every other member's score is held hostage to one missing tx.

Closes the 6 gaps from the canary readiness audit:

  1. settleDefaults.ts   — actually fire settle_default on missed grace
  2. pollingLoop.ts      — continuous 60s polling loop, never exits
  3. healthServer.ts     — /health with starting/ok/degraded + HTTP 503
  4. rpcHealth.ts        — pre-tick getVersion() probe (gates markSuccess)
  5. fetchActivePools.ts — typed pool.all() decoder (no memcmp offset)
  6. classifyError.ts    — INFRA vs LOGIC vs UNKNOWN classification
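Items 2 and 4 interact: the getVersion() probe runs before each tick and is what gates markSuccess. A minimal sketch of one tick plus the never-exiting loop, with every dependency injected; TickDeps, runTick, pollForever and the pool.error event are hypothetical, while tick.complete and tick.no_lease are the log events named in the test plan.

```typescript
// Hypothetical dependency surface; the real loop wires these to the Solana
// RPC, the optional Postgres lease and settleDefaults.ts.
interface TickDeps {
  probeRpc(): Promise<void>; // e.g. a getVersion() call; throws on failure
  tryAcquireLease(): Promise<boolean>; // always true when the lease is disabled
  fetchActivePools(): Promise<string[]>; // pool addresses
  settlePool(pool: string): Promise<void>;
  markSuccess(): void;
  markFailure(reason: string): void;
  log(event: string, fields?: Record<string, unknown>): void;
}

async function runTick(deps: TickDeps): Promise<void> {
  try {
    await deps.probeRpc();
  } catch (err) {
    // Without this gate a dead RPC looks like "no pools" and /health stays ok.
    deps.markFailure(`rpc probe failed: ${String(err)}`);
    return;
  }
  if (!(await deps.tryAcquireLease())) {
    deps.log("tick.no_lease");
    deps.markSuccess(); // assumption: a healthy standby replica reports ok
    return;
  }
  const pools = await deps.fetchActivePools();
  let failed = 0;
  for (const pool of pools) {
    try {
      await deps.settlePool(pool); // one bad pool must not block the others
    } catch (err) {
      failed += 1;
      deps.log("pool.error", { pool, error: String(err) });
    }
  }
  deps.log("tick.complete", { pools: pools.length, failed });
  deps.markSuccess();
}

// Never exits: one tick every intervalMs until the process is stopped.
async function pollForever(deps: TickDeps, intervalMs = 60_000): Promise<void> {
  while (true) {
    try {
      await runTick(deps);
    } catch (err) {
      deps.markFailure(String(err)); // e.g. fetchActivePools threw
    }
    await new Promise<void>((resolve) => {
      setTimeout(() => resolve(), intervalMs);
    });
  }
}
```

Injecting the dependencies keeps the lease, RPC and per-pool isolation paths unit-testable without a chain, which is the kind of coverage the polling-loop spec in the test plan describes.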

Plus:
  - Postgres lease (CRANK_LEASE_ENABLED=true) mirrors the indexer's
    reconciler_lease pattern for multi-replica Railway deployments
  - INFRA_FAILURE vs PAYMENT_MISSED off-chain classification (the on-
    chain settle_default has no reason arg; surfaced in structured logs
    so the admin score-contestation UI can flip verdicts off-chain)
  - 51 vitest cases covering the 4 pure surfaces; settle_default CPI
    itself is integration-level (bankrun/litesvm lanes)
  - .github/workflows/crank.yml — typecheck + test, advisory,
    path-filtered to services/crank/** + sdk/**
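The lease rule itself is small. A sketch of it as an in-memory model, with one possible Postgres upsert that would enforce the same rule atomically shown in a comment; the crank_lease table, its columns and the TTL are assumptions, not the schema of the indexer's reconciler_lease.

```typescript
// One possible Postgres form of the same rule (hypothetical table):
//   INSERT INTO crank_lease (name, holder, expires_at)
//   VALUES ('crank', $1, now() + $2::interval)
//   ON CONFLICT (name) DO UPDATE
//     SET holder = EXCLUDED.holder, expires_at = EXCLUDED.expires_at
//     WHERE crank_lease.expires_at <= now() OR crank_lease.holder = EXCLUDED.holder
//   RETURNING holder;   -- no row returned means the lease was not acquired

interface LeaseRow {
  holder: string;
  expiresAt: number; // ms since epoch
}

class LeaseTable {
  private row: LeaseRow | null = null;

  // Take the lease if nobody holds it, it has expired, or we already hold
  // it (renewal). Anyone else gets false and should log tick.no_lease.
  tryAcquire(holder: string, now: number, ttlMs: number): boolean {
    const current = this.row;
    if (current === null || current.expiresAt <= now || current.holder === holder) {
      this.row = { holder, expiresAt: now + ttlMs };
      return true;
    }
    return false;
  }
}
```

Relying on expiry rather than explicit release is what lets a standby replica take over when the leader crashes mid-tick.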
@vercel

vercel Bot commented Jun 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
|---------|------------|---------|---------------|
| round_financial | Ignored | Ignored Preview | Jun 1, 2026 11:12am |

Adds the canary-cycle crank daemon row to FREEZE.md's Active exceptions
table. Falls under freeze item 2 (audit findings remediation) — closes
the 6 gaps from the internal canary-readiness audit.
@alrimarleskovar alrimarleskovar changed the title feat(crank): canary-cycle daemon (closes 6-gap audit) [FREEZE-EXCEPTION] feat(crank): canary-cycle daemon (closes 6-gap audit) Jun 1, 2026
@alrimarleskovar alrimarleskovar merged commit fd8ab1f into main Jun 1, 2026
11 of 12 checks passed

Labels: None yet
Projects: None yet
2 participants