
feat(SEV-046): CD pipeline for devnet + mainnet program deploys (#272) #388

Merged: alrimarleskovar merged 1 commit into main from claude/implement-roundfi-desktop-SRV6l on May 18, 2026

Conversation

@alrimarleskovar
Owner

TL;DR

Pass-12 attack on docs/operations/mainnet-canary-plan.md §3.3 (item: "CD pipeline approved + tested — staging deploy via #272 rehearsed at least once"). Closes the scaffolding half of #272 — workflow files, mainnet deploy script, and ceremony docs. Closing the execution half (3× clean rehearsal on devnet + one mainnet dry-run) is operator-gated; this PR is the lift to make that possible.

Same shape as PR #381 (Squads ceremony template, Immunefi package, observability spec): code + docs sandbox-completable, execution gated on humans + secrets.

Finding (SEV-046, Medium)

Three concrete risks the canary-plan flagged:

  1. Reproducibility drift: the operator's WSL/macOS toolchain doesn't match the runner used by OtterSec verify-build. PR #319 (Agave 2.x migration for #230, blocked on upstream mpl-core) historically burned ~1d/cycle on version mismatches.
  2. No audit trail: tx signatures lived only in the operator's terminal scrollback.
  3. No rehearsed mainnet ceremony: Squads + hardware were scoped from scratch each time.

Also surfaced: scripts/devnet/deploy.ts:53 told operators to "use scripts/mainnet/deploy.ts" but that file did not exist.

What changed

| File | Lines | Purpose |
| --- | --- | --- |
| scripts/mainnet/deploy.ts (NEW) | 200 | Mainnet wrapper with hard guards + pre-flight hardening check |
| .github/workflows/devnet-deploy.yml (NEW) | 140 | Rehearsal lane, triggered on tag devnet-deploy-v* |
| .github/workflows/mainnet-deploy.yml (NEW) | 180 | Production lane with environment: mainnet approval gate, 2 jobs |
| docs/operations/cd-pipeline.md (NEW) | 200 | Topology + one-time setup + rehearsal protocol |
| docs/operations/mainnet-canary-plan.md | +1 line | §3.3 checkbox reference to new doc |
| package.json | +1 script | mainnet:deploy |
| docs/security/internal-audit-findings.md | row | SEV-046 (Medium, Closed) + summary table updates |
| CHANGELOG.md | +1 entry | Pass-12 wave description |

Workflow design highlights

devnet-deploy.yml

  • Trigger: tag devnet-deploy-v* OR workflow_dispatch with reason input (free-text, captured in audit log).
  • Toolchain: Agave 3.0.0 + Anchor 0.30.1 (mirrors working anchor · build lane proven by SEV-012 / PR #385).
  • DEVNET_DEPLOYER_KEYPAIR secret (base64 JSON) restored to disk + balance ≥ 5 SOL gate.
  • anchor keys sync is allowed to auto-sync — devnet IDs are disposable.
  • Artifact: config/program-ids.devnet.json with 90-day retention.
  • Keypair cleanup if: always().
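
The keypair and balance gates above can be sketched as two small pure functions. This is a minimal illustration, not the workflow's actual step: the function names are hypothetical, the base64 decoding of the secret is omitted, and the sketch assumes the decoded secret is a standard Solana keypair file (a JSON array of 64 byte values).

```typescript
// Hypothetical helpers illustrating the devnet gates; not taken from the workflow file.
const LAMPORTS_PER_SOL = 1_000_000_000; // 1 SOL = 10^9 lamports
const MIN_BALANCE_SOL = 5; // floor stated in the PR description

/** Parses the decoded keypair JSON and rejects anything that is not 64 byte values. */
function parseKeypairJson(text: string): number[] {
  const parsed: unknown = JSON.parse(text);
  if (
    !Array.isArray(parsed) ||
    parsed.length !== 64 ||
    !parsed.every((b) => Number.isInteger(b) && b >= 0 && b <= 255)
  ) {
    throw new Error("keypair must be a JSON array of 64 byte values");
  }
  return parsed as number[];
}

/** True when the deployer balance (in lamports) meets the SOL floor. */
function meetsBalanceFloor(lamports: number, floorSol: number = MIN_BALANCE_SOL): boolean {
  return lamports >= floorSol * LAMPORTS_PER_SOL;
}
```

Comparing in lamports avoids floating-point rounding on the SOL value; a run would abort before any build step if either check fails.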

mainnet-deploy.yml

  • Trigger: tag mainnet-deploy-v* ONLY — no workflow_dispatch. Every mainnet run must trace to a signed git tag.
  • Two jobs:
    • preflight (no approval, read-only): mainnet_hardening_check vs live cluster + verifies anchor keys sync would be a no-op (mainnet IDs are permanent, refuses on drift).
    • deploy (environment: mainnet → GitHub required-reviewers gate): same toolchain, same guards from scripts/mainnet/deploy.ts, sealed anchor build --no-idl (no auto-sync).
  • Artifact: 365-day retention (longer than devnet — mainnet deploys are audit anchors).
  • Secrets scoped to the mainnet environment: MAINNET_DEPLOYER_KEYPAIR, EXPECTED_AUTHORITY, EXPECTED_TREASURY, EXPECTED_APPROVED_ADAPTER, MAINNET_CORE_PROGRAM_ID.
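
The preflight rule that anchor keys sync must be a no-op amounts to a drift check between two sets of program IDs. The sketch below is one way to express that comparison, assuming the declared and derived IDs have already been collected into plain maps; the function name, map shape, and program names are hypothetical and not taken from the workflow.

```typescript
// Hypothetical drift check: program name -> base58 program ID.
type ProgramIds = Record<string, string>;

/**
 * Returns one message per program whose declared ID differs from the
 * ID derived from its keypair. An empty array means a sync would be a no-op.
 */
function findIdDrift(declared: ProgramIds, derived: ProgramIds): string[] {
  const names = Array.from(
    new Set(Object.keys(declared).concat(Object.keys(derived)))
  ).sort();
  const drift: string[] = [];
  for (const name of names) {
    const a = declared[name];
    const b = derived[name];
    if (a !== b) {
      drift.push(`${name}: declared=${a ?? "<missing>"} derived=${b ?? "<missing>"}`);
    }
  }
  return drift;
}
```

A preflight step built on this would exit non-zero whenever the returned list is non-empty, which matches the "refuses on drift" behavior described above.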

scripts/mainnet/deploy.ts

Refuses to start unless ALL of:

  • SOLANA_CLUSTER=mainnet-beta
  • MAINNET_DEPLOY_CONFIRM=I-UNDERSTAND-THIS-IS-MAINNET sentinel
  • MAINNET_DEPLOYER_KEYPAIR path exists
  • All 3 EXPECTED_* env vars set

It then runs mainnet_hardening_check if ROUNDFI_CORE_PROGRAM_ID is set (defense-in-depth on top of the preflight job), echoes the canary-plan reminders, sleeps 10s as a last-chance Ctrl-C window for manual invocation, and proceeds with build + deploy.
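
The refuse-to-start conditions listed above can be sketched as a single guard function. This is an illustrative outline rather than the contents of scripts/mainnet/deploy.ts: the environment variable names and the sentinel value come from the description, while the function name, the injected fileExists callback, and the error wording are assumptions.

```typescript
// Hypothetical guard sketch; only the env var names and sentinel are from the PR text.
type Env = Record<string, string | undefined>;

const SENTINEL = "I-UNDERSTAND-THIS-IS-MAINNET";
const REQUIRED_EXPECTED = [
  "EXPECTED_AUTHORITY",
  "EXPECTED_TREASURY",
  "EXPECTED_APPROVED_ADAPTER",
];

/** Returns every guard violation; an empty array means the deploy may proceed. */
function checkMainnetGuards(env: Env, fileExists: (path: string) => boolean): string[] {
  const failures: string[] = [];
  if (env["SOLANA_CLUSTER"] !== "mainnet-beta") {
    failures.push("SOLANA_CLUSTER must be mainnet-beta");
  }
  if (env["MAINNET_DEPLOY_CONFIRM"] !== SENTINEL) {
    failures.push("MAINNET_DEPLOY_CONFIRM sentinel is missing or wrong");
  }
  const keypairPath = env["MAINNET_DEPLOYER_KEYPAIR"];
  if (!keypairPath || !fileExists(keypairPath)) {
    failures.push("MAINNET_DEPLOYER_KEYPAIR path does not exist");
  }
  for (const name of REQUIRED_EXPECTED) {
    if (!env[name]) failures.push(`${name} must be set`);
  }
  return failures;
}
```

Collecting all failures instead of stopping at the first one lets an operator fix a misconfigured shell in one pass. In a Node script the function could be called with process.env and fs.existsSync, exiting non-zero when the list is non-empty.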

Test plan

  • pnpm typecheck green
  • pnpm lint green
  • CI: all standard lanes stay green (anchor, js, deny, audit, freeze-enforcement, bankrun-no-mpl-core)
  • Out-of-sandbox: operator runs devnet-deploy.yml 3× clean to satisfy the canary-plan rehearsal item
  • Out-of-sandbox: operator configures mainnet GitHub environment with required reviewers + 5 secrets

What this PR does NOT do

  • Run the workflows. Both need repo-level secrets configured (DEVNET_DEPLOYER_KEYPAIR etc.), and mainnet needs the mainnet environment created in repo Settings. Sandbox can't do either.
  • Fund the deployer keypairs. Needs SOL on devnet (faucet) and mainnet (real).
  • OtterSec verify-build automation. Still manual post-deploy. Will be added when the formal OtterSec engagement starts.
  • Squads multisig integration in the deploy itself. Squads ceremony rotates upgrade authority POST-deploy; documented in cd-pipeline.md under follow-ups.

Freeze status

Permitted under FREEZE.md:

Bug fixes that close a tracked SEV — SEV-046 added in this PR.

Single concern (CD pipeline). No on-chain code changed. No effect on user-facing behavior until operator triggers a workflow.

Recommended merge method

Merge commit — single self-contained scaffolding bundle.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm


Generated by Claude Code

Pass-12 attack on docs/operations/mainnet-canary-plan.md §3.3
("CD pipeline approved + tested — staging deploy via #272 rehearsed
at least once"). Until this commit, every Anchor program deploy was
a manual pnpm run devnet:deploy from an operator workstation. Three
risks the canary-plan flagged:
  1. Reproducibility drift — operator WSL/macOS toolchain doesn't
     match OtterSec verify-build's runner
  2. No audit trail — tx signatures only in operator scrollback
  3. No rehearsed mainnet ceremony — Squads + hardware scoped from
     scratch each time

Fix (scaffolding deliverable — sandbox-completable, rehearsal
execution gated on operator):

- scripts/mainnet/deploy.ts (NEW) — mainnet wrapper with hard guards:
    SOLANA_CLUSTER=mainnet-beta required
    MAINNET_DEPLOY_CONFIRM=I-UNDERSTAND-THIS-IS-MAINNET sentinel
    MAINNET_DEPLOYER_KEYPAIR path required
    EXPECTED_AUTHORITY + EXPECTED_TREASURY + EXPECTED_APPROVED_ADAPTER required
  Plus pre-flight mainnet_hardening_check when ROUNDFI_CORE_PROGRAM_ID
  is set (defense-in-depth on top of preflight job). Canary-plan
  reminder echo + 10s sleep.

- .github/workflows/devnet-deploy.yml (NEW) — triggered on tag
  devnet-deploy-v* OR workflow_dispatch with required `reason`
  input. Pinned Agave 3.0.0 + Anchor 0.30.1 (matches working
  anchor · build lane proven by PR #385 / SEV-012). 5 SOL floor
  balance check. anchor build --no-idl + anchor keys sync (auto-sync
  OK on devnet) + anchor build + scripts/devnet/deploy.ts. Upload
  config/program-ids.devnet.json artifact (90d retention). Keypair
  cleanup on always.

- .github/workflows/mainnet-deploy.yml (NEW) — triggered on tag
  mainnet-deploy-v* ONLY (no workflow_dispatch — every mainnet run
  must trace to a signed git tag). Two jobs:
    preflight (no approval, read-only): mainnet_hardening_check vs
      live cluster + verifies anchor keys sync would be no-op
      (mainnet IDs permanent, refuses on drift).
    deploy: environment: mainnet triggers GitHub required-reviewers
      gate. Same toolchain. Same guards from scripts/mainnet/deploy.ts.
      Artifact 365d retention.

- package.json: new "mainnet:deploy" script.

- docs/operations/cd-pipeline.md (NEW) — topology, one-time setup
  (devnet keypair + mainnet environment + 5 secrets), first-deploy
  vs upgrade matrix, rehearsal protocol (3× clean on devnet).

- docs/operations/mainnet-canary-plan.md §3.3 — CD pipeline checkbox
  updated with reference to cd-pipeline.md and rehearsal-pending
  note.

Tracker (docs/security/internal-audit-findings.md):
- SEV-046 row added (Medium, Closed). Counts: Medium 12→13, Total
  46→47.
- Cleanup: SEV-045 "_this PR_" → "[#387]" now that #387 merged.
- Prose: Pass-12 wave description appended.

Methodology: same scaffolding pattern as PR #381 (Squads ceremony,
Immunefi package, observability spec) — workflow files + script +
docs sandbox-completable, but actual execution (running the
workflows, rehearsing 3× clean, exercising mainnet approval gate)
is gated on operator + GitHub repo Settings + funded keypairs.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
@vercel

vercel Bot commented May 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| round_financial | Ready | Preview, Comment | May 18, 2026 10:42am |

alrimarleskovar merged commit c6e67ba into main on May 18, 2026
8 checks passed
alrimarleskovar pushed a commit that referenced this pull request May 18, 2026
Companion to docs/operations/cd-pipeline.md (architecture spec).
This is the empirical lessons-learned counterpart documenting why
SEV-046's "clean" PR #388 took 5 follow-up PRs (#389-#393) before
a single anchor deploy actually completed.

Contents:
- Headline: 5 bugs, each one-line, none catchable by lint/typecheck
- Per-rehearsal chronology (1a → 1g)
- Per-bug root cause analysis (JSON parse trap, anchor-syn IDL,
  anchor wallet resolution chain, empirical SOL cost)
- What would have prevented it: localnet smoke-test workflow that
  exercises the full deploy path against solana-test-validator
- Mainnet-deploy mitigation summary (all 5 fixes already in)
- Outstanding operator-side work tracker

Filed as follow-up: the localnet smoke-test workflow design is
sketched but not implemented in scope of SEV-046. Tracking for
post-canary cleanup.

Generalized lesson burned in: workflow code is untestable except by
running it. The CI lanes pass; the workflow still doesn't work.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
alrimarleskovar added a commit that referenced this pull request May 19, 2026
#395)

Documents the 5-PR chain (#389-#393) that the SEV-046 CD scaffolding
(PR #388) needed before a single anchor deploy actually ran end-to-end.

Companion to docs/operations/cd-pipeline.md (architecture spec) — this
is the empirical lessons-learned counterpart.

Generalized lesson: workflow code is untestable except by running it.
CI lanes can be green while the workflow itself is broken in ways
that only surface against real runner conditions.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
alrimarleskovar pushed a commit that referenced this pull request May 19, 2026
…closed

First green end-to-end CD devnet deploy after the 5-PR bug chain
(#389-#393). Run 26086314957 deployed all 4 RoundFi programs on a
clean ubuntu-latest runner, artifact captured, keypair scrubbed.

Changes:
- docs/operations/rehearsal-logs/2026-05-19-SEV-046-rehearsal-1g-success.md
  — full run record with program IDs, Solscan links, cost breakdown,
  next-rehearsal plan.
- docs/operations/rehearsal-logs/2026-05-18-SEV-046-rehearsal-saga.md
  — row 1g flipped from "pending" to ✓ green + headline update note.
- docs/operations/mainnet-canary-plan.md §3.3 — CD pipeline checkbox
  flipped [x] with link to the 1g success log. Stretch goal of 3×
  clean still at 1/3.
- docs/security/internal-audit-findings.md — SEV-046 tracker row
  updated with PR chain breadcrumb (#388 + #389-#393 + #395) +
  empirical close status.
- CHANGELOG.md — new [Unreleased] entry summarizing the rehearsal 1g
  outcome above the existing #272 scaffolding entry.

§3.3 strict "at least once" criterion: ✓ satisfied.
3× clean reproducibility (cd-pipeline.md §"Rehearsal protocol"): 1/3.
Operator can now trigger rehearsals 2 + 3 with no expected code changes.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm