Skip to content

feat(SEV-012, partial): re-attempt bankrun-no-mpl-core CI lane (#381 fix)#385

Merged
alrimarleskovar merged 4 commits into
mainfrom
claude/implement-roundfi-desktop-SRV6l
May 18, 2026
Merged

feat(SEV-012, partial): re-attempt bankrun-no-mpl-core CI lane (#381 fix)#385
alrimarleskovar merged 4 commits into
mainfrom
claude/implement-roundfi-desktop-SRV6l

Conversation

@alrimarleskovar
Copy link
Copy Markdown
Owner

TL;DR

Closes 80%+ of the bankrun-in-CI coverage gap that SEV-012 tracks. PR #381 first attempted this lane but failed CI with exit code 101 and was reverted in commit 4b1fa12. Root cause traced: legacy Solana installer URL (release.solana.com/v1.18.26/install), not the anchor-syn #319 patch as previously suspected.

This re-attempt mirrors the working anchor · build lane's toolchain install verbatim and fixes a latent contributor that would have masked the original failure: rebuild-idls.sh's program loop was missing roundfi_yield_kamino.

SEV-012 moves from 🟠 Blocked to 🟡 Partial.

What changed

File Change
scripts/dev/rebuild-idls.sh Add roundfi_yield_kamino to the program loop (bankrun.ts::loadIdl requires all 4 IDLs unconditionally)
.github/workflows/ci.yml New bankrun-no-mpl-core job, ~120 lines. Mirrors anchor · build's Agave 3.0.0 + anchor-cli install. Runs pnpm test:bankrun:no-mpl-core (~17 Kamino tests)
docs/security/internal-audit-findings.md SEV-012 row: 🟠 Blocked → 🟡 Partial, with PR #381 retrospective. Summary table updated (Medium-Blocked 1→0, Medium-Partial 0→1). Mainnet-blocker status paragraph reflects partial unblock
CHANGELOG.md New [Unreleased] entry for the lane addition + script fix

Commits (recommended review order)

1. fix(SEV-012, partial)b1e6662 — rebuild-idls.sh kamino-loop fix

Pre-existing bug. The bankrun harness (tests/_harness/bankrun.ts::setupBankrunEnv) calls loadIdl() for all 4 IDLs: core, reputation, yield_mock, yield_kamino. But rebuild-idls.sh's loop iterated only 3 — even after the script "succeeded", every bankrun spec failed at setup time on the missing roundfi_yield_kamino.json.

Latent contributor to PR #381's failure: even with the toolchain issue fixed, the bankrun specs would have failed downstream on missing IDL.

2. ci(SEV-012, partial)f423cc5 — CI lane + docs

Resurrects the bankrun-no-mpl-core lane that PR #381 introduced and 4b1fa12 reverted.

PR #381 root-cause analysis. The reverted lane used:

- name: Install Solana toolchain
  run: sh -c "$(curl -sSfL https://release.solana.com/v1.18.26/install)"

release.solana.com is Anza's legacy installer domain. The redirect chain to current Agave releases returns non-zero exit codes under certain conditions, causing the step to exit 101.

The working anchor · build lane uses a direct GitHub releases tarball:

- name: Install Solana Agave 3.0.0
  run: |
    curl -sL -o /tmp/agave.tar.bz2 \
      https://github.com/anza-xyz/agave/releases/download/v3.0.0/solana-release-x86_64-unknown-linux-gnu.tar.bz2
    mkdir -p "$HOME/agave"
    tar -xjf /tmp/agave.tar.bz2 -C "$HOME/agave" --strip-components=1
    echo "$HOME/agave/bin" >> "$GITHUB_PATH"

The new bankrun-no-mpl-core lane mirrors this verbatim plus Anchor install via cargo install --git --tag v0.30.1.

Coverage delta

Coverage path Before After
Bankrun spike Phase 1-2b/5 (Kamino program load, disc validation, cascade-clone, deposit/harvest CPI vs cloned state) local-only CI on every PR (~17 tests)
join_pool mpl_core path local-only local-only (blocked)
escape_valve_buy mpl_core path local-only local-only (blocked)

What this PR does NOT do

The mpl_core-dependent specs stay local-only. Closing that surface requires upstream resolution of mpl-core#282 — mpl-core 0.12 depends directly on borsh 1.x while Anchor (incl. 1.0.2) drags borsh 0.10 transitively via solana_pubkey. Issue stale since 2026-05-15 / no maintainer response in 3 days, so this PR is the right pragmatic move now.

Local PR #319 tracks the full unblock.

Test plan

  • Local: bash scripts/dev/rebuild-idls.sh produces all 4 IDLs from clean state (verified by deleting target/idl/*.json first)
  • Local: pnpm test:bankrun:no-mpl-core runs (1 passing + 16 pending — pending state is expected without klend.so locally; CI will warm the cache)
  • Local: pnpm lint green
  • CI: new bankrun-no-mpl-core lane completes — on first run, cache miss → downloads klend.so from mainnet → 16 tests transition from pending to actually running
  • CI: all standard lanes stay green (anchor, js, deny, audit, freeze-enforcement)

Freeze status

Permitted under FREEZE.md:

Bug fixes that close a tracked SEV in docs/security/internal-audit-findings.md — SEV-012 status updated in the same PR.

Two distinct concerns (script bug + new CI lane), both tied to the same SEV. Commit-level audit trail preserved.

Recommended merge method

Merge commit to preserve the two commits' separation (script-fix vs CI lane are distinct review surfaces).

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm


Generated by Claude Code

claude added 2 commits May 18, 2026 08:33
The bankrun harness (tests/_harness/bankrun.ts::setupBankrunEnv)
calls loadIdl() for all 4 program IDLs unconditionally — core,
reputation, yield_mock, AND yield_kamino. But rebuild-idls.sh's
loop iterated only 3:

  for prog in roundfi_core roundfi_reputation roundfi_yield_mock

So even after the script "succeeded", any bankrun spec failed at
setup time on the missing roundfi_yield_kamino.json.

Latent contributor to PR #381's exit-code-101 CI failure mode
(the lane-level toolchain issue is fixed in the next commit).

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
PR #381 added this lane but it failed CI with exit code 101 and was
reverted in commit 4b1fa12. Root cause traced to the Solana toolchain
installer: PR #381 used the legacy `release.solana.com/v1.18.26/install`
URL, but Anza's redirect chain returns non-zero on that path. The
working `anchor · build` lane uses `anza-xyz/agave v3.0.0` from a
direct GitHub releases tarball.

This re-attempt mirrors the anchor lane's toolchain install verbatim
(Agave 3.0.0 + anchor-cli via `cargo install --git --tag v0.30.1`).
Combined with the prior commit fixing rebuild-idls.sh's program loop,
the bankrun-no-mpl-core lane should now run cleanly:

  1. anchor build --no-idl       (.so for all 4 programs)
  2. bash scripts/dev/rebuild-idls.sh  (#319 patch + 4 IDLs)
  3. cache klend.so (or download from mainnet)
  4. pnpm test:bankrun:no-mpl-core  (security_kamino_cpi.spec.ts)

Local verification: rebuild-idls.sh produces all 4 IDLs from clean
state, test:bankrun:no-mpl-core runs (1 passing + 16 pending — pending
state is expected without klend.so on first CI run, then the cache
warms on the actions/cache@v4 path with weekly key
`klend-so-2026-W20`).

SEV-012 status: 🟠 Blocked → 🟡 Partial. The mpl_core-dependent specs
(join_pool + escape_valve_buy paths) stay local-only pending upstream
mpl-core 0.12 ↔ Anchor 1.0 borsh compat (mpl-core#282 — issue stale
since 2026-05-15 / no maintainer response in 3 days). When that
unblocks, merge this lane back into the full anchor lane.

Updates:
- .github/workflows/ci.yml: new bankrun-no-mpl-core job (~120 lines)
- docs/security/internal-audit-findings.md: SEV-012 row Blocked → Partial,
  summary table Medium-Blocked 1→0 / Medium-Partial 0→1, prose paragraph
  about mainnet-blocker status updated to reflect the partial unblock.
- CHANGELOG.md: new [Unreleased] section.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
@vercel
Copy link
Copy Markdown

vercel Bot commented May 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
round_financial Ignored Ignored Preview May 18, 2026 9:30am

claude added 2 commits May 18, 2026 09:17
…lane

Auditoria 2026-05-18 dos 8 specs bankrun pra ver quais OUTROS além de
security_kamino_cpi.spec.ts podem entrar no lane:

- 3 specs (edge_cycle_boundary, sev034_release_escrow_lifecycle,
  economic_parity) usam joinMembers() → wraps joinPool() → CreateV2
  CPI em mpl_core. Não dão sem mpl_core loaded.
- 2 specs (edge_grace_default + variant) passam metaplexCore direto
  como account-meta no settle_default. Mesma dependência.
- 1 spec (app_encoders_bankrun) bypassa joinPool via writeAnchorAccount,
  então PODERIA rodar sem mpl_core. Mas 7 de 9 sub-testes falham com
  custom error codes não-relacionados (0x177c, 0x1777, 0xbbd) por
  state-shape drift entre quando o spec foi escrito e o layout atual
  de Pool/Member. Inclusão requer spec-refresh PR separado.

Conclusão: escopo atual (só security_kamino_cpi.spec.ts) é correto.
Expansão precisa de trabalho a nível de spec, não a nível de lane.

Documentação inline no ci.yml para evitar re-investigação futura.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
Establishes docs/security/post-mortems/ as the canonical location for
deep-dive write-ups of Critical/High SEVs. First entry covers SEV-040
(the KAMINO_LEND_PROGRAM_ID typo).

Content covers:
- Timeline (defect introduced ~Q1 2026, caught 2026-05-17 at bankrun
  spike planning, fixed same day in commit cad9ca4, time-to-fix < 24h)
- 5-whys root cause (Anchor pubkey! macro validates syntax not
  semantics; existing unit tests pinned discriminators not address;
  integration tests deployed yield-mock never yield-kamino; visual
  code review of 44-char base58 unreliable)
- Why every existing safeguard failed
- Methodology lesson + downstream propagation: SEV-040 lesson "every
  pinned external-protocol constant needs an automated assertion vs
  an authoritative source" was applied at 3 surfaces in the following
  week and found 2 more same-class bugs (SEV-042 Critical, SEV-043
  Medium) + a follow-up oracle (SEV-041 layout pinning)
- Process rule: any commit introducing or modifying such a constant
  must include a sibling assertion in the same commit
- Action items + status table (all closed)
- Cross-references to fix commits, related SEVs, and authoritative
  source URLs

Internal pre-audit framing preserved — designed so the formal external
audit (Adevar Labs, scoping in progress) can verify the team's
internal process response against the codebase pattern.

Tracker cross-link added on the SEV-040 row in
docs/security/internal-audit-findings.md.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
@alrimarleskovar alrimarleskovar merged commit 9e3fde9 into main May 18, 2026
8 checks passed
alrimarleskovar pushed a commit that referenced this pull request May 18, 2026
Pass-10 sweep of mainnet-canary-plan.md §3.2 against the pre-flight
script found 2 fields the canary plan explicitly enumerates but the
script never validated:

1. config.metaplex_core — should be CoREENxT6tW1HoK8ypY1SxRMZTcVPm7R94rH4PZNhX7d
   (canonical Metaplex Core, same on every cluster). A substituted
   mpl-core program (compromised, wrong fork, devnet test build) could
   silently accept different plugin payloads in join_pool's CreateV2
   CPI → position NFTs would be malformed without an on-chain error.

2. config.usdc_mint — should be EPjFWdd5...zTDt1v on mainnet (canonical
   USDC) or 4zMMC9sr...DncDU on devnet. Wrong mint = funds routed to a
   wrong-decimals or wrong-issuer token.

The SEV-042 fix wave (PR #383) landed 5 new BLOCKER checks but stopped
short of these two. Operator error could have shipped a canary with no
script-level guard.

On-chain config is correct today; SEV-044 is regression-prevention,
not a runtime bug.

Fix:
- New OFFSETS_POST_DISC entries: usdcMint @ 64, metaplexCore @ 96
- New canonical constants: CANONICAL_METAPLEX_CORE_ID,
  CANONICAL_MAINNET_USDC_MINT, CANONICAL_DEVNET_USDC_MINT
- New env var: EXPECTED_USDC_MINT (with cluster-aware default)
- 2 new BLOCKER checks (BLOCKER 3 usdc_mint, BLOCKER 4 metaplex_core);
  subsequent BLOCKERs renumbered 5–11

Tracker updates:
- SEV-044 row added (Medium, Closed). Counts: Medium 10→11, Total 44→45.
- Cleanup: SEV-040/041/042 "_this PR_" → "[#383]"; SEV-012 "_this PR_"
  → "[#385]" now that those PRs are merged.

Methodology: same shape as SEV-043 (coverage gap discovered via
audit-then-test-pin sweep). The canary-plan checklist is now a
swept source-of-truth.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
alrimarleskovar pushed a commit that referenced this pull request May 18, 2026
Pass-12 attack on docs/operations/mainnet-canary-plan.md §3.3
("CD pipeline approved + tested — staging deploy via #272 rehearsed
at least once"). Until this commit, every Anchor program deploy was
a manual pnpm run devnet:deploy from an operator workstation. Three
risks the canary-plan flagged:
  1. Reproducibility drift — operator WSL/macOS toolchain doesn't
     match OtterSec verify-build's runner
  2. No audit trail — tx signatures only in operator scrollback
  3. No rehearsed mainnet ceremony — Squads + hardware scoped from
     scratch each time

Fix (scaffolding deliverable — sandbox-completable, rehearsal
execution gated on operator):

- scripts/mainnet/deploy.ts (NEW) — mainnet wrapper with hard guards:
    SOLANA_CLUSTER=mainnet-beta required
    MAINNET_DEPLOY_CONFIRM=I-UNDERSTAND-THIS-IS-MAINNET sentinel
    MAINNET_DEPLOYER_KEYPAIR path required
    EXPECTED_AUTHORITY + EXPECTED_TREASURY + EXPECTED_APPROVED_ADAPTER required
  Plus pre-flight mainnet_hardening_check when ROUNDFI_CORE_PROGRAM_ID
  is set (defense-in-depth on top of preflight job). Canary-plan
  reminder echo + 10s sleep.

- .github/workflows/devnet-deploy.yml (NEW) — triggered on tag
  devnet-deploy-v* OR workflow_dispatch with required `reason`
  input. Pinned Agave 3.0.0 + Anchor 0.30.1 (matches working
  anchor · build lane proven by PR #385 / SEV-012). 5 SOL floor
  balance check. anchor build --no-idl + anchor keys sync (auto-sync
  OK on devnet) + anchor build + scripts/devnet/deploy.ts. Upload
  config/program-ids.devnet.json artifact (90d retention). Keypair
  cleanup on always.

- .github/workflows/mainnet-deploy.yml (NEW) — triggered on tag
  mainnet-deploy-v* ONLY (no workflow_dispatch — every mainnet run
  must trace to a signed git tag). Two jobs:
    preflight (no approval, read-only): mainnet_hardening_check vs
      live cluster + verifies anchor keys sync would be no-op
      (mainnet IDs permanent, refuses on drift).
    deploy: environment: mainnet triggers GitHub required-reviewers
      gate. Same toolchain. Same guards from scripts/mainnet/deploy.ts.
      Artifact 365d retention.

- package.json: new "mainnet:deploy" script.

- docs/operations/cd-pipeline.md (NEW) — topology, one-time setup
  (devnet keypair + mainnet environment + 5 secrets), first-deploy
  vs upgrade matrix, rehearsal protocol (3× clean on devnet).

- docs/operations/mainnet-canary-plan.md §3.3 — CD pipeline checkbox
  updated with reference to cd-pipeline.md and rehearsal-pending
  note.

Tracker (docs/security/internal-audit-findings.md):
- SEV-046 row added (Medium, Closed). Counts: Medium 12→13, Total
  46→47.
- Cleanup: SEV-045 "_this PR_" → "[#387]" now that #387 merged.
- Prose: Pass-12 wave description appended.

Methodology: same scaffolding pattern as PR #381 (Squads ceremony,
Immunefi package, observability spec) — workflow files + script +
docs sandbox-completable, but actual execution (running the
workflows, rehearsing 3× clean, exercising mainnet approval gate)
is gated on operator + GitHub repo Settings + funded keypairs.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants