feat(toolchain): Agave 2.x migration (#230) — BLOCKED on upstream mpl-core#319
alrimarleskovar wants to merge 8 commits
…-core 0.8→0.10)

WIP: first iteration of #230 Agave 2.x migration. Expect breakage.

## Round 1 changes

- `anchor-lang` / `anchor-spl`: 0.30.1 → 0.31.1 (all 4 programs)
- `mpl-core`: 0.8.0 → 0.10.0 (roundfi-core)
- CI anchor lane: install anchor v0.30.1 → v0.31.1
- CI anchor lane: drop `anchor build --no-idl` workaround (Anchor 0.31 fixed the `Span::source_file()` removal, so full IDL build works)

## What's NOT in this round (deliberately)

- Cargo.lock regeneration (let local `anchor build` do it)
- borsh explicit bump (anchor-lang 0.31 pulls it transitively)
- Drop `|| true` from cargo-audit + cargo-deny (do at END, only after Rust-side compiles green)
- Drop `--locked false` workaround in anchor-cli install (might still be needed; we'll learn from iteration)
- docs/verified-build.md rewrite (waits until toolchain is locked)
- AUDIT_SCOPE.md + audit-readiness.md language updates (same)

## Iteration plan (next sessions)

1. Operator runs `anchor build` locally with Agave 2.x toolchain
2. Paste errors → we fix code (mpl-core API drift in `programs/roundfi-core/src/instructions/{join_pool,escape_valve_*}.rs` is the most likely breaker)
3. Repeat until `anchor build` clean
4. Run `cargo audit` + `cargo deny check` to confirm transients are gone
5. Flip cargo-audit + cargo-deny lanes from advisory to required
6. Rewrite docs/verified-build.md troubleshooting
7. Update AUDIT_SCOPE.md + audit-readiness.md language
8. Devnet redeploy + OtterSec attestation refresh (operator action)

Tracking: #230.
Round 1 attempt at 0.10 failed because mpl-core 0.10's kaigan 0.2.6 pins anchor-lang ^0.30.0 which transitively pulls solana-program 1.16 — curve25519-dalek zeroize conflict against Anchor 0.31's resolver. cargo search shows mpl-core 0.12.0 is the current crates.io version. Likely solves the kaigan/anchor pin chain. Round 2 iteration.
Anchor 0.31 reworked the IDL build path: types used in #[account] structs now must implement `anchor_lang::IdlBuild` (replaces the 0.30 Span::source_file() reflection path). Default methods (None/empty) satisfy the trait; the IDL emits Payload as an opaque type — matches the actual on-chain layout (fixed 96-byte buffer with no field decomposition) and matches the previous behaviour under the old --no-idl workaround. Gated on idl-build feature so non-IDL builds don't carry the impl.
mpl-core 0.12 has `default = [borsh-v1]` which transitively enables `kaigan/borsh-v1`. Our `features = ["anchor"]` adds `kaigan/anchor` on top — and kaigan's lib.rs has a `compile_error!` rejecting both features simultaneously (anchor 0.30 uses borsh 0.10 internally, borsh-v1 is for newer ecosystem). Adding `default-features = false` keeps only the `anchor` feature active, resolving kaigan's mutual-exclusion check.
cargo search reveals anchor-lang 1.0.2 exists — major version that fully migrates to borsh 1.x (closes the borsh-0.10-vs-1.x conflict between mpl-core 0.12's direct dep and Anchor 0.31's transitive solana_pubkey dep that blocked round 4). If 1.0 isn't a free upgrade from 0.31 (API breakers in derive macros, attribute syntax changes, etc.), the round 5 errors will surface them. Round 5 iteration.
Anchor 1.0 migrated to borsh 1.x which removed the BorshSerialize::try_to_vec() convenience method. Replaced with the free-standing borsh::to_vec() helper — same semantics, just a different surface. Round 6 of #230 iteration.
Tentative fix for the maybestd errors blocking the Agave 2.x build. Forces all borsh requests across the workspace to resolve to 1.5.x, removing the dual-version conflict between mpl-core 0.12 (borsh 1.x) and anchor's transitive solana-pubkey (borsh 0.10). Low-probability fix — `maybestd` is a borsh 0.10-only module so any crate that actually depends on it will fail to compile against 1.5. If that happens, this commit gets reverted and #319 stays draft pending upstream mpl-core ↔ anchor compat. https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
`[patch.crates-io]` requires a different source than crates-io itself (cargo error: "patches must point to different sources"), so a pure version-override unification is not expressible. The only path that mechanism allows is patching to a GitHub fork of borsh, which carries more risk surface than the bug we're trying to dodge. Reverting cleanly. The upstream block (mpl-core 0.12 and anchor 1.0 pull conflicting borsh versions transitively) is the real wall here; #319 stays in draft pending upstream resolution. https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
The harvest path was shipped (real `redeem_reserve_collateral` CPI as redeem-all + redeposit-principal round-trip) and #233 closed, but several places in the doc set still framed it as "stub returning realized=0" or "post-audit". This sweep aligns the docs with the code:

- MAINNET_READINESS.md 4.5: flipped from 🟡⛔ to ✅ with the real implementation summary; dropped #233 from the 4.1 canary gate list; rewrote § 7 critical-path summary to surface the Agave 2.x toolchain (#319) as the live blocker and remove harvest from the to-do list.
- docs/security/audit-readiness.md "pitch vs shipped" row: harvest is shipped, not staged.
- docs/security/mev-front-running.md § 2.4 + § 3 summary table: harvest is live; sandwich vector is bounded (slippage guard + `PrincipalLoss` revert), Jito bundling demoted to operator concern.
- docs/operations/mainnet-canary-plan.md: canary yield branch is unconditional (not gated on #233).
- README.md: Yield Waterfall paragraph reflects both deposit and harvest paths shipped.

No code changes. Pure doc-debt sweep so the auditor reads a register that matches the codebase on day one.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
Doc-debt sweep: aligns MAINNET_READINESS.md, README.md, audit-readiness.md, mev-front-running.md, and mainnet-canary-plan.md with the codebase after the Kamino harvest path landed and closed #233.

- MAINNET_READINESS.md § 4.5 flipped 🟡⛔ → ✅; § 4.1 canary gate list dropped #233 and annotated #230 with PR #319 upstream-blocked status; § 7 critical path reworked to surface #319 as the live blocker.
- docs/security/audit-readiness.md "pitch vs shipped" row: both deposit + harvest paths now in scope.
- docs/security/mev-front-running.md § 2.4: Kamino sandwich vector reframed as bounded (slippage guard + PrincipalLoss revert); summary table merged with #322's cooldown row.
- docs/operations/mainnet-canary-plan.md: yield branch unconditional (was gated on #233).
- README.md Yield Waterfall paragraph reflects both paths shipped.

No code changes.
Upstream issue filed: metaplex-foundation/mpl-core#282
…ase B) (#324)

Phase B of the Squads ceremony preparation. Builds on #323 by providing operator wrappers around the new on-chain authority-rotation ix trio plus a fill-in-the-blank rehearsal log so the devnet dry-run produces a reviewable artifact.

Scripts (all under scripts/devnet/, manual encoding, no Anchor SDK runtime since IDL gen is still blocked by #319):

- squads-rehearsal-verify.ts: read-only ProtocolConfig inspector; decodes authority-rotation surface + classifies state (Idle/Pending/Commit-ready); runs between every step.
- squads-rehearsal-propose-authority.ts --new-authority <pk>: submits propose_new_authority.
- squads-rehearsal-cancel-authority.ts: submits cancel_new_authority.
- squads-rehearsal-commit-authority.ts: submits commit_new_authority with pre-flight eta check.

All four refuse to run against mainnet-beta.

Rehearsal log template at docs/operations/rehearsal-logs/TEMPLATE-squads-rotation.md: 9 sections covering metadata, member set, PDA cross-check, 4 program upgrade auth rotations, propose/wait/cancel/re-propose/commit, optional treasury, verification matrix, surprises, sign-off.

Procedure doc sync: companion list adds the 4 scripts + template; Devnet rehearsal section reworked with explicit 6-step sequence + script mapping table + timelock fast-forward callout.
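A minimal sketch of the state classification the verify script is described as doing. The `RotationSurface` field names, the timestamp unit, and both function names are hypothetical; the real ProtocolConfig layout is not shown in this log.

```typescript
// Hypothetical decoded view of the authority-rotation surface.
// Field names are assumptions, not the real ProtocolConfig layout.
interface RotationSurface {
  pendingAuthority: string | null; // base58 pubkey, null when nothing is proposed
  etaUnixSeconds: number; // earliest time commit_new_authority may succeed
}

type RotationState = "Idle" | "Pending" | "Commit-ready";

// Classify the surface into the three states named in the PR description.
function classifyRotation(surface: RotationSurface, nowUnixSeconds: number): RotationState {
  if (surface.pendingAuthority === null) return "Idle";
  return nowUnixSeconds >= surface.etaUnixSeconds ? "Commit-ready" : "Pending";
}

// Guard in the spirit of "all four refuse to run against mainnet-beta".
function assertNotMainnet(cluster: string): void {
  if (cluster === "mainnet-beta") {
    throw new Error("rehearsal scripts refuse to run against mainnet-beta");
  }
}
```

Running the classifier between every step, as the verify script does, would catch a propose or cancel that did not land before the next transaction is sent.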
… clarify + bans tolerance

CI lane failure on PR #333: `deny · supply-chain (required)` was failing because cargo-deny treats workspace path deps as wildcards unless `publish = false` is set on each member crate. It also failed license detection for mpl-core 0.8.1 (no `license` field in the crate's Cargo.toml, only a LICENSE file).

Three changes to make cargo-deny pass against the actual dep tree:

1. **Cargo.toml**: added `publish = false` to [workspace.package] and `publish.workspace = true` to each of the 5 member crates. These are Solana programs + math lib, never destined for crates.io. Declares intent + unlocks cargo-deny's `allow-wildcard-paths` honoring.
2. **deny.toml [bans]**: added `allow-wildcard-paths = true`. With `publish = false` set on the workspace, cargo-deny now accepts intra-workspace `path = "../..."` deps without flagging them as wildcards.
3. **deny.toml [[licenses.clarify]]**: added a clarify entry for `mpl-core 0.8.1`. The crate ships a LICENSE file (Apache-2.0) but doesn't declare the license field in Cargo.toml. cargo-deny's text-fingerprint heuristic gets 0.80 confidence (below the 0.93 threshold), so the clarify entry pins the resolution explicitly. Retires when we upgrade to mpl-core 0.12 (which declares its license field properly) via the Agave 2.x migration (#319).

Validation
==========

    $ cargo generate-lockfile          # CI does this
    $ cargo deny check --hide-inclusion-graph
    advisories ok, bans ok, licenses ok, sources ok ✓
    $ cargo audit --deny warnings --ignore <11 IDs>
    exit=0 ✓
    $ pnpm lint                        # green
    $ cargo check -p roundfi-core      # green

All four cargo-deny gates pass. Cargo.toml additions are pure metadata with no semantic impact on the build.

Continues SEV-011 remediation; this commit is the test-against-fresh-lockfile follow-up that the original ci.yml flip missed.
…-011 Medium) (#333)

Closes Adevar Labs SEV-011 (Medium). cargo-audit and cargo-deny lanes were both shipped with `|| true` ("advisory-only"). Result: green CI even when NEW advisories landed against non-Solana deps. A false sense of security.

Flipped both lanes from advisory to required semantics (lane names retained as "(advisory)" for branch-protection compat: the protection rule pins the exact check name; will flip the name when the protection rule updates).

- cargo audit: --deny warnings + 11 explicit --ignore RUSTSEC-IDs for known Solana 1.18 / mpl-core 0.8 transients. Each ignore tied to a retire condition (Agave 2.x migration #319).
- cargo deny: --hide-inclusion-graph without `|| true`. Same RUSTSEC-IDs pinned in deny.toml [advisories].ignore for defense-in-depth.

Two follow-up commits:

- Added publish = false to [workspace.package] + publish.workspace = true per crate; allow-wildcard-paths = true to [bans]; [[licenses.clarify]] for mpl-core 0.8.1 (ships LICENSE but no Cargo.toml license field, scores 0.80 on heuristic, so pinned explicitly).
- Reverted lane name change to keep branch-protection rule matching.

When Agave 2.x migration lands (#319), the bulk of ignored RUSTSEC-IDs retire automatically + the mpl-core clarify can drop. Any NEW advisory in non-Solana deps now lands as a red CI check on the PR that introduced it.

Validation: cargo audit + cargo deny + typecheck + lint green; CI 11/11 lanes green (incl. coverage + 6 fuzz targets).
… + broken-link cleanup (#374)

- README §Stress Lab + docs/status.md: '40 L1 tests green' → '45 L1 tests green' (empirical: `pnpm test:economic-parity-l1` = 45 passing).
- CONTRIBUTING.md §Validating a change: 'CI runs five gates' → '4 required gates + 2 advisory lanes' (matches docs/status.md canonical state); test:parity '(7 tests)' → '(11 tests)' empirical; test:economic-parity-l1 '(34 tests)' → '(45 tests)' empirical; added cargo audit + cargo deny advisory lanes.
- CONTRIBUTING.md §Quick setup: Solana/Anchor comment clarified. It now explicitly states Agave 2.x migration is blocked upstream (#319 + docs/319-agave-2x-migration-spike.md), so the install commands intentionally pin to Solana 1.18.26 + Anchor 0.30.1.
- CONTRIBUTING.md §What's in scope: path drift, 'packages/sdk' → 'sdk/', 'apps/' → 'app/' (canonical layout).
- SECURITY.md §Audit status: replaced stale 'currently under internal audit ... External third-party audit is deferred to the mainnet migration phase' framing with current 'internal pre-audit complete (40 findings, 36 closed, Critical/High 10/10) — formal Adevar Labs engagement in scoping' framing. Honest-framing constraint preserved (NOT an Adevar attestation).
- MAINNET_READINESS.md 1.6: removed broken link to retired `ADEVAR_AUDIT_REPORT.md` at commit 03f8030 (file no longer at HEAD; rolling history note kept for traceability).
- MAINNET_READINESS.md 1.7: SEV PR range '#326..#365' → '#326..#372' (matches README §Development Status row 11 in wave 4).
- MAINNET_READINESS.md 5.3: frontend-security-checklist 🔵 (planned) → ✅ (shipped: 10 threats T1-T10, 8 already-shipped items with file:line evidence, canary smoke verification checklist).
- MAINNET_READINESS.md 5.5: removed stale 'Tracked under Item 4 of this readiness sweep' reference (Item 4 doesn't exist); rewrote to describe the actual DEMO-badge distinguisher plan.
- MAINNET_READINESS.md 6.4: Civic gateway-token validator framing → Civic→Human Passport migration completed (#317 + #227), on-chain validator implemented but off-chain bridge service is roadmap; pointer to passport-bridge-threat-model.md activation checklist.

Drift sweep against root-level *.md docs: AUDIT_SCOPE.md (wave 1), CHANGELOG.md (wave 1), README.md (wave 4), CODE_OF_CONDUCT.md (static, no drift), CONTRIBUTING.md + SECURITY.md + MAINNET_READINESS.md (this PR). Same Adevar-attribution + post-completion stat-reconciliation patterns waves 1-4 closed elsewhere.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm

Co-authored-by: Claude <noreply@anthropic.com>
* chore(kamino): bankrun-clone spike — Phase 1 (program loading)
Follow-up to SEV-040 closure: the discovery phase of this spike caught
the typo'd KAMINO_LEND_PROGRAM_ID constant before the spike ever ran.
This PR scaffolds Phase 1 — the smallest empirical step that validates
the remaining CPI-mechanics layer is testable in principle.
What this adds:
1. `tests/_harness/bankrun.ts`:
- Exports `KAMINO_LEND_PROGRAM_ID` (matches the on-chain pinned
constant post-SEV-040).
- `maybeLoadKaminoLend()` helper mirroring `maybeLoadMplCore` —
loads klend.so from target/deploy/ if present, prints a clear
dump command if missing.
- `setupBankrunEnv({ loadKaminoLend: true })` opt-in flag — only
specs that CPI into Kamino pay the load cost.
2. `tests/security_kamino_cpi.spec.ts` — Phase 1 spec:
- klend.so is present at target/deploy/klend.so
- Kamino Lend program account is registered in bankrun
- Account is marked executable (not data)
- Harness-side pin matches the canonical post-SEV-040 value
Skips with a clear pointer to the `solana program dump` command if
the .so is missing, so the spec is CI-safe (doesn't fail when
developer skips the spike).
3. `docs/operations/kamino-bankrun-spike.md` — runbook:
- Why bankrun-clone instead of devnet exercise (Kamino doesn't
publish a canonical devnet USDC reserve; init'ing one requires
Scope oracle infra that's also second-class on devnet).
- Phase 1 (this PR), Phase 2 (cascade-clone reserve state +
exercise deposit/harvest), Phase 3 (canary mainnet for economics).
- Accounts-to-clone inventory for Phase 2.
- Known risks that may force fallback to devnet (stale oracle
timestamps, cluster ID assertions, cascade depth, mutable state
divergence).
- Methodology lesson — the discovery-phase SEV-040 catch validates
the meta-pattern that "preparing operational validation is itself
a form of validation."
What this PR does NOT do:
- Phase 2 CPI exercise — requires cascade-cloning Kamino's reserve PDA
and nested dependencies (collateral mint, liquidity supply ATA, Scope
oracle accounts). Separate PR after Phase 1 proves bankrun accepts
the bytecode.
- klend.so download — operational pre-flight step run from the user's
local Solana CLI (sandbox doesn't have it).
- MAINNET_READINESS update — Item 4.5 stays 🟡 regardless of spike
outcome; canary mainnet is the canonical operational-validation event.
Validated: typecheck clean, prettier clean.
https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
* fix(bankrun): opt-out for mpl_core load — sidesteps SBFv2 incompatibility
The Phase 1 Kamino spike spec was timing out at bankrun setup. Root cause:
solana-program-test 1.18.0 (bundled with bankrun) only reads eBPF / SBFv1
arch (0xf7), but Metaplex has deployed a newer SBFv2 build (arch 0x107)
of mpl_core to mainnet. `solana program dump` returns the current
SBFv2 format; bankrun panics during program loading with a garbled name
in the error message (the arch byte being mis-decoded as the start of
the program name string).
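A rough pre-flight check along these lines could flag an incompatible binary before bankrun panics. It reads the ELF `e_machine` field (little-endian u16 at byte offset 18) and compares it with the two arch values quoted above. The function and constant names are hypothetical, and the mapping of 0xf7 / 0x107 to loader compatibility is taken from this commit message, not verified independently.

```typescript
// Arch values as reported in the commit message above.
const ARCH_ACCEPTED_BY_BANKRUN = 0xf7; // "eBPF / SBFv1"
const ARCH_NEWER_SBF = 0x107; // "SBFv2", reported as unknown by `file`

// Read e_machine from an ELF header held in memory.
function readElfMachine(bytes: Uint8Array): number {
  const hasMagic =
    bytes.length >= 20 &&
    bytes[0] === 0x7f && bytes[1] === 0x45 && bytes[2] === 0x4c && bytes[3] === 0x46;
  if (!hasMagic) throw new Error("not an ELF file");
  // EI_DATA (offset 5) == 1 means little-endian encoding.
  if (bytes[5] !== 1) throw new Error("expected a little-endian ELF");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint16(18, true);
}

// True when the binary carries the arch value the 1.18 loader is said to accept.
function looksLoadable(bytes: Uint8Array): boolean {
  return readElfMachine(bytes) === ARCH_ACCEPTED_BY_BANKRUN;
}
```

A harness could call `looksLoadable` on each dumped `.so` and skip with a clear message, instead of surfacing the garbled panic.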
Same upstream-compat surface as SEV-012 (bankrun + mpl-core 0.8 / Anchor
0.31 borsh), just on a different external dep. Full unblock waits on
the same toolchain bump (#319 / Agave 2.x).
Workaround for the Phase 1 Kamino spike: opt out of loading mpl_core for
specs that don't need it. Phase 1 only validates that bankrun can host
Kamino's bytecode (which is still eBPF / SBFv1 — Kamino hasn't migrated
yet) and doesn't touch Metaplex Core, so the opt-out is safe.
Changes:
- `BankrunSetupOptions.loadMplCore?: boolean` — defaults to `true`
(back-compat with all existing specs that need Metaplex Core via
`join_pool` / `escape_valve_buy`).
- `setupBankrunEnv` destructures with sensible defaults
(`loadMplCore = true, loadKaminoLend = false`).
- Kamino spec passes `{ loadMplCore: false, loadKaminoLend: true }`
with an explanatory comment pointing at the option docstring.
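The option handling listed above can be sketched as follows. `BankrunSetupOptions` and the two flag names come from the commit; `resolveProgramsToLoad` is a hypothetical stand-in that only reports which binaries would be loaded, since the real `setupBankrunEnv` body is not shown here.

```typescript
interface BankrunSetupOptions {
  loadMplCore?: boolean; // defaults to true for back-compat
  loadKaminoLend?: boolean; // opt-in, defaults to false
}

// Hypothetical helper: applies the defaults via destructuring and returns
// the program binaries a setup call would load, without starting bankrun.
function resolveProgramsToLoad(options: BankrunSetupOptions = {}): string[] {
  const { loadMplCore = true, loadKaminoLend = false } = options;
  const programs: string[] = [];
  if (loadMplCore) programs.push("mpl_core.so");
  if (loadKaminoLend) programs.push("klend.so");
  return programs;
}

// Existing specs pass nothing and keep loading mpl_core;
// the Kamino spec opts out of mpl_core and into klend.
const existingSpec = resolveProgramsToLoad();
const kaminoSpec = resolveProgramsToLoad({ loadMplCore: false, loadKaminoLend: true });
```

Defaulting `loadMplCore` to `true` keeps every existing call site unchanged, which is the back-compat property the commit relies on.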
Empirical evidence from the failing run that motivated this:
$ file target/deploy/{klend,mpl_core}.so
klend.so: ELF 64-bit LSB shared object, eBPF, ... ← compatible
mpl_core.so: ELF 64-bit LSB shared object, *unknown arch 0x107*, ... ← SBFv2
Panic in bankrun setup:
thread 'tokio-runtime-worker' panicked at solana-program-test-1.18.0/src/lib.rs:716
Program file data not available for <�h9 (CoREENxT6tW1HoK8...)
Existing specs (`edge_grace_default*`, etc.) are not changed — they
keep the default `loadMplCore: true` behavior. They're already in
broken-on-bankrun territory pending #319 unblock; not making it worse.
https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
---------
Co-authored-by: Claude <noreply@anthropic.com>
…e + observability spec) (#381)

* ops: 4 mainnet-prep deliverables (Squads + bug bounty + observability + bankrun CI)

Closes 4 concrete items from team's mainnet-prep review that were realizable in sandbox (the rest are physical-ops / platform-account gated). Each deliverable is a deployable artifact, not a placeholder.

1. SQUADS CEREMONY TEMPLATE
   docs/operations/rehearsal-logs/FINAL-mainnet-squads-ceremony-template.md
   - Pre-filled report worksheet for the mainnet ceremony day
   - 6 phases (4 upgrade-authority rotations + 2 governance proposes)
   - Per-phase tx signature capture, coordinator + witness sign-off blocks
   - Hardware wallet attestation block + roll-back procedures
   - Appendices for hardening-check output, OtterSec PDAs, treasury ATA
   - Coordinator saves completed file as `YYYY-MM-DD-mainnet-squads-ceremony.md`
2. BUG BOUNTY SUBMISSION PACKAGE
   docs/security/immunefi-submission-package.md
   - Project overview + severity classification (Immunefi v2.3 mapping)
   - Asset list as JSON (Immunefi import-ready) with severity_caps per program
   - Out-of-scope, PoC reqs, Safe Harbor, KYC tiers, disclosure timeline
   - Initial $50k USDC pool funding tracker
   - 12-step submission checklist
   - Converts "we need bug bounty" → "send these to Immunefi"
3. OBSERVABILITY SPEC (3 files, ~22KB)
   docs/observability/README.md
   docs/observability/prometheus-alerts.yaml
   docs/observability/pagerduty-runbook.md
   docs/observability/grafana-dashboards.md
   - All 8 alerts team listed: config change, TVL cap, failed CPI, harvest revert, PrincipalLoss, vault mismatch, protocol pause, treasury changes (+ 1 supporting: indexer lag)
   - 4 Grafana dashboards from #271: indexer health, reconciler, RPC quorum/reorg, backfill cron, each as importable JSON
   - PagerDuty escalation matrix (L1/L2/L3) + on-call rotation + 8 alert response procedures with decision trees
   - Pre-deployment readiness notes (indexer needs Prometheus exposition format + structured logs)
   - Stack choice docs (Grafana Cloud vs Datadog) with cost estimates
4. BANKRUN-NO-MPL-CORE CI LANE
   .github/workflows/ci.yml + package.json
   - New required CI lane that runs tests/security_kamino_cpi.spec.ts (15 tests covering Kamino spike Phase 1+2a+2b1+2b2+2b3) without needing mpl_core (uses loadMplCore: false opt-out)
   - Build path: anchor build --no-idl + rebuild-idls.sh (the #319 anchor-syn workaround), proven to work locally
   - Caches klend.so weekly (GitHub Actions cache, key=klend-so-2026-W20)
   - Partial unblock of SEV-012 / #319: the FULL bankrun lane stays blocked by mpl-core 0.8 Anchor 0.31 borsh, but the Kamino-spec subset that opts out of mpl_core now runs in CI
   - Tracks as required (`needs: anchor`) so a regression breaks CI

What this PR does NOT do (still physical-ops gated):
- Execute the Squads ceremony (needs hardware wallets + signers)
- Submit to Immunefi (needs corp account + KYC + $50k USDC deposit)
- Provision Grafana/Loki/PagerDuty (needs platform subscriptions)
- Run canary mainnet (needs real $5 USDC + on-call coverage)
- Resolve SEV-012 / #319 fully (waits on upstream mpl-core 0.12 Anchor 1.0 compat)

This sprint converts ~70% of the team's "mainnet-prep deliverable" asks into ready-to-execute artifacts. Remaining ~30% needs humans + platforms outside the sandbox.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm

* revert(ci): defer bankrun-no-mpl-core lane to follow-up PR

CI run on PR #381 showed the lane failing with exit code 101 (likely the scripts/dev/rebuild-idls.sh step couldn't apply the anchor-syn #319 patch cleanly under CI's exact toolchain combo). Per the team's Phase 1 "freeze the core" directive, the disciplined move is to defer this CI work rather than block the v0.4-canary tag.

The 3 non-CI deliverables in this PR (Squads ceremony template, bug bounty submission package, observability spec) are independent docs and stay. Re-attempt as a separate PR after the canary tag lands. Doc commentary in ci.yml preserves the intent + failure context for the follow-up.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm

---------

Co-authored-by: Claude <noreply@anthropic.com>
PR #381 added this lane but it failed CI with exit code 101 and was reverted in commit 4b1fa12. Root cause traced to the Solana toolchain installer: PR #381 used the legacy `release.solana.com/v1.18.26/install` URL, but Anza's redirect chain returns non-zero on that path. The working `anchor · build` lane uses `anza-xyz/agave v3.0.0` from a direct GitHub releases tarball.

This re-attempt mirrors the anchor lane's toolchain install verbatim (Agave 3.0.0 + anchor-cli via `cargo install --git --tag v0.30.1`). Combined with the prior commit fixing rebuild-idls.sh's program loop, the bankrun-no-mpl-core lane should now run cleanly:

1. anchor build --no-idl (.so for all 4 programs)
2. bash scripts/dev/rebuild-idls.sh (#319 patch + 4 IDLs)
3. cache klend.so (or download from mainnet)
4. pnpm test:bankrun:no-mpl-core (security_kamino_cpi.spec.ts)

Local verification: rebuild-idls.sh produces all 4 IDLs from clean state, test:bankrun:no-mpl-core runs (1 passing + 16 pending; the pending state is expected without klend.so on first CI run, then the cache warms on the actions/cache@v4 path with weekly key `klend-so-2026-W20`).

SEV-012 status: 🟠 Blocked → 🟡 Partial. The mpl_core-dependent specs (join_pool + escape_valve_buy paths) stay local-only pending upstream mpl-core 0.12 ↔ Anchor 1.0 borsh compat (mpl-core#282, issue stale since 2026-05-15 / no maintainer response in 3 days). When that unblocks, merge this lane back into the full anchor lane.

Updates:
- .github/workflows/ci.yml: new bankrun-no-mpl-core job (~120 lines)
- docs/security/internal-audit-findings.md: SEV-012 row Blocked → Partial, summary table Medium-Blocked 1→0 / Medium-Partial 0→1, prose paragraph about mainnet-blocker status updated to reflect the partial unblock.
- CHANGELOG.md: new [Unreleased] section.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
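For illustration, a weekly key of the shape `klend-so-2026-W20` could be derived as below. This assumes the week component is the ISO-8601 week number; the log does not say how the workflow actually computes or hard-codes the key, and both function names are hypothetical.

```typescript
// ISO-8601 week: weeks start on Monday, week 1 contains the year's first Thursday.
function isoWeek(date: Date): { year: number; week: number } {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const weekday = d.getUTCDay() === 0 ? 7 : d.getUTCDay(); // Mon=1 .. Sun=7
  d.setUTCDate(d.getUTCDate() + 4 - weekday); // move to the Thursday of this ISO week
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return { year: d.getUTCFullYear(), week };
}

// Build a cache key that changes once per ISO week (assumed key format).
function klendCacheKey(date: Date): string {
  const { year, week } = isoWeek(date);
  return "klend-so-" + year + "-W" + (week < 10 ? "0" : "") + week;
}
```

A key that rolls over weekly means the first run of each week misses the cache and re-downloads klend.so, which matches the "pending on first run, then the cache warms" behaviour described above.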
Four commits bundled:

1. fix(SEV-012, partial): rebuild-idls.sh: include roundfi_yield_kamino in program loop (b1e6662)
2. ci(SEV-012, partial): re-attempt bankrun-no-mpl-core lane with corrected toolchain install (f423cc5)
3. docs(SEV-012, partial): scope audit for the lane (other bankrun specs need mpl_core or unrelated rework) (8aab24e)
4. docs(SEV-040): root-cause post-mortem (new docs/security/post-mortems/ pattern) (8d18380)

Empirically validated: the new bankrun-no-mpl-core lane completed in 3m 40s with all 17 Kamino spike tests running against cloned mainnet state. SEV-012 → Partial.

PR #381 retrospective: failure was the legacy release.solana.com installer, not the anchor-syn #319 patch as previously thought.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
… patch) (#390) Second empirical bug from SEV-046 rehearsal-1c. Both deploy scripts called `anchor build` without `--no-idl`, hitting anchor-syn 0.30.1's removed `proc_macro2::Span::source_file()` API. Same SEV-012/#319 issue. Existing CI lanes use `--no-idl` for this reason; the deploy scripts didn't. Fix consistent with mainnet-deploy workflow. IDLs aren't needed for the on-chain deploy — they're regenerated separately via `bash scripts/dev/rebuild-idls.sh` (local-only patch). https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
…pped it

The bankrun and litesvm lanes both die at the combined "anchor build (--no-idl) + IDL regen (#319 workaround)" step, but the preceding "anchor · build" job (same toolchain, same Cargo.lock, same cache key) succeeds. The diff is rebuild-idls.sh, which runs patch-anchor-syn-319.sh, which does:

    find ~/.cargo/registry/src -name anchor-syn-0.30.1 -type d
    # exit 1 if empty

Root cause is the interaction of two things that landed independently:

1. `Swatinem/rust-cache@v2` (unpinned to a minor) is now ≥ 2.7, which strips `~/.cargo/registry/src/` from the saved cache to halve its size; cargo is expected to re-extract from `registry/cache/*.crate` on demand when a dep is actually needed for a build.
2. `anchor build --no-idl` resolves the workspace with the `idl-build` feature OFF (anchor-syn is feature-gated behind `idl-build`), so cargo never has a reason to extract its source. The .crate file sits in `registry/cache/`, but `registry/src/anchor-syn-0.30.1/` doesn't exist when the patch script runs.

That's why this passed on #416 (whose cache was saved by an older rust-cache that kept `src/`) and breaks on #417 (rebased on top of #416, warm cache from the v2.7+ run).

Two complementary fixes, belt and suspenders:

- `rebuild-idls.sh`: run a plain `cargo fetch` before invoking the patch script. `fetch` resolves the full lockfile and extracts every dep into `registry/src/` regardless of features. Wrapped in `|| true` so a network blip or `--locked` mismatch doesn't break the chain; the patch script's own fallback handles the residual case.
- `patch-anchor-syn-319.sh`: if `find` returns nothing in `registry/src`, look for the `.crate` tarball in `registry/cache` and `tar -xzf` it into the matching `index-host/` directory under `registry/src`. This rescues the case where `cargo fetch` only populated `cache/` (some cargo versions are stingy with `src/`) and is also a clean recovery path on any future runner where the cache shape changes again.

PR #417 changes 0 files in `programs/`, `scripts/dev/` (until this commit), `Cargo.toml`, or `.github/workflows/`. The fix is local to the two scripts and additive: the working path (anchor-syn already in src) is unchanged.
…ched anchor-syn The static.crates.io fallback worked on bb5df95 (download + extract + patch applied cleanly on a clean HOME locally), but CI still died at the same step with the same ~3-minute duration. The missing piece: when cargo extracts a .crate naturally it writes `.cargo-checksum.json` alongside the source — a JSON map of every file's SHA256 plus the .crate's own SHA at the top level. The .crate tarball does NOT ship that file, so `tar -xzf` leaves the extracted dir without one. Cargo treats a registry-origin dir without `.cargo-checksum.json` as invalid and re-fetches the .crate, clobbering our patch — that's why `anchor idl build` still hit the #319 crash even after my fallback ran. Fix: only when the direct-download path fires (NEEDS_CHECKSUM_GEN=1), generate `.cargo-checksum.json` after the patch is applied. Per-file hashes are computed from the patched on-disk content (so cargo's file-level verification, if any, sees a self-consistent dir); the top-level `package` SHA is the literal Cargo.lock checksum of anchor-syn-0.30.1 — `f99daacb...` — which cargo cross-checks against the lockfile. I confirmed locally that: - The .crate downloaded from static.crates.io has SHA256 exactly matching Cargo.lock's `f99daacb53b55cfd37ce14d6c9905929721137fd4c67bbab44a19802aecb622f`. - After patch + generation, `.cargo-checksum.json` lists 45 file entries and the matching package SHA — cargo's "is this a valid registry source" sniff test passes. End-to-end dry-run on `mktemp -d` HOME: download → extract → patch defined.rs → write checksum file → marker check ("SANDBOX PATCH" present) short-circuits on re-runs. The cache-hit + registry-hit paths from earlier commits are untouched and skip the checksum generation (they're already cargo-extracted dirs with the file present).
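The checksum-file generation described above might look roughly like this. It assumes the layout the commit describes (a `files` map of per-file SHA256 hashes plus a top-level `package` hash); `buildCargoChecksum` and its in-memory input shape are hypothetical, and the actual script is bash, not TypeScript.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA256 of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// files: relative path -> patched on-disk bytes.
// packageSha: the crate checksum copied from Cargo.lock.
function buildCargoChecksum(files: Record<string, Uint8Array>, packageSha: string): string {
  const entries: Record<string, string> = {};
  // Sort paths so the generated file is stable across runs.
  for (const path of Object.keys(files).sort()) {
    entries[path] = sha256Hex(files[path] as Uint8Array);
  }
  return JSON.stringify({ files: entries, package: packageSha });
}
```

Hashing the patched bytes keeps the per-file entries self-consistent with what is on disk, while `package` stays pinned to the lockfile value (`f99daacb...` in this commit).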
…al extraction

The 047ca64 diagnostic showed exactly what's happening:

    /home/runner/.cargo/registry/src/
      index.crates.io-1949cf8c6b5b557f/anchor-syn-0.30.1/   ← unpatched (cargo uses this)
      index.crates.io-6f17d22bba15001f/anchor-syn-0.30.1/   ← patched (cargo ignores)

Cargo hashes the registry URL into the directory name. The sparse-registry URL (used by `anchor idl build` against the workspace) and whatever registry `cargo install --git anchor-cli` resolves through have DIFFERENT hashes. The previous patch script picked `head -1` of the find output, which on this runner is the cargo-install-created `6f17d22bba15001f` dir. `anchor idl build` then extracts a fresh copy to `1949cf8c6b5b557f` and compiles the unpatched code → E0599 at the `source_file()` call.

Two complementary fixes:

- `patch-anchor-syn-319.sh` now patches EVERY existing copy (`mapfile` + loop), not just the first one. Each loop iteration is idempotent (skip if the "SANDBOX PATCH" marker is already present, skip if defined.rs is missing). The checksum-file generation also iterates, only writing for copies that don't already have one.
- `rebuild-idls.sh` primes the canonical sparse-registry extraction up front by attempting an `anchor idl build` that is EXPECTED to fail with the #319 error. The failure is irrelevant; the side effect is that cargo extracts anchor-syn-0.30.1 to `index.crates.io-<sparse-hash>/`, so the patch script then sees it and patches it before the real IDL build loop runs.

Verified locally end-to-end against a fresh mktemp HOME: the patch script materializes via the static.crates.io fallback, patches each found copy, and writes a self-consistent .cargo-checksum.json. Real CI will have multiple dirs and patch each one.
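The "patch every copy, idempotently" rule can be modelled in memory as a small TypeScript sketch. This is not the bash script itself: `ExtractedCopy`, `patchAllCopies`, and `applyPatch` are hypothetical names, and the real source edit is not shown in the commit, so it is represented here by a caller-supplied function.

```typescript
// One extracted copy of the crate, modelled in memory (hypothetical type).
interface ExtractedCopy {
  dir: string;
  definedRs: string | null; // null models "defined.rs is missing"
}

const MARKER = "SANDBOX PATCH";

// Patch every copy. Skip copies that lack the file or already carry the marker,
// so running the function twice changes nothing the second time.
function patchAllCopies(
  copies: ExtractedCopy[],
  applyPatch: (source: string) => string,
): string[] {
  const patchedDirs: string[] = [];
  for (const copy of copies) {
    if (copy.definedRs === null) continue; // nothing to patch
    if (copy.definedRs.indexOf(MARKER) !== -1) continue; // already patched
    copy.definedRs = applyPatch(copy.definedRs);
    patchedDirs.push(copy.dir);
  }
  return patchedDirs;
}

// The two directories from the diagnostic: one unpatched, one already patched.
const copies: ExtractedCopy[] = [
  {
    dir: "index.crates.io-1949cf8c6b5b557f/anchor-syn-0.30.1",
    definedRs: "fn original() {}",
  },
  {
    dir: "index.crates.io-6f17d22bba15001f/anchor-syn-0.30.1",
    definedRs: "// SANDBOX PATCH\nfn original() {}",
  },
];

// Stand-in for the real patch: it only prepends the marker comment.
const addMarker = (src: string): string => "// " + MARKER + "\n" + src;

const firstRun = patchAllCopies(copies, addMarker);
const secondRun = patchAllCopies(copies, addMarker);
```

Under these assumptions the first run touches only the `1949cf8c6b5b557f` copy and the second run touches nothing, which is the behaviour the `head -1` version could not guarantee.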
Adds /admin/ops/insights, the 6th and last operational area, with ADR-0010 sample-size gates: 4 pre-defined questions (retention by level, default predictor, L1→L2→L3 progression, behavioral improvement), per-view N-thresholds, three-state classification (insufficient/preliminary/significant), and Wilson 95% CI. Below threshold the page renders progress, never a number.

Primitives live in services/indexer/src/insights.ts (getInsights + retentionByLevel/defaultPredictor/progression/behavioralImprovement + INSIGHTS_THRESHOLDS + classifySample + wilson95Bps). GET /api/admin/insights sits behind requireAdmin (ADR 0009 §1). The page reuses Section/StatCard/Pill/RefreshBar/InfoTooltip + theme tokens, with no visual invention. ~35 i18n keys per locale (PT+EN); admin_i18n parity 4/4.

CI fix (issue #319 workaround): cargo's sparse-registry hashes the index URL into the extracted source dir name, so `cargo install --git anchor-cli` and `anchor idl build` extracted anchor-syn-0.30.1 to different `<index.crates.io>-<hash>/` dirs. patch-anchor-syn-319.sh now patches every extracted dir via mapfile+loop; rebuild-idls.sh primes extraction with a deliberately-failing `anchor idl build` so the canonical path exists before the patch loop runs.

Zero changes to programs/: no SBF bytecode, no devnet redeploy.
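For readers unfamiliar with the sample-size gate, the following is a minimal sketch of a three-state classifier and a Wilson 95% score interval in basis points. It is not the code in insights.ts: the function names carry a `Sketch` suffix, and the signatures, return shape, and threshold values are all assumptions, since the commit lists the primitives without showing them.

```typescript
type SampleState = "insufficient" | "preliminary" | "significant";

// Hypothetical gate: the real per-view thresholds live in INSIGHTS_THRESHOLDS.
function classifySampleSketch(
  n: number,
  preliminaryMin: number,
  significantMin: number,
): SampleState {
  if (n < preliminaryMin) return "insufficient";
  if (n < significantMin) return "preliminary";
  return "significant";
}

// Wilson score interval at 95% confidence (z = 1.96), returned in basis points.
function wilson95BpsSketch(
  successes: number,
  n: number,
): { lowBps: number; highBps: number } | null {
  if (n <= 0) return null; // no sample, no interval
  const z = 1.96;
  const z2 = z * z;
  const p = successes / n;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return {
    lowBps: Math.round(Math.max(0, center - half) * 10000),
    highBps: Math.round(Math.min(1, center + half) * 10000),
  };
}

// Example: 50 retained out of 100, with made-up thresholds of 30 and 100.
const state = classifySampleSketch(100, 30, 100);
const interval = wilson95BpsSketch(50, 100);
```

A caller following the "never a number" rule would check the state first and only read the interval when the sample is not `insufficient`.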
## Spike: Path B is viable, borsh-version stalemate is escapable

I ran the "Path B" experiment this PR description held in reserve and got an empirical answer: the borsh-coexistence problem is not actually a Metaplex-Anchor coordination problem. It's a feature-flag misconfiguration on our side. Pushed as a separate spike branch:

### What changed in the spike

Four

    -mpl-core = { version = "=0.8.0", features = ["anchor"] }
    +mpl-core = { version = "0.12", default-features = false, features = ["borsh-v1"] }
    -anchor-lang = { version = "0.30.1", features = ["init-if-needed"] }
    +anchor-lang = { version = "1.0", features = ["init-if-needed"] }

(applied across all 4 programs)

### Why this PR thought Path B was a fallback, and why it's actually the canonical path

This PR's description had:
Two corrections from the spike:

### Empirical evidence

The

### Remaining work to close #319 (per docs/319-path-b-spike.md §"What still has to happen")

15 mechanical patches across 3 files, ~50-100 net LoC:

None touches protocol economics (no math, PDA seeds, or authority).

### What this spike does NOT prove

### Recommended path forward

When all five land, #319 closes without ever needing Metaplex to respond. This unblocks 4.1 (canary smoke) on

Generated by Claude Code
Builds on the Path B spike (969861d). All 4 programs now pass `cargo check --workspace --all-targets` with 0 errors against anchor-lang 1.0.2 + mpl-core 0.12 + solana-program 3.x.

API migrations applied (mechanical, no economic-logic changes):

1. CpiContext::new / new_with_signer (15 sites in core, 2 in yield-kamino, 2 in yield-mock, 1 in reputation): first arg changed from program.to_account_info() to program.key() (Anchor 1.0 takes Pubkey). Struct-field token_program inside ATA Create{} left as AccountInfo.
2. Context<'_,'_,'_,'info,T> -> Context<'info,T> (Anchor 1.0 collapsed the 4 lifetimes to 1): harvest_yield, deposit_idle_to_yield handlers + lib.rs #[program] shims (core); harvest + kamino_cpi_redeem/deposit (kamino).
3. anchor_lang::solana_program::hash removed from the 1.0 shim: added a direct solana-program 3.0 dep to roundfi-core + roundfi-yield-kamino and import hash from there (reputation.rs, yield_adapter.rs, escape_valve_list_reveal.rs, kamino lib.rs).
4. sysvar::instructions::ID -> sysvar::instructions::id() (function form), 2 sites in yield-kamino.
5. ProfileSnapshot::try_to_vec() (a BorshSerialize trait method in borsh 0.10, removed in borsh 1.x) -> AnchorSerialize::serialize into an owned Vec (get_profile.rs).
6. AccountInfo::realloc(len, zero_init) -> resize(len) (solana-account-info 3.x renamed it; zero-init implicit on growth) in migrate_reputation_config.rs.

NOT yet validated (requires operator with SBF toolchain):

- anchor build (SBF target) end-to-end
- IDL regeneration for @roundfi/sdk
- bankrun + litesvm suites
- devnet redeploy + OtterSec attestation refresh

See docs/319-path-b-spike.md for the full sequence + remaining steps.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
Anchor 0.30 surfaced AnchorError names in the bankrun error log; Anchor 1.0's bankrun error object only carries 'custom program error: 0x<code>'. Three negative-path assertions matched on error NAMES only and broke. Added the corresponding hex codes (0x1773=PoolNotActive, 0x1777=WrongCycle, 0x177c=EscrowNothingToRelease) to the regex so they match both Anchor's old named form and the 1.0 code-only form. No program-logic change. https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
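A minimal sketch of a matcher that accepts both error forms follows. The helper names `anchorErrorHex` and `errorMatcher` are hypothetical, not taken from the test suite. The decimal codes are derived from the hex values in the commit (0x1773 is 6003, and Anchor numbers custom program errors from 6000).

```typescript
// Render a custom error code the way the code-only log line shows it.
function anchorErrorHex(code: number): string {
  return "0x" + code.toString(16);
}

// Accept either the old named form or the code-only form.
// The trailing \b keeps 0x1773 from matching a longer hex value such as 0x17731.
function errorMatcher(name: string, code: number): RegExp {
  return new RegExp(name + "|" + anchorErrorHex(code) + "\\b");
}

const poolNotActive = errorMatcher("PoolNotActive", 6003);
const wrongCycle = errorMatcher("WrongCycle", 6007);
const escrowNothingToRelease = errorMatcher("EscrowNothingToRelease", 6012);

// Illustrative log lines; the exact surrounding text in real logs may differ.
const namedForm = "AnchorError occurred. Error Code: PoolNotActive.";
const codeOnlyForm = "custom program error: 0x1773";
const matchesBoth = poolNotActive.test(namedForm) && poolNotActive.test(codeOnlyForm);
```

Building the pattern from the name and the numeric code keeps the two spellings next to each other, so a renumbered error only needs one edit.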
Anchor 1.0 keys the IDL account client by the exact struct name (escapeValveListing), camelCased. The test used the legacy loose alias '.account.listing' which is undefined under the 1.0 IDL, throwing "Cannot read properties of undefined (reading 'fetch')". Renamed to '.account.escapeValveListing' to match the on-chain struct (programs/roundfi-core/src/state/listing.rs: pub struct EscapeValveListing). https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
…hapes

Closes the 3 state-shape-drift failures the CI lane comment flagged as a 'separate spec-refresh PR' (0x177c / 0x1777 / 0xbbd). None is a migration regression: git diff vs origin/main shows the setup was byte-identical; these tests had been failing pre-#319 and bankrun-full isn't a required CI lane.

#5 release_escrow (0x177c EscrowNothingToRelease): The SEV-034 vesting derivation (compute_release_delta_target) computes total_paid = (stake_initial + total_escrow_deposited) - escrow_balance. The fixture seeded escrow_balance = 0, but real join_pool locks the stake (join_pool.rs:263: escrow_balance = stake). With escrow seeded 0 the math concluded the stake was already released -> delta 0 -> revert. Fix: seed member.escrow_balance, pool.escrow_balance, and the escrow vault token account to STAKE_INITIAL (2_500_000), modelling join_pool. Verified safe against the shared contribute/claim_payout tests: both use deltas, and claim_payout's seed-draw guard only gets easier.

#6 settle_default (0x1777 WrongCycle): Guard is "args.cycle == pool.current_cycle" (settle_default.rs:161), as the working edge_grace_default tests use (cycle == CURRENT_CYCLE == 2). The spec passed the stale "current_cycle - 1" contract (cycle:1) against a pool seeded current_cycle=2. Fix: cycle 1 -> 2 + corrected doc-header.

#7 deposit_idle_to_yield (0xbbd AccountNotEnoughKeys): yield-mock's Deposit needs its YieldVaultState PDA at position 5 (after the 4-account adapter prelude). buildDepositIdleToYieldIx only emits the 8 explicit core accounts (per its doc-comment); the adapter tail is the caller's job (sendDepositIdleToYield does it via remainingAccounts). Fix: append yieldStatePk as a remaining account in the test, mut + non-signer.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
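The #5 fixture problem comes down to one line of arithmetic, sketched here in TypeScript. Only the `total_paid` formula and the `STAKE_INITIAL` value come from the commit message. The function name `totalPaid` and the assumption that the fixture has made no escrow deposits yet are illustrative, and the rest of compute_release_delta_target is not reproduced.

```typescript
// total_paid = (stake_initial + total_escrow_deposited) - escrow_balance
function totalPaid(
  stakeInitial: number,
  totalEscrowDeposited: number,
  escrowBalance: number,
): number {
  return stakeInitial + totalEscrowDeposited - escrowBalance;
}

const STAKE_INITIAL = 2_500_000;

// Stale fixture: escrow seeded to 0, so the whole stake reads as already paid out.
const staleFixture = totalPaid(STAKE_INITIAL, 0, 0);

// Refreshed fixture: escrow seeded to the stake, as join_pool would leave it.
const refreshedFixture = totalPaid(STAKE_INITIAL, 0, STAKE_INITIAL);
```

With the stale seed the derived `total_paid` equals the full stake, which is consistent with the program concluding there was nothing left to release.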
#6 advanced past WrongCycle (fixed last commit) to GracePeriodNotElapsed (0x1796 = 6038): the fixture's clock warp used the stale 60s devnet-patch grace value, but `anchor build` compiles WITHOUT the devnet-canary feature so the on-chain GRACE_PERIOD_SECS is the 604_800 (7d) production constant (constants.rs:62). The +70s warp never cleared the 7d deadline. Bumped SETTLE_GRACE_PERIOD_SECS 60 -> 604_800 to match the working edge_grace_default* specs, so setBankrunUnixTs clears the real deadline. https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
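The mismatch is easy to see numerically. The sketch below assumes the guard has the shape "now >= due timestamp + grace period" and that the fixture warps to the grace period plus a 10 s margin (60 + 10 = 70). Both are assumptions for illustration, since the commit does not show the guard or the fixture code, and `graceElapsed` and `warpTarget` are hypothetical names.

```typescript
const STALE_GRACE_SECS = 60; // old devnet-patch value used by the fixture
const PROD_GRACE_SECS = 604_800; // 7 days, the constant compiled without devnet-canary

// Assumed guard shape: settlement is allowed once now >= dueTs + grace.
function graceElapsed(nowTs: number, dueTs: number, graceSecs: number): boolean {
  return nowTs >= dueTs + graceSecs;
}

// Clock value a fixture would warp to: the grace period plus a small margin.
function warpTarget(dueTs: number, graceSecs: number, marginSecs: number): number {
  return dueTs + graceSecs + marginSecs;
}

const dueTs = 1_700_000_000; // arbitrary example timestamp

// Fixture warps using the stale 60 s value, program checks against 7 days.
const staleWarpClears = graceElapsed(
  warpTarget(dueTs, STALE_GRACE_SECS, 10),
  dueTs,
  PROD_GRACE_SECS,
);

// Fixture warps using the production value, so the deadline is cleared.
const fixedWarpClears = graceElapsed(
  warpTarget(dueTs, PROD_GRACE_SECS, 10),
  dueTs,
  PROD_GRACE_SECS,
);
```

Deriving the warp from the same constant the program compiles with avoids this class of drift, which is what bumping SETTLE_GRACE_PERIOD_SECS achieves in the spec.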
The anchor build / bankrun-no-mpl-core lanes installed anchor-cli v0.30.1, which can't parse anchor-lang 1.0 macros/IDL, so 'anchor · build' (a required check) failed on this PR even though the migration builds clean locally with anchor-cli 1.0.2.

- All 3 'Install Anchor' steps: v0.30.1 -> v1.0.2.
- Dropped the --no-idl flag + rebuild-idls.sh patch workaround: Anchor 1.0 builds IDLs natively (the anchor-syn Span::source_file() removal that forced --no-idl under 0.30.1 is fixed upstream in 1.0), so plain 'anchor build' emits target/idl/*.json for the bankrun harness.
- Refreshed the stale lane-header comments.

Advisory 'deny · supply-chain' may still flag the 1.18.x ignore list as stale (those advisories are now unreachable post-migration). That's a non-blocking follow-up, tracked in deny.toml's #319 note.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
lifecycle + economic_parity seeded cycle_duration at 60s / 3_600s, the stale pre-SEV-023 devnet values. The on-chain MIN_CYCLE_DURATION is 86_400 (constants.rs:152), so create_pool reverts InvalidCycleDuration (6033) and the whole describe cascades to undefined-fixture failures. Same state-shape drift class as the app_encoders_bankrun refresh — the cycle counter is driven by claim_payout (not Clock), so the 1-day floor adds no wall-time cost. edge_tiny_lifecycle + edge_degenerate_shapes were already on 86_400. https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
PR #434's anchor·build lane failed with:

    Error: Program ID mismatch detected for program 'roundfi_core':
      Keypair file has: 6rHgZF5YmJHgPSprx6NpZUNEyKipiwC1ffvxwdzDVq4v
      Source code has:  8LVrgxKwKwqjcdq7rUUwWY2zPNk8anpo2JsaR9jTQQjw

Anchor 1.0 added a pre-build check that compares declare_id! against target/deploy/<program>-keypair.json. On a fresh CI runner those keypair files don't exist, so Anchor generates random ones and the check fails. The keypairs are operator-side deploy artifacts (gitignored); they have no role in CI compile, where declare_id! is the authoritative source. The Anchor error message itself suggests --ignore-keys as the fix.

Applied to all 3 'anchor build' invocations in ci.yml (anchor lane + bankrun-no-mpl-core lane + the third build block).

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
#2 in the localnet lifecycle spec hit ReputationLevelMismatch (join_pool.rs:172): the test asserted reputationLevel=2, but join_pool derives the trusted level from the on-chain ReputationProfile PDA (the Step-4d audit close-out). Fresh wallets have no profile, so the program treats them as level 1; asserting level 2 reverts. The fixture predates that hardening (it was written when join_pool trusted the client-supplied level). Fresh members ARE level 1 (50% stake), so LEVEL 2 -> 1 + LEVEL_STAKE_BPS 3_000 -> 5_000; STAKE_BASE follows automatically and the conservation assertions track. Refreshed the two stale doc comments. https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
## 🛑 BLOCKED on upstream: mpl-core ↔ Anchor borsh coexistence

This PR attempted the #230 Agave 2.x migration. After 7 iteration rounds (including bumping Anchor as far as 1.0.2, the latest), it hit a fundamental upstream blocker that can't be resolved at the Cargo.toml level.

### What works (rounds 1–6)

- `anchor-lang` / `anchor-spl`: 0.30.1 → 1.0.2 (latest stable; resolves cleanly)
- `mpl-core`: 0.10 → 0.12 (resolves the kaigan zeroize conflict)
- `mpl-core` `default-features = false` (resolves the `kaigan/anchor` + `kaigan/borsh-v1` mutex)
- `Payload` `IdlBuild` impl (Anchor 0.31+ requirement for custom types in `#[account]`)
- `try_to_vec` → `borsh::to_vec()` migration in `get_profile.rs` (borsh 1.x convention)
- `v1.0.2`, `--no-idl` workaround dropped in CI

### What's blocked (rounds 4 + 7: same error, post and pre Anchor 1.0 bump)

Building `roundfi-reputation` surfaces ~405 errors of the form:

Root cause: `mpl-core 0.12` uses `borsh` 1.x directly. Anchor's deeper transitive layers (likely via `solana-program` 1.x → `solana_pubkey`) still expand code against `borsh` 0.10's `maybestd` module, a no_std compat module that was removed in `borsh` 1.x. Cargo resolves the same crate name to two incompatible major versions in the same dep graph.

Notably: bumping Anchor 0.31 → 1.0.2 (Anchor's own internal borsh 1.x migration) did NOT close this. Anchor 1.0 fixed it on Anchor's own surface, but `mpl-core 0.12` still pulls/expects a different borsh version than what `solana_pubkey` carries transitively. This is an mpl-core ↔ solana-program coordination problem, not an Anchor problem.

### Paths considered (and why rejected)

- `[patch.crates-io]` borsh unification (round 7): patches must point to different sources. The only expressible form is patching to a github fork of borsh, which is more risk surface than the bug we're dodging.
- `mpl-core/anchor` feature ("Path B"): `join_pool`, `escape_valve_list`, `escape_valve_buy`. Fragile: when upstream lands, we'd have to choose between drift or undoing the refactor. Held as fallback if upstream stays cold for >60 days.

### Next steps (when unblocking)
1. Watch mpl-core releases. The first version that supports Anchor 1.0 + borsh 1.x cleanly unblocks this PR. Track at https://github.com/metaplex-foundation/mpl-core/releases.
2. Upstream issue (to file manually; not in scope of the bot's GH access):
3. When mpl-core releases a compatible version: re-fetch this branch, bump the `mpl-core` version, retry. Most iteration scar tissue is already in place; only the mpl-core version bump remains.
4. Until then: all `cargo audit` + `cargo deny` lanes stay advisory. The "(advisory)" suffix on those CI jobs is honest about the still-blocked-on-upstream state.

### Iteration trail (commits on this branch)

- `feat(toolchain): Agave 2.x migration — round 1`
- `fix(toolchain): bump mpl-core 0.10 → 0.12`
- `fix(attest): impl IdlBuild for Payload newtype`
- `fix(mpl-core): disable default features`: `kaigan/anchor` + `kaigan/borsh-v1` mutex
- `fix(anchor): bump 0.31.1 → 1.0.2`
- `fix(reputation): borsh 1.x — try_to_vec → borsh::to_vec helper`: `try_to_vec`; use free-standing helper
- `experiment(toolchain): patch.crates-io to unify borsh`
- `revert(toolchain): drop borsh patch.crates-io experiment`

Leaving as draft PR so that when upstream lands, we restart from this baseline (only the mpl-core version bump and a CI retry needed).
Generated by Claude Code