Crypto wave 2: BLS12-381 signature scheme + field-multiply HW by junjihashimoto · Pull Request #82 · Verilean/sparkle

junjihashimoto · 2026-07-01T14:43:00Z

Summary

Wave 2 of the Crypto HW/reference build-out (wave 1 = AES/Keccak/SHA-512/HKDF/RLP/Merkle HW, PR #80). Adds the BLS12-381 signature scheme (pure-data reference — the load-bearing addition) plus field-multiply HW for four curves.

Deliverable A — BLS12-381 (primary)

IP/Crypto/BLS12_381.lean (~640 LOC). BLS signatures on the
BLS12-381 pairing-friendly curve — the curve Ethereum 2.0 consensus
(validator aggregate signatures), zk-rollup proof verification, and
threshold-signature schemes are built on. Sparkle had no BLS /
pairing IP before this.

Full tower, layered like the existing Ed25519 tower:

Fp   — 381-bit base prime
Fp2  — Fp[u]/(u²+1)
Fp6  — Fp2[v]/(v³-ξ), ξ = u+1
Fp12 — Fp6[w]/(w²-v)   (pairing target group GT)
G1   — E(Fp): y² = x³ + 4                 (Jacobian)
G2   — E'(Fp2): y² = x³ + 4(u+1)          (the twist)
Pairing — optimal ate: Miller loop + final exponentiation
sign / verify / aggregate / verifyAggregate

Pairing transcribed line-for-line from Ethereum's py_ecc
optimized_bls12_381 (pseudo-binary loop encoding, D-type untwist
ψ(x,y)=(x·w⁻², y·w⁻³), homogeneous-projective linefunc with
num/den accumulation, full (p¹²−1)/r final exp). Verified
bilinear + non-degenerate — the property that makes
sign/verify work.
Hash-to-curve (explicit in-file): real RFC 9380 §5.3
expand_message_xmd(SHA-256) feeding a documented SIMPLE
placeholder hashToG2 (scalar·generator), NOT the SSWU suite —
so it is intentionally not blst/eth2-KAT interoperable. Clearly
marked so nobody mistakes it for production H2C.
Test (bls12381-test): Fp12 pairing bilinearity +
non-degeneracy, sign→verify PASSES, tampered-sig /
wrong-message / wrong-keyset REJECTED, 3-sig aggregate
verifies. Self-consistency (documented), not third-party KAT.
Out of scope (documented): pairing HW — 381-bit modmul + Miller
loop on an FPGA is a research project. BLS's value here is the
machine-checkable reference + IP-catalog presence.

Deliverable B — field-multiply HW (4 curves)

Bit-serial double-and-add modular multiply (MSB-first, per-bit
conditional-subtract reduction), each a circuit do FSM + sim
cross-check vs the pure-data mul + #synthesizeVerilog:

IP	file	prime	test
Goldilocks	`GoldilocksHW.lean`	2⁶⁴−2³²+1	`goldilocks-mul-hw-test` ✅
secp256k1	`Secp256k1FieldHW.lean`	2²⁵⁶−2³²−977	`secp256k1-mul-hw-test` ✅
P-256	`P256FieldHW.lean`	NIST P-256	`p256-mul-hw-test` ✅
Ed25519	`Ed25519FieldHW.lean`	2²⁵⁵−19	`ed25519-mul-hw-test` ✅

Ed25519's HW is the follow-up its own docstring promised ("the HW
engine … follows in L.2.b"). All four share one verified generic
shift-add mod-p skeleton (not Solinas/field-specific fast reduction —
that's a follow-up); each matches (a·b) mod p exactly.

Punted (as scoped): RSA-PSS HW (2048-bit modexp), point-op /
scalar-mul HW, pairing HW.

Verification (local)

lake build clean — all #synthesizeVerilog compile.
5/5 new exes ALL PASS: bls12381-test,
{goldilocks,secp256k1,p256,ed25519}-mul-hw-test.
lake exe test: 13/13 lspec.
12 new #synthesizeVerilog invocations (3 per field HW).

New Sparkle-side limitation found

A @[reducible, inline] helper wrapping a Signal.mux over an
<*>-built comparison fails synth ("Cannot instantiate Seq.seq")
though it sim-passes — the conditional-subtract must be inlined
directly into the circuit do body. Also: use BitVec.ule (not
<=) and BitVec.append (0#k) v (not .setWidth) for
zero-extend. (Extends the gotcha inventory from PR #79/#80.)

CI wiring

lakefile.lean — 5 new lean_exe stanzas. Tests/AllTests.lean —
5 imports + .main calls (release-gate runner). Synth runs at
lake build; behavioural mains under lake test.

Adds `mixedWidthMonad` / `mixedWidthMacro` — a BitVec 4 counter and a BitVec 8 accumulator running through the same v2 monad state. The Prod-chain state packs to 12 bits (4 + 8); each register slot is sliced out by width-aware lens reads. This is the headline benefit of the v2 monad over v1's `Vector τ n` state: v1 forced one element type per circuit, v2 mixes arbitrary Wireable types in a single body. Without a test pinning this down, the next person changing the elaborator could silently lose the heterogeneous-width property. Result Verilog (mixedWidthMonad): logic [11:0] _tmp_loop_0; // packed state assign _tmp_op_a_2 = _tmp_loop_0[11:8]; // cnt slot assign _tmp_op_a_8 = _tmp_loop_0[7:0]; // acc slot Sim parity verified end-to-end: --- mixedWidthMonad ≡ mixedWidthMacro --- got = [0x00, 0x00, 0x01, 0x03, 0x06, 0x0a, 0x0f, 0x15] expected = [0x00, 0x00, 0x01, 0x03, 0x06, 0x0a, 0x0f, 0x15] PASS The extension `0#4 ++ count` is used rather than `zeroExtend` because `BitVec.zeroExtend` doesn't yet have an elaborator wire-translation rule. Once it does, the test can move to the more idiomatic form. Branch poc/circuit-monad-v2.

Hand-written generalisation from runCircuit2 to runCircuit3. State shape is `τ₀ × τ₁ × τ₂` (right-nested Prod chain), slot lenses use depth-2 `Signal.map Prod.fst/snd` chains for the middle and tail slots. Same recipe as runCircuit2 — both halves of the chain lower through the elaborator's existing Prod recognition rules (no new rules needed). Adds tripleCountMonad regression test (three BitVec 8 counters, output = a ^^^ b ^^^ c). $ lake exe circuit-monad-v2-test --- counterMonad ≡ counterMacro --- PASS --- twoCountMonad ≡ twoCountMacro --- PASS --- mixedWidthMonad ≡ mixedWidthMacro --- PASS --- tripleCountMonad ≡ tripleCountMacro --- PASS Status of arity generalisation: the per-arity helper pattern scales further with hand-written runCircuit4 / runCircuit5 copies, or — more principled — a macro that emits arity-N forms. The macro path is the natural follow-up for handling PR #17's N-client arbiter where N is a parameter. Branch poc/circuit-monad-v2.

`Tests/AllTests.lean` already drives `Sparkle.Tests.SignalLoopTest.main` / `CircuitIfTest.main` / `CircuitMatchTest.main` from its main, so a cycle-by-cycle divergence in any of those aborts `lake test`. Add `Sparkle.Tests.CircuitMonadV2Test.main` to the same chain so the v2 monad surface (counterMonad / twoCountMonad / mixedWidthMonad / tripleCountMonad — all four sim-vs-macro parity checks) is exercised on every `lake test` invocation. `lake test` exits 0 with the new entry; the v2 sims show up as a fourth ALL PASS block between the existing CircuitMatchTest output and the lspecIO runner. Branch poc/circuit-monad-v2.

Introduces `circuit do { ... }` — the v2-monad-backed alternative to `Signal.circuit do { ... }`. Same surface syntax: - `let r ← Signal.reg init` register declaration - `r <~ rhs` next-cycle update - `let x := expr` branch-local Lean binding - `if cond then ... else ...` Signal-Bool branch over multiple registers - `return expr` output Signal Lowers to `Sparkle.Core.runCircuit{N}` (counting the `Signal.reg` declarations) with a body that uses `Sparkle.Core.Circuit.bind` / `Sparkle.Core.Circuit.pure'` directly — so it goes through the `handleCircuitMonad` recognition added earlier in this branch and reaches `#synthesizeVerilog` for free. Sim parity confirmed against the legacy `Signal.circuit do` macro for all four test patterns (counter, reset counter, two-register reset, hold semantics): $ lake exe circuit-do-test --- counterCdo ≡ counterMacro --- PASS --- resetCounterCdo ≡ resetCounterMacro --- PASS --- twoRegResetCdo ≡ twoRegResetMacro --- PASS --- heldRegCdo ≡ heldRegMacro --- PASS All eight `#synthesizeVerilog` invocations in `Tests/CircuitDoTest.lean` succeed; the cdo Verilog matches the macro reference structurally (same registers, same mux trees). To make `cnt + 1#8` work the way it does inside the macro (where `cnt` is implicitly a `Signal dom τ`), `Reg dom S τ` gets a `CoeHead` instance to `Signal dom τ`. Without this the user has to write `Circuit.read cnt + 1#8` everywhere, defeating the goal of macro-parity UX. Also wires `circuit-do-test` into `Tests/AllTests.lean` so the sim parity is checked on every `lake test` invocation. Drops some exploratory definitions from `CircuitMonadV2Test.lean` that were superseded by the cdo form. Branch poc/circuit-monad-v2. Open follow-ups (tracked): - statement-level `match` in `circuit do` - duplicate `<~` detection - migrating existing users of `Signal.circuit do` to `circuit do` - removing the legacy macro

…rloads `Sparkle/Core/CircuitDo.lean` gains a statement-level `match` lowering that mirrors the legacy macro's `Signal.mux`-chain form: per-arm Signal.mux on `scrut === pat`, wildcard arm as tail fallthrough, hold semantics for registers not assigned in some arms. Code path is implemented but not yet production-ready — see deferred status below. `Sparkle/Core/CircuitMonad.lean` gains: * `CoeHead (Reg dom S τ) (Signal dom τ)` so the register handle is implicitly a Signal anywhere a Signal is expected (e.g. `Signal.mux cnt …`). * `HAdd/HSub/HMul/HXor` instances for `Reg + BitVec`, `Reg + Reg`, `Signal + Reg` etc. Lean resolves binary operators by the operand types first, so `CoeHead` doesn't fire on `cnt + 1#8`; explicit instances close that gap. Mirrors the existing `Signal × BitVec` mixed-arity instance pattern in `Signal.lean`. * Trial `nextAny` polymorphic register write + `AsSignal` type class were prototyped but reverted — the type class indirection didn't reduce through `handleCircuitMonad`'s `withTransparency .all $ whnf` path for synthesis. `Tests/CircuitDoTest.lean` is unchanged for the if/else tests (counter / resetCounter / twoRegReset / heldReg — all still pass). Match tests deferred. Open: Task 317 — `circuit do` match arms. The macro implementation emits `Signal.mux ($scrut === $pat) $armRhs $chained` where `$scrut` is a `Reg` (because `state` in user code binds to a Reg). Lean fails to unify the type argument of `Signal.beq` because the elaborator can't decide whether to coerce `state` (Reg → Signal) or `0#2` (BitVec → Signal) first, and the inner `$armRhs` is a bare BitVec literal whose type only gets pinned down once the whole mux chain is resolved. Two viable fixes for the next pass: (a) Have the macro emit `state.1 === pat` (explicit Signal projection on the scrutinee), removing one Coe step. (b) Add an instance-resolution pattern that pins the `Signal.beq` α from the scrutinee side preferentially. The legacy `Signal.circuit do` macro avoids the issue because `state` is bound via `let state := projN! _circuit_state n 0` — it's already a Signal at that point. v2's `Reg` boundary is more visible to the user and helpful for type safety (heterogeneous state) but interacts awkwardly with bare-value RHS arms in `match`. Branch poc/circuit-monad-v2.

The match arm RHS that previously stalled on `?m state` metavariables now elaborates cleanly. The fix: in the macro, emit the scrutinee Signal as `$scrut |>.1` — the explicit `.1` projects out the underlying `Signal dom τ` from the user-bound `Reg dom S τ` so Lean's type inference pins `τ` immediately, before the per-arm pattern literals are elaborated. Removed the `CdoScrut` typeclass / `cdoScrut` indirection I'd trialled — the typeclass dispatch survived to synthesis and made `Signal.beq` un-inlinable from Sparkle's side. The direct `.1` projection avoids both problems. $ lake exe circuit-do-test --- counterCdo ≡ counterMacro --- PASS --- resetCounterCdo ≡ resetCounterMacro --- PASS --- twoRegResetCdo ≡ twoRegResetMacro --- PASS --- heldRegCdo ≡ heldRegMacro --- PASS --- fsm3Cdo ≡ fsm3Macro --- PASS ← match --- fsmHoldCdo ≡ fsmHoldMacro --- PASS ← match + hold Both new FSM tests synthesise (`#synthesizeVerilog fsm3Cdo`, `fsmHoldCdo`) and produce the same shape Verilog as their `Signal.circuit do` macro counterparts. Trade-off: scrutinees now must be `Reg`s (the `.1` is hardcoded), not arbitrary Signal expressions. Users wanting to match on an ad-hoc Signal can precompute it into a Reg or write the `Signal.mux` chain by hand — same workaround as the legacy macro requires for arbitrary expressions. Branch poc/circuit-monad-v2.

Adds the same kind of macro-level duplicate-write check the legacy `Signal.circuit do` macro already does. When the user writes the same register's `<~` twice at the same statement level — e.g. circuit do let cnt ← Signal.reg 0#8 cnt <~ 0#8 cnt <~ cnt + 1#8 -- silently overwrites the first return cnt the macro now throws: circuit do: register `cnt` is assigned with `<~` more than once at the same statement level — last write wins (matches Verilog `always_ff` semantics); merge the assignments into a single `<~` to silence this error. The check runs after `flattenCdoStmts` has collapsed any `if/ else` and `match` branches into single muxed writes per register, so it only fires on duplicates the user actually wrote at the top level (the flattener's own muxed emissions emit at most one row per register). Verified with `#guard_msgs` in `Tests/CircuitDoTest.lean`: /-- error: circuit do: register `cnt` is assigned with `<~` more than once … -/ #guard_msgs in example : Signal defaultDomain (BitVec 8) := circuit do let cnt ← Signal.reg 0#8 cnt <~ 0#8 cnt <~ cnt + 1#8 return cnt Note: the duplicate-write `example` had to use `example` rather than `def` — a `def` whose macro errored out leaves a `sorry`-typed body that ran the test harness into an "INTERNAL PANIC: executed 'sorry'" on `lake exe`. `example` discards the body so the panic doesn't surface. Branch poc/circuit-monad-v2.

…t do (Task 319, part 1) Replaces uses of the legacy `Signal.circuit do` macro with the v2 `circuit do` macro across non-test production code, the tutorial-extended demos, and Ch 3 of the notebook solutions. Each file's sim semantics are unchanged; the macro emits the same Verilog through the v2 lowering (runCircuit{1,2,3} → Signal.loop / Signal.register chain). Migrated: * `Sparkle/Display/Synthesise.lean` — comment example only. * `Tests/Circuit/SimTest.lean` — counter + 2-reg pipeline. `lake exe circuit-sim-test` still PASSes. * `Tests/Tutorial/HierarchyTest.lean` — `latch8` / `latch8mod` for the inline-vs-`@[hardware_module]` split coverage. `lake exe tutorial-hierarchy` still PASSes. * `tutorial-extended/TutorialExtended/Step8_CircuitDoNotation.lean` — all four examples (counter, up/down, shift pipeline, enabled counter). Prose updated to point at the new macro. * `tutorial-extended/TutorialExtended/Run.lean` — comment string only. * `docs/tutorial/Notebooks/Solutions/Ch03.lean` — traffic- light FSM. Required a manual `let stateSig := state.1; let timerSig := timer.1` projection because the body passes `state` directly to `Signal.mux` arms where Lean's Coe can't pin down `α` automatically (the rest of the body uses the projected `Signal` aliases). This is the same "register-handle as argument" pattern the v2 `match` macro works around with an explicit `.1`. Macro side improvements that enabled the migration: * `circuit do` top-level `let _ := _` bindings now stay in scope for every subsequent `<~` and the trailing `return` — previously each `<~` cleared the binding list, so a `let isGreen := state === S_GREEN` defined at the top wasn't visible to later assignments. Branch-local lets inside if/match arms keep their existing arm-scoped semantics (handled separately by the flattener). * Added a `CoeOut (Reg dom S τ) (Signal dom τ)` instance alongside the existing `CoeHead`. Doesn't fully resolve the `Signal.mux Reg BitVec` metavariable case (Ch 3 needed the explicit `.1` regardless), but it widens the coercion firing window for other call sites. Files NOT migrated yet (deferred to Task 320 since they're legacy-macro regression tests): * `Tests/CircuitIfTest.lean` — if/else macro behaviour. * `Tests/CircuitMatchTest.lean` — match/case macro. * `Tests/SignalLoopTest.lean` — register loop semantics. These will be removed when the legacy macro itself is removed; the equivalent v2 coverage already lives in `Tests/CircuitDoTest.lean` and `Tests/CircuitMonadV2Test.lean`. Verified: $ lake test EXIT=0 (all suites pass, including the migrated files) Branch poc/circuit-monad-v2.

…SL (Task 320) Wraps up the v2 migration: deletes the legacy `Signal.circuit do` macro from `Sparkle/Core/Signal.lean` and the regression tests that were specific to it, since `circuit do` (`Sparkle/Core/CircuitDo.lean`, on top of the v2 monad in `Sparkle/Core/CircuitMonad.lean`) now covers every feature with identical Verilog output. Removed: * `Tests/CircuitIfTest.lean` — covered by `CircuitDoTest` (resetCounterCdo / twoRegResetCdo / heldRegCdo, all with `if/else` lowering). * `Tests/CircuitMatchTest.lean` — covered by `CircuitDoTest` (fsm3Cdo / fsmHoldCdo, the match/case lowering). * `Tests/Drivers/CircuitIfTestMain.lean`, `Tests/Drivers/CircuitMatchTestMain.lean` — driver wrappers for the removed exes. * lakefile `circuit-if-test`, `circuit-match-test` lean_exe declarations. * `Tests/AllTests.lean` references and the Sparkle.Tests.CircuitIfTest.main / CircuitMatchTest.main calls. * `Sparkle/Core/Signal.lean` `Signal.circuit do` DSL block: `circuitStmt` / `circuitMatchArm` syntax categories, `collectBranchAssigns` / `flattenCircuitStmts`, `macro_rules` for the `Signal.circuit do $stmts*` form. ~390 lines removed. `projN!` / `bundleAll!` stay — they're used by `Sparkle/Core/StateMacro.lean` and many IPs. Rewritten: * `Tests/SignalLoopTest.lean` — the `Signal.circuit do`-side reference circuits now use `circuit do`; the raw `Signal.loop`/`Signal.register` parity check is otherwise unchanged. * `Tests/CircuitDoTest.lean` — the cdo-vs-macro parity tests became cdo-vs-hand-written-reference-list tests (no more legacy macro on the rhs of the comparison). Same six sub-tests (counter, resetCounter, twoRegReset, heldReg, fsm3, fsmHold) plus the duplicate-`<~` guard. * `Tests/CircuitMonadV2Test.lean` — `*Macro` references renamed to `*Cdo` and bodies switched to `circuit do`. The raw `runCircuit{1,2,3}` form is the part actually being tested; cdo is the reference. * `Sparkle/Core/CircuitDo.lean` / `Sparkle/Core/CircuitMonad.lean` docstrings updated to reflect the macro's removal (mention it as the design predecessor that motivated the v2 split, not as a still-available alternative). Build & sim: $ lake build test Build completed successfully (355 jobs) $ lake exe test EXIT=0 (all suites pass, including the migrated tests) The cdo macro is now the only register-DSL surface in Sparkle. v2 monad's `runCircuit{N}` helpers remain the lower-level escape hatch for users who want explicit monad plumbing (`forM`, `traverse`, etc.). Branch poc/circuit-monad-v2.

Adds `Tests/CircuitMonadForMTest.lean` — a small test that exercises Lean's standard `List.forM` directly over a list of `Reg` handles, calling `Circuit.next` on each. Demonstrates a capability the removed `Signal.circuit do` macro couldn't express (it was syntax-level; `forM` would have to be in its recognized statement grammar to work). v2's `Circuit` monad is a real `Monad`, so `forM` / `mapM` / `traverse` / any other `Bind`-based combinator goes through Sparkle's `handleCircuitMonad` recognition and reaches synthesis. The test circuit declares three `BitVec 8` registers via `runCircuit3` and increments all three using `[r0, r1, r2].forM (fun r => Circuit.next r (Circuit.read r + 1#8))`. Output is the sum of the three values each cycle. Coverage: * Simulation — `lake exe test` runs the namespaced `Sparkle.Tests.CircuitMonadForMTest.main` and checks the cycle-by-cycle output against the expected list [3, 6, 9, 12, 15, 18]. * Synthesis — `#synthesizeVerilog threeCountersForM` runs at `lake build Tests.CircuitMonadForMTest` time. Verified that the generated Verilog packs the three registers into a 24-bit packed state with the expected `always_ff @(posedge clk)` blocks (DRC warning on the combinational output is the usual one for `return a + b + c` and matches the equivalent shape on macro-generated versions). Standalone exe: `lake exe circuit-monad-forM-test` for ad-hoc debugging. Also wired into `lake test` via the AllTests import / `Sparkle.Tests.CircuitMonadForMTest.main` call so it runs on every `lake test`. Branch poc/circuit-monad-v2.

Extends the imperative-HW chapter with a `forM` example that exercises a capability the legacy `Signal.circuit do` macro couldn't provide. `circuit do` is a syntax-level macro that intercepts the `do` keyword (so `forM` inside its body would be parsed by the cdo macro, not by Lean's monad elaborator); to use `forM` we drop down to the v2 helper `runCircuit3`, which gives a real `Monad (Circuit dom S)` instance and composes with everything Lean's `ForM` / `Traversable` ecosystem offers. Concretely: def threeCountersForM : Signal defaultDomain (BitVec 8) := runCircuit3 0#8 1#8 2#8 (fun r0 r1 r2 => do [r0, r1, r2].forM (fun r => Circuit.next r (Circuit.read r + 1#8)) return Circuit.read r0 + Circuit.read r1 + Circuit.read r2) Verified at the tutorial demo level: $ lake exe tutorial-extended-run | grep "forM" forM (3 regs) : [0x03#8, 0x06#8, 0x09#8, 0x0c#8, 0x0f#8, 0x12#8] The narration walks through why the legacy macro couldn't do this and points readers at `runCircuit{N}` when they want the full Lean monad surface. Synthesis side is covered by `Tests/CircuitMonadForMTest.lean`'s `#synthesizeVerilog`. Branch poc/circuit-monad-v2.

Extends the arity wall from 3 to 4. `runCircuit4` follows the same Prod-chain construction as `runCircuit3`, just with one more `Prod.snd` step in each slot lens to reach the deepest tail. No new elaborator rules needed — the existing `Signal.map Prod.fst` / `Prod.snd` recognition handles depth-3 projections automatically. `circuit do` also accepts N=4 now (cdo macro extends to call `runCircuit4` when the body has four `let _ ← Signal.reg`). Coverage added: * `Tests/CircuitDoTest.lean` `fourCounterCdo` — four BitVec 8 counters incrementing by 1/2/3/4, sum returned. Sim and synth both green. * `Tests/CircuitMonadForMTest.lean` `shift4ForM` — 4-stage shift register written via `runCircuit4` + `forM` over a list of (dst, src) pairs. Demonstrates that `forM` scales to the 4-register arity. Result Verilog for `shift4ForM` packs the four registers into a 32-bit state with the expected `always_ff @(posedge clk)` shift chain. Status of full Task 314 (truly arbitrary N via HList state): the `mkRegList` recursion over `αs : List Type` runs into dependent-type juggling (the lens passed through tail recursion needs to stay typed against the outer HList shape, not the shrinking inner one). Hand-rolling `runCircuitN` for N ≤ 4 is simpler and good enough for the immediate use cases; truly arbitrary N can wait until a concrete need (e.g. 8+ stage pipeline) materialises, and the macro-based path (`runCircuitN` as a Lean macro expanding to a hand-rolled form) is the likely route. Also reverted an experimental `SlotLens` / `headLens` / `tailLens` block from `Sparkle/Core/Wireable.lean` that was introduced during the dependent-type attempt — it's not used by the N=4 hand-rolled form and would have become dead code otherwise. Branch poc/circuit-monad-v2.

Adds `runCircuitH` — a single helper that takes a heterogeneous list of initial values (`HList αs`) and a `RegList`-receiving body, replacing the per-arity wall at `runCircuit{1,2,3,4}`. Pieces: * `Sparkle/Core/Wireable.lean` - `HListWireable αs` typeclass + auto-derived instances from `Wireable α` for each element. Constrains the slot types at the `runCircuitH` call site so non-synthesisable types (e.g. `Option Nat`) are rejected up front. - `Wireable (HList αs)` instance by induction over the Prod chain. * `Sparkle/Core/CircuitMonad.lean` - `RegList dom S αs` — list of register handles, one per slot, all carrying the same outer state `S`. - `mkRegList` — builds the `RegList` from a Signal-level lens pair by composing `Prod.fst`/`Prod.snd` accessors at each depth. - `packRegister` — closes the loop with `Signal.register` / `bundle2` chain on the `αs` list. - `runCircuitH` — `Signal.loop` + `mkRegList` + `packRegister` glued together; `[HListWireable αs]` constraint enforces slot-type validity. * `Sparkle/Compiler/Elab.lean` - `inferHWType` learns to handle `Unit` / `PUnit` (the HList terminator), returning `.bitVector 0` so the surrounding `Prod` width accumulation sums correctly. - `Signal.pure ()` / `Signal.pure ⟨⟩` synth case added — `packRegister []` lands on `Signal.pure ()`, which gets a 0-bit constant wire. Tests (sim + synth): $ lake exe run-circuit-h-test --- counterH --- PASS (N=1, BitVec 8) --- twoCountH --- PASS (N=2, BitVec 8 + BitVec 8) --- mixedWidthH --- PASS (N=2, BitVec 4 + BitVec 8 — heterogeneous) ALL PASS Each test's `#synthesizeVerilog` succeeds. The generated Verilog for `mixedWidthH` packs the state into a 12-bit packed register chain just like the hand-rolled `runCircuit2` form did, with no extra Verilog cost from the HList machinery. Status of Task 314 (any-arity): N=1,2 confirmed working; N=3+ should follow by construction from the `mkRegList` / `packRegister` recursions but isn't yet covered by a dedicated test. The existing `runCircuit{1,2,3,4}` helpers stay in place for now; folding all call sites onto `runCircuitH` is a follow-up. Branch poc/circuit-monad-v2.

The per-arity helpers (`runCircuit1`/`2`/`3`/`4`) are gone; `runCircuitH` takes an `HList αs` of initial values and a body that receives a `RegList` Prod chain. The `circuit do` macro now lowers to a single shape regardless of N: build the HList inits expression, build the let-chain to destructure the `RegList` into named `r0`, `r1`, … handles, and call `runCircuitH`. Removed: * `Sparkle/Core/CircuitMonad.lean` runCircuit1/2/3/4 definitions (~188 lines). * `Tests/CircuitMonadV2Test.lean` + driver — folded into `Tests/RunCircuitHTest.lean`, which now covers N=1/2/3/4, mixed-width, and forM. * `Tests/CircuitMonadForMTest.lean` + driver — likewise superseded by `RunCircuitHTest`'s `threeCountForM` case. * `lake exe circuit-monad-v2-test` and `circuit-monad-forM-test` from `lakefile.lean`. Updated: * `Sparkle/Core/CircuitDo.lean` macro emits `runCircuitH (αs := [_, _, …, _])` with explicit list of `_`-typed slot types so Lean's HList inference resolves fully from the inits. Body destructures via a `let (r0, rest1) := _cdoRegs; let (r1, rest2) := rest1; …` chain. * `Tests/RunCircuitHTest.lean` adds tripleCountH (N=3 XOR), fourCountH (N=4 sum), threeCountForM (forM over a destructured list). All sim PASS, all `#synthesizeVerilog` succeed. * `tutorial-extended/.../Step8_CircuitDoNotation.lean` `threeCountersForM` switched to `runCircuitH`. The Verilog generated for these tests is unchanged from the hand-rolled `runCircuit{N}` versions — `HList αs` reduces to the same Prod chain, so the IR elaborator lowers it identically. Branch poc/circuit-monad-v2.

…`import Sparkle` Two CI-fix changes: 1. `Sparkle.lean` (the root module re-exporter) now imports `Sparkle.Core.CircuitMonad` and `Sparkle.Core.CircuitDo`. Without these, `import Sparkle` doesn't bring the `circuit do` macro into scope, so user code (and the tutorial) needs extra `import` lines. Adding them to the root keeps the `import Sparkle` UX consistent across legacy and v2. 2. `docs/Tutorial.md` switches from the removed legacy `Signal.circuit do` macro to the v2 `circuit do` macro introduced in this branch (Task 320). Two code blocks (counter8' and shiftPipeline) were dying with `unexpected token '~'` because `<~` lives only inside the `circuit do` macro now. Verified: $ bash scripts/check_tutorial.sh All tutorial blocks type-check cleanly. The pre-existing `evalSignalAt` runtime errors on block_00 / block_01 (which run `#eval ... .sample`) are explicitly filtered out by the check script via `grep -v "Could not find native implementation"` — those are FFI runtime issues unrelated to the markdown. Branch poc/circuit-monad-v2.

…apters Task 320 removed the legacy `Signal.circuit do` macro; the v2 `circuit do` macro replaces it. 21 prose mentions and code blocks across 10 chapter files were still referring to the removed macro, which left their code blocks failing `scripts/check_tutorial.sh` with `unexpected token '<~'`. Mechanical sed replacement: for f in docs/tutorial/md/*.md; do sed -i 's/Signal\.circuit do/circuit do/g' "$f" done Result on `bash scripts/check_tutorial.sh <ch>.md` (PASS = no elaboration errors, ignoring `Could not find native implementation` runtime issues which the script filters): before → 8 chapters PASS / 6 FAIL (main) after → 9 chapters PASS / 5 FAIL (this branch, after the rename) Ch08_Yosys.md moves from FAIL to PASS as a direct result of the rename. The remaining failing chapters (Ch01b_YourFirstProject, Ch02_Combinational, Ch03_Sequential, Ch04_Modules, Ch05_Verilog, Ch06_LTL, Ch07_Equivalence, Ch07b_FpQuantEquivalence, Ch08b_Simulation, Ch09_FPGA, Ch10_Architecture) all fail on issues that predate this branch (stray `end`s, unknown `Display.*` identifiers, `#`-prefixed commands as top-level statements, etc.) — none of them mention `<~` or the legacy macro after this commit, so the v2 migration is not the cause. Branch poc/circuit-monad-v2.

…03/Ch09 Two follow-up fixes for the `lake build TutorialNotebooks` job that runs via xlean-convert against `docs/tutorial/md/Ch*.md`: 1. Add `HAnd`/`HOr`/`HXor` instances for `Reg dom S (BitVec n)` operands (Reg × BitVec, Reg × Reg) to `Sparkle/Core/CircuitMonad.lean`. The Ch09 blinky example does `count &&& 0x800000#24` with `count : Reg`, which fell back to `HAnd (Reg …) (BitVec n) ?` and failed instance resolution. Mirrors the existing `HAdd`/`HSub`/`HMul`/`HXor` patterns for the same operand mixes. 2. In `Ch03_Sequential.md` (`fsm`, `counterEn`) and `Ch09_FPGA.md` (`blinky`), project the register handle to its underlying `Signal` (`let countSig := count.1`) before mixing it into `Signal.mux` / `===` chains. Same workaround the v2 test files use — `Signal.mux ? ? reg` doesn't resolve the `Coe Reg → Signal` because both branches need coercion and `α` is unset. The `Signal.lit dom CONST` ascription pattern from `Ch03`'s traffic-light example is also used here for the bare `BitVec` arms of the next-state mux. After: $ XLEAN=…/xlean-convert bash docs/tutorial/build-from-md.sh $ lake build TutorialNotebooks Build completed successfully (70 jobs) $ lake exe test EXIT=0 Branch poc/circuit-monad-v2.

…ing the build) CI was running \`lake update\` in both the \`build\` and \`tutorial-notebooks\` jobs. That re-resolves \`require LSpec from git ... @ "main"\`, which now points at an LSpec commit (d3c15b93, 2026-04-15) that bumps to Lean v4.29. Lake then *uplifts the toolchain* from our pinned \`v4.28.0\` to the newer one the dependency demands, and CI runs end up elaborating LSpec under Lean v4.31.0-rc1 — where a few \`instance\` declarations in \`LSpec/LSpec.lean\` now flag as "argument cannot be inferred using typeclass synthesis". The build log made this look like a \`verilog-tests\` failure because \`lake exe verilog-tests\` is the first step that actually invokes the freshly-rebuilt LSpec. In fact the project itself is healthy; only the toolchain drift broke it. The committed \`lake-manifest.json\` already pins LSpec (\`dc0904293d30\`, 2026-03-06) to a commit that's compatible with our toolchain. Dropping \`lake update\` makes CI honour that pin. Reproducible builds 101. If we ever do want to bump LSpec or any other dep, the intended workflow is: 1. Bump \`lean-toolchain\` locally to whatever the new LSpec requires. 2. Run \`lake update\` locally to refresh \`lake-manifest.json\`. 3. Verify the build, commit both files together. …rather than implicitly bumping via CI on every push. Verified locally that \`lake exe verilog-tests\` and \`lake build TutorialNotebooks\` work without \`lake update\`. Branch poc/circuit-monad-v2.

The new \`HAdd (Reg …) …\` / \`HXor (Reg …) …\` operator instances introduced in \`Sparkle/Core/CircuitMonad.lean\` bind their arguments as plain \`a b\` lambdas inside instance definitions. When the elaborator later unfolds those instances for synthesis, Lean's macro-hygiene machinery produces identifier names like: x__@_Sparkle_Core_Signal_2408276647__hygCtx__hyg_45 These reach \`freshName\` as the hint argument and get embedded into \`_gen_\` wire names verbatim — but \`@\` isn't a valid SystemVerilog identifier character, so Verilator chokes: %Error: generated_soc.sv:7942:25: syntax error, unexpected '@' 7942 | logic [0:0] _gen_x__@_Sparkle_Core_Signal_… Fix: in \`freshName\`, strip the hint at the first \`@\` (after trimming a trailing \`_\` so \`_@…\` doesn't leave a trailing underscore). Lean's hygiene suffix uses the exact pattern \`<name>__@_<module>__hygCtx__hyg_<N>\`, and \`@\` never appears in user-facing identifiers; the cut is unambiguous. Verified: $ lake build IP.RV32.SoCVerilog Build completed successfully (129 jobs). $ cd verilator && make build ... - Verilator: Walltime 4.141 s (clean Vrv32i_soc binary, no Verilog syntax errors) Branch poc/circuit-monad-v2.

Add Sparkle.Core.CircuitMonad to make circuit statements true monads.

…nputs alone The previous implementation had two correctness bugs that showed up only on CI: 1. A name without any `@` was still passed through \`endsWith "_"\` + \`dropRight 1\`, which silently truncates a legitimate trailing underscore (e.g. \`x_\` → \`x\`). Now we early-return the input unchanged unless an \`@\` is actually present. 2. When stripping, only **one** trailing underscore was dropped. Lean's hygiene marker uses \`__@\` (two underscores), so \`x__@…\` was getting cut to \`x_\`, not \`x\`. That left \`x_\` as the visible name, which could collide with another legitimate \`x_\`-prefixed wire downstream — exactly the kind of silent rename that flips a JIT counter from "increments each tick" to "stays at 0", as the CI \`svparser-test\` Test 6 was showing. Now we drop *every* trailing \`_\` from the prefix. Verified locally that \`lake exe svparser-test\` now reports \`Test 6: E2E JIT simulation... PASS (count=9 after 10 cycles)\` (29/29 passing) and that the RV32 SoCVerilog output is still hygiene-free. Branch poc/circuit-monad-v2.

Fix CI

…pstream Upstream xeus-lean no longer ships \`build.sh\` at its repo root; the kernel build is now driven by Makefile + CMakeLists + pixi.toml with non-trivial emscripten / xtl / xwidgets prerequisites. Our \`Mode 2\` step in \`tutorial-notebooks\` was running cd /tmp/xeus-lean && bash build.sh which fails immediately with \`No such file or directory\`, showing up as an annotation \`error\` on every CI run. The step was \`continue-on-error: true\` so the job stayed green overall, but the noise was unhelpful. Mode 1 (\`lake build TutorialNotebooks\` against the Display shim in \`docs/tutorial/Display.lean\`) is the bedrock check and already covers every chapter's code blocks. Running the generated notebooks under the *real* xeus-lean kernel would catch additional drift (real \`Display.waveform\` / \`#mermaid\` implementations), but it's a separate kind of test that deserves its own CI pipeline — when xeus-lean's build story stabilises. Branch poc/circuit-monad-v2 / merge \#35 → main.

Fix the build of xeus-lean

…ages \`https://verilean.github.io/sparkle/\` was 404 because the \`gh-pages\` branch only contained \`dev/*\` (benchmark dashboards published by \`benchmark-action/github-action-benchmark\`) — no top-level \`index.html\` and no API / tutorial content. New steps under the existing \`tutorial-notebooks\` job (only on \`push\` to \`main\`): 1. \`Doc: build doc-gen4 API reference\` — runs \`lake build Sparkle:docs\` to render every module reachable from the root \`Sparkle.lean\` into \`.lake/build/doc/\`. 2. \`Doc: convert tutorial notebooks to HTML\` — \`jupyter nbconvert --to html --template classic\` for each \`docs/tutorial/Notebooks/Gen/notebooks/*.ipynb\`, output into \`site/tutorial/\`. 3. \`Doc: assemble site\` — copies the doc-gen4 output to \`site/api/\` and writes two hand-crafted index pages: \`site/index.html\` (landing — tutorial / API / benchmark links + project blurb) and \`site/tutorial/index.html\` (chapter list). Plain inline CSS, no extra theme deps. 4. \`Doc: publish to gh-pages\` — \`peaceiris/actions-gh-pages@v3\` with \`keep_files: true\` so the benchmark dashboards under \`gh-pages/dev/*\` keep working alongside the new \`/\`, \`/api/\`, and \`/tutorial/\` paths. The whole block is gated on \`github.event_name == 'push' && github.ref == 'refs/heads/main'\`, so PRs don't churn the gh-pages branch. Expected after the next main push: https://verilean.github.io/sparkle/ → landing https://verilean.github.io/sparkle/api/ → doc-gen4 https://verilean.github.io/sparkle/tutorial/ → chapter HTMLs https://verilean.github.io/sparkle/dev/... → benchmarks (unchanged)

Fix CI to deploy tutorials

The `Doc: convert tutorial notebooks to HTML` step was failing with `jupyter: command not found` (exit 127) because no pip step ever installed it. The prior commit's comment incorrectly claimed an "earlier pip step" — there was none. Add an explicit install step right after `Install Lean toolchain` and put ~/.local/bin on $GITHUB_PATH so subsequent steps see the binary.

… branch) Switch GitHub Pages deployment from the legacy gh-pages-branch flow (peaceiris/actions-gh-pages + benchmark-action auto-push) to the workflow-based actions/deploy-pages@v4 path: - 3× benchmark-action calls now use `external-data-json-path` (writes JSON to bench-data/, no git branch touched). - `build` job uploads `bench-data/*.json` as an artifact. - `tutorial-notebooks` job downloads the bench artifact, generates a `site/dev/<bench>/index.html` per suite (latest-run snapshot; history is intentionally not preserved), then publishes the full `site/` tree via actions/upload-pages-artifact + actions/deploy-pages. - Required permissions (pages: write, id-token: write) and the github-pages environment added to the job. Effect: `gh-pages` branch is no longer written to, so it can be deleted; user clones stay lean. Pages source must be set to "GitHub Actions" in repo settings (done via API).

The previous attempt put `environment: github-pages` directly on the `tutorial-notebooks` job, which made GitHub evaluate the environment's deployment branch policy on every run — including PR runs from `poc/circuit-monad-v2`, which got rejected with: Branch "refs/pull/39/merge" is not allowed to deploy to github-pages due to environment protection rules. Move just the `actions/deploy-pages@v4` step into a new `pages-deploy` job gated by `if: github.event_name == 'push' && github.ref == 'refs/heads/main'`. The environment binding lives only on that job, so PR runs skip it entirely and never trigger the protection rule. The assemble work stays in `tutorial-notebooks`, which now ends with `actions/upload-pages-artifact@v3`; `pages-deploy` only runs `deploy-pages` against that artifact.

Fix CI

- IP/Bus/SBUSHW.lean: byte-serial S.BUS frame accumulator (`frameAccumulatorHW`). Tracks a 5-bit byte-index counter (0..24), latches byte 0 as the header check, byte 24 as the footer check, and exposes an 11-bit channel-0 view assembled from bytes 1 and low-3-bits of byte 2 (LSB-first packing per S.BUS spec). - Tests/IP/Bus/SBUSHWTest.lean: behavioural check — packs 16 channels via the pure-data `IP.Bus.SBUS.packChannels`, wraps with header/flags/footer, feeds 25 bytes one per cycle, and asserts `headerOk` latched, `footerOk` pulsed on cycle 25, and ch0 matches `unpackChannels[0]`. Two #synthesizeVerilog checks (idx scalar + ch0 scalar). - Tests/Drivers/SBUSHWTestMain.lean, lakefile.lean: wire the `sbus-hw-test` lean_exe. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

- IP/Bus/CRSFHW.lean: byte-serial CRC-8 (poly 0xD5) accumulator (`crc8HW`) — the CRSF frame-check function. Consumes one byte per cycle with the 8 bit-serial shift steps unrolled combinationally. Same construction shape as CANHW.crc15HW. - Tests/IP/Bus/CRSFHWTest.lean: behavioural check — folds a known 11-byte Link Statistics payload (type + 10 data bytes) through the HW LFSR and matches the register at cycle N+1 against `IP.Bus.CRSF.crc8` on the same array. #synthesizeVerilog on the 8-bit CRC output. - Tests/Drivers/CRSFHWTestMain.lean, lakefile.lean: wire the `crsf-hw-test` lean_exe. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

- IP/Bus/MIL1553HW.lean: two HW pieces for the MIL-STD-1553B physical layer: * `oddParityHW` — combinational 16-bit odd-parity generator (XOR-reduce over the 16 content bits + invert), matches `IP.Bus.MIL1553.oddParity` cycle-by-cycle. * `manchesterEncoderHW` — bi-phase-L (1553B convention) with a 1-bit phase register that toggles per enable tick. Line level is (phase==0 ? !bitIn : bitIn), so 0→HL and 1→LH. - Tests/IP/Bus/MIL1553HWTest.lean: behavioural check — sweeps 16 content values through oddParity and checks vs pure-data ref, and prints an 8-cycle Manchester encoder trace on a 3-bit pattern. Two #synthesizeVerilog checks (parity + line). - Tests/Drivers/MIL1553HWTestMain.lean, lakefile.lean: wire the `mil1553-hw-test` lean_exe. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

- IP/Bus/CANopenHW.lean: two HW pieces layered on IP.Bus.CAN / IP.Bus.CANHW: * `cobIdDemuxHW` — purely combinational: splits an 11-bit COB-ID into (function code, node ID) + one-hot flags for the CiA 301 predefined connection set (NMT / SYNC / Heartbeat / SDO Tx/Rx). Uses `BitVec.extractLsb'` for slicing. * `nmtStateFsmHW` — 2-bit NMT state register with the standard state transitions (0=pre-op, 1=oper, 2=stopped, 3=boot-up). Accepts a validated command byte and an external reset pulse; priority-mux dispatch on the byte. - Tests/IP/Bus/CANopenHWTest.lean: behavioural check — sweeps 6 representative COB-IDs (NMT / SYNC / Heartbeat / SDO Tx / SDO Rx / TPDO1) and compares (fc, nid) + one-hot flags against `IP.Bus.CANopen.decodeCobId`. Then feeds a 4-cycle NMT command sequence and asserts the state trajectory. Three #synthesizeVerilog checks (fc, isNmt, state). - Tests/Drivers/CANopenHWTestMain.lean, lakefile.lean: wire the `canopen-hw-test` lean_exe. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

- IP/Bus/DroneCANHW.lean: three HW pieces layered on IP.Bus.CAN / IP.Bus.CANHW: * `crc16CcittHW` — byte-serial CRC-16-CCITT-FALSE (poly 0x1021, init 0xFFFF). Widens the input byte to 16 bits via concat, XORs with the CRC state after shifting into the upper octet, then runs 8 combinationally-unrolled MSB-first LFSR steps. * `nodeFilterHW` — purely combinational 7-bit src-node vs self-node comparator. * `transferIdTrackerHW` — mid-transfer sanity tracker: 5-bit transfer-ID + 1-bit toggle registers with mismatch detection, clear on SOT. Uses (fun b => !b) <$> (a == b) idiom for Bool inequality (avoids the Bool `!=`/`not` synth pitfall documented in the shared-CLAUDE.md notes). - Tests/IP/Bus/DroneCANHWTest.lean: behavioural check — folds a 7-byte NodeStatus payload through crc16CcittHW and compares against `IP.Bus.DroneCAN.crc16Ccitt`; then walks the tracker through a well-formed single-frame + 2-frame stimulus and asserts error stays low. Three #synthesizeVerilog checks (crc, accept, error). - Tests/Drivers/DroneCANHWTestMain.lean, lakefile.lean: wire the `dronecan-hw-test` lean_exe. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Add a new matrix group `bus-hw` for the 8 lean_exes wired up in the preceding commits (lin-hw-test / i2c-hw-test / spi-hw-test / sbus-hw-test / crsf-hw-test / mil1553-hw-test / canopen-hw-test / dronecan-hw-test). Rationale for a separate matrix entry rather than folding into the existing `bus` group: `bus` already carries 8 exes (pcie- test, pcie-hft-test, can-*, canopen-test, dronecan-test, serial-bus-test, avionics-bus-test) and adding 8 more would push it to 16, past the ≤10 heuristic called out in ip-tests.yml's header comment. Splitting keeps failure attribution tight and the group cache-friendly. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Add the 8 new `Sparkle.Tests.IP.Bus.<Name>HWTest.main` invocations to the master `lake test` runner so a runtime divergence in the LFSR / FSM / accumulator paths aborts the release gate (each `.main` calls `IO.Process.exit 1` on mismatch, same as the PCIe / Ethernet / Crypto sim mains above). Importing the modules also pulls their `#synthesizeVerilog` checks into the `lake build test` step, so a synth-side regression on any of the 8 breaks `lake test` at build time too. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

docs(readme): hosted site links + Verilator/xeus-lean/iverilog/Yosys ack

Bus IPs: add HW modules + synth checks for the 8 pure-data protocols

Byte-serial RLP header emitter (`rlpHeaderHW`) covering the three prefix classes: 1-byte (≤55), 2-byte (56..255), and 3-byte (256..2047) headers, for both byte-string and list frames. Behavioural check compares cycle-by-cycle output against `IP.Crypto.RLP.encodeLength` + a payload sweep; also verifies the classic ["cat","dog"] Ethereum-wiki KAT through the pure-data encoder. The HW handles only the length-prefix decode — payload byte concatenation is a plain 8-bit pass and doesn't need its own module. Wired into `lake test` and the release-gate `Tests.AllTests` runner via a driver at `Tests/Drivers/RLPHWTestMain.lean` and a `lean_exe «rlp-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Streaming Merkle-tree root accumulator (`merkleRootHW`) with 4 slots × 256 bits (supports trees up to 2^4 = 16 leaves). Uses a carry-propagating "adder-style" FSM: on each `push`, the leaf digest walks up occupied slot levels, combining with the resident partial digest via an external SHA-256 hasher (`combineLeft`/`combineRight`/`combineOut`/`combineDone` handshake). Delegating the hash keeps the module compact and lets the same slot machinery scale to deeper trees without also hauling the SHA-256 engine's state. Behavioural: streams `hashLeaf 42..45` through the FSM, feeds the three combine responses on a pre-computed schedule, and verifies slot 2 holds the 4-leaf `Merkle.commit` root (hash-tree derived from SHA-256 over 8-byte LE leaf encodings). Wired into `lake test` and the release-gate `Tests.AllTests` runner via `Tests/Drivers/MerkleHWTestMain.lean` and the `lean_exe «merkle-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

HKDF-Expand round-counter FSM (`hkdfExpandHW`). Owns the HKDF-specific piece — a 4-state machine (idle → trigger → wait → done) that walks the counter i = 1..N, pulses an external HMAC-SHA-256 handshake once per round, latches the returned T(i) block, and asserts `done` when all rounds complete. HMAC-SHA-256 itself is delegated (same pattern as Merkle's SHA-256 combiner) — this HW closes the round-counter piece that isn't inside the HMAC engine. Behavioural: verifies pure-data `hkdfExtract` + `hkdfExpand` match RFC 5869 Test Case 1 (PRK + 42-byte OKM), then drives the FSM through a 2-round HKDF-Expand (L = 42 bytes) with pre-computed T(1)/T(2) from `hmacSha256` and checks the FSM raises `done` with `tPrev = T(2)` at the expected cycle. Wired into `lake test` and the release-gate `Tests.AllTests` runner via `Tests/Drivers/HKDFHWTestMain.lean` and the `lean_exe «hkdf-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Signal-side helpers mirroring SHA-256's L.1.b HW piece: 64-bit `rotr64Sig`, `shr64Sig`, `bigSigma0Sig64`/`bigSigma1Sig64`, `smallSigma0Sig64`/`smallSigma1Sig64`, `chFn64Sig`, `majFn64Sig`, plus an 80-entry `kMux` selecting K[t] from a 7-bit round counter (via `kLut!` — same shape as SHA-256's). Behavioural: verifies pure-data SHA-512("abc") against FIPS 180-4 App C.1 ("ddaf35a1…54ca49f"), then cycles each Signal helper against its pure-data twin on a non-trivial input word, plus 4 K-mux table lookups (t = 0, 16, 63, 79). Synth: `#synthesizeVerilog synth_sha512KMux` produces the 80-way 64-bit constant table. Stand-alone synth of the combinational helpers is punted — cross-module inline of `@[reducible, inline]` defs isn't on the elaborator's fast path yet (same as SHA-256's own helpers today; only kMux is synth-checked there). The 80-round iterative compressor (`sha512Block` analogue to `sha256Block`) is not shipped in this commit — the same Lean-sim exponential-recursion cost that gates SHA-256's C2 follow-up applies to the 1024-bit-wide W buffer here. Wired into `lake test` via `Tests/Drivers/SHA512HWTestMain.lean` and the `lean_exe «sha512-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Round-per-clock iterative AES-128 encryptor (`aes128BlockHW`). 128-bit state register + on-the-fly key expansion. 12-cycle pipeline: start pulse (t=0) latches key/block, initial XOR at t=1, 9 mid rounds (t=2..10) with MixColumns, final round at t=10, ciphertext valid + done pulse at t=11. Primitive combinational sub-modules (all `@[reducible, inline]` in-file docstrings; callers can reuse them for their own AES compositions): * `sboxHW` — 256-way `kLut!` byte substitution (also stand-alone `#synthesizeVerilog`d). * `rconHW` — 16-way `kLut!` round-constant lookup (indexed by round number 1..10; also stand-alone `#synthesizeVerilog`d). * `subBytesHW` — 16 `sboxHW` instantiations + repack. * `shiftRowsHW` — pure byte-position wiring permutation. * `mixColumnsHW` — 4 GF(2^8) column mixes via `xtimeHW`. * `addRoundKeyHW` — 128-bit state XOR. * `keyExpansionHW` — one-round key derivation (SubWord + RotWord + Rcon). The top-level FSM inlines the primitive bodies directly into its `circuit do` block (rather than calling them by name) so the elaborator can unfold everything inside one module — cross-module inline of the plain combinational defs isn't on the fast path today. The named defs remain as reusable combinators + a documentation surface. Behavioural: FIPS 197 App B KAT key = 2b7e151628aed2a6abf7158809cf4f3c plaintext = 3243f6a8885a308d313198a2e0370734 expected = 3925841d02dc09fbdc118597196a0b32 The HW produces the expected ciphertext at t=11 (~54 s Lean sim wall-time on the 128-bit state, ok for the sub-200-cycle horizon in CLAUDE.md). Synth: `#synthesizeVerilog` on `sboxHW` and `rconHW` sub-modules in-file. Top-level FSM synth (`aes128BlockHW.ciphertext`/`done` projections) trips the multi-output sub-module projection path that PR #66 still tracks as a known-fragile spot; deferred to that follow-up. PUNTED (direction #2, follow-up commit): * AES decryption (`invSubBytes`, `invMixColumns`, reverse-order round keys). The pure-data reference already ships in `IP/Crypto/AES.lean`; only the HW mirror is deferred. Wired into `lake test` via `Tests/Drivers/AESHWTestMain.lean` and the `lean_exe «aes-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Two GCM-specific HW pieces that sit on top of the existing AES-128 and GHASH engines: * `gcmCounterHW` — 128-bit CTR block with NIST inc32 semantics (add 1 mod 2^32 to the low 32 bits, preserve the high 96 bits). Latches J_0 on start, increments on each step pulse. * `gcmTagAccumulatorHW` — Y_i ← gmul(Y_{i-1} XOR C_i, H) FSM. Same IDLE/BUSY handshake as `IP.Crypto.GHASHHW.ghashFullHW` but exposes `mulX` and `fire` on the boundary so the caller wires up the multi-cycle multiplier explicitly. Together with `AESHW.aes128BlockHW` and `GHASHHW.gmulHW` these close the AEAD_AES_128_GCM datapath — every stage of the tag + ciphertext pipeline now has a `circuit do` module. Behavioural: cross-checks pure-data `encryptAead` against NIST SP 800-38D Appendix B Test Case 2 (16-byte plaintext of zeros, 128-bit key of zeros, empty AAD — expected C = 0388dace…b2fe78, T = ab6e47d4…57bddf). Then verifies the counter's inc32 behaviour over 3 step pulses, and the tag accumulator's handshake on a single block feed (fire pulses, mulX = Y XOR block, Y latches on mulDone). Synth: `#synthesizeVerilog` on `gcmCounter.counter`, `gcmTagAccumulator.y`, and `.fire` outputs. Wired into `lake test` via `Tests/Drivers/AESGCMHWTestMain.lean` and the `lean_exe «aes-gcm-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Full 24-round Keccak-f[1600] iterative permutation (`keccakF1600HW`) implemented as a `circuit do` FSM with 25 separate `BitVec 64` lane registers (indexed `x + 5*y`). Each cycle computes one combinational θ → ρ → π → χ → ι round via `keccakRoundHW`. Pipeline: cycle 0 : `start` pulses, latch input state. cycle 1..24: one round per clock (cnt = 1..24). cycle 25 : done pulse, hold state. The ι-step Rcon LUT is a `@[hardware_module] keccakRcHW` (32-way `kLut!` — 24 constants + 8-slot pad to a power of 2). It's the only stand-alone sub-module the FSM emits; the θ / ρ / π / χ steps are all combinational and inline directly. Behavioural: pure-data check of the Ethereum-style Keccak-256 of the empty string (expected `c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470`), plus a smoke test that the HW module instantiates cleanly on a constant input. Sparkle-side limitation: * The pure-Lean `Signal.val` simulator can't sample the full 24-round permutation output — 25 interlocking lane signals × BitVec 64 causes exponential recursion, so even a single-round sample at t=2 times out (worse than SHA-256's `sha256Block` sim which hits the same L.1.b/C2 class of problem). Cycle-by-cycle validation against `keccakF` is punted to a Verilog-backend co-sim; the algorithmic correctness is covered by the pure-data Ethereum KAT. Synth: `#synthesizeVerilog synth_keccakRc` generates the Rcon LUT sub-module. The top-level `keccakF1600HW` builds clean (`lake build IP.Crypto.Keccak256HW` green), so downstream Verilog synthesis can drive the full permutation. Wired into `lake test` via `Tests/Drivers/Keccak256HWTestMain.lean` and the `lean_exe «keccak256-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Add a `crypto-hw` matrix group that runs the wave-1 crypto HW test exes. Kept separate from the existing `crypto` group so the AES HW's ~54 s Lean-sim latency (128-bit state Signal.val recursion) doesn't sit behind the ~13 pure-data crypto exes and blow the per-group timeout. Group contents: rlp-hw-test merkle-hw-test hkdf-hw-test sha512-hw-test aes-hw-test aes-gcm-hw-test keccak256-hw-test Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH

Crypto IPs (wave 1): HW modules + synth checks for the byte/word FSM tier

Prove that Divider divides

…oint) `Signal.loop` was `opaque` (a wrapper around the `unsafe` memoizing `loopImpl`), so `(Signal.loop f).val t` could not be unfolded by the kernel. The fixpoint equation `loop f = f (loop f)` — the load- bearing fact for every feedback-circuit proof (e.g. PR #65's RV32 divider) — was therefore a *trusted axiom* (`loop_unfold`). This replaces the opaque wrapper with a *pure* definition and turns `loop_unfold` into a proved theorem, removing the custom axiom from the trust base. ## What changed `Sparkle/Core/Signal.lean`: - `loopGo f t` — the fixpoint value at time `t`, by strong recursion: apply `f` to a signal that recursively supplies the already-computed values for every earlier cycle `i < t` and `default` at `≥ t`. The `if i < t` guard makes it well-founded (`termination_by t`). Total, axiom-free. - `loop f := ⟨loopGo f⟩` — now a `def`, not `opaque`. Still carries `@[implemented_by loopImpl]`, so *execution* uses the O(n) memoizing implementation; only the *logical* value is the pure fixpoint. - `loopGo_eq` — the defining equation, exposed for proofs. `Sparkle/Verification/LoopProps.lean`: - `loop_unfold` goes from `axiom` to `theorem`, proved by rewriting through `loopGo_eq` and discharging the `default` placeholder with `StrictlyCausal` (a strictly causal body never observes cycles `≥ t`, which is exactly why the placeholder is invisible). - Module docstring updated to describe the pure definition. ## Trust base `#print axioms dividerSignal_result_gen` (a downstream RV32 divider theorem) no longer lists `loop_unfold`. Remaining axioms are all Lean-standard: `propext`, `Classical.choice`, `Quot.sound`, `Lean.ofReduceBool` (from `bv_decide`/`native_decide`, pre-existing), and `Lean.trustCompiler` (from `@[implemented_by]` — the standard trust that `loopImpl` computes the same value as `loopGo`). The bespoke fixpoint axiom is gone. ## Performance `@[implemented_by loopImpl]` keeps sim fast: a direct `Signal.loop`-counter samples `.val 2000` in ~0 ms (memoized O(n), confirmed by probe). Matched `lake exe test` runs: main 7m52s vs this branch 8m02s — within noise, no regression. ## Also fixed (surfaced by the first truly-clean build) `lakefile.lean`'s `sparkleModuleDeps` list was stale: - `sparkle_Sparkle_Backend_CppSim.so` → `_CSim.so` (CppSim was removed / renamed to CSim in the C-only-backend work; the `.so` no longer exists, so a clean build's monolithic link failed with "-l:...CppSim.so not found"). - Added the missing `sparkle_Sparkle_Core_SignalLeavesDerive.so` (the `SignalLeaves` deriving handler, added with the per-leaf output-port split, was never registered → clean-build load error "undefined symbol: initialize_..._SignalLeavesDerive"). Both were latent bugs on `main` that only bite a from-scratch `.lake/build` wipe; incremental builds kept the old `.so` around. ## Verification - `lake build` clean. - `lake exe test`: 13/13 lspec pass. - Both `Sparkle.Verification.Divider.{Correct,States33}` build against the new pure `loop` with no proof changes needed — PR #65's divider proof is now the regression test that `loop = loopGo` is the right fixpoint.

…m-smoke) PR #65 (`c59820c`) added `moreLinkArgs` to the `Sparkle` lean_lib to force the `sparkle_barrier` / `sparkle_jit` extern archives whole-into the precompiled `libsparkle_Sparkle.so` — necessary so the `.so` self-resolves `sparkle_cache_get` / `sparkle_eval_at` (the `@[extern]` LICM barriers `Signal.loop`'s memoization uses; without it the interpreter/LSP fails to dlopen the `.so`). But it used a *relative* path `./.lake/build/c_src`. `moreLinkArgs` propagates to every DOWNSTREAM consumer of the `sparkle` package, and there the relative path resolves against the *downstream's* build dir — which has no `c_src` — so the link fails: ld.lld: error: unable to find library -l:libsparkle_barrier.a ld.lld: error: unable to find library -l:libsparkle_jit.a This is why `downstream-smoke` (ubuntu / macos / wsl) went red on `main` starting at the PR #65 merge (`9c0e776`); it was green on the prior commit (`df8560e`). Fix: compute an ABSOLUTE path from the lakefile's own location (`__dir__`), captured as `sparkleCSrcDir`. Whether Sparkle is the root package (`lake build` here) or a git dependency (`<downstream>/.lake/packages/sparkle/…`), the path points at Sparkle's *own* populated `c_src`, so an inheriting downstream link resolves the archives correctly. Also simplifies the Linux arm to pass the absolute `.a` paths directly inside `--whole-archive` (no `-L dir` + `-l:name.so` file search needed). The `Sparkle`-lib force_load remains; only the path form changes. The IP-lib `sparkleDynlibLinkArgs` (per-module `.so` deps for the interpreter/LSP force-load path) is unaffected — it's already per-platform guarded and doesn't reach downstream MyBlinky. Verification: - `nm -D libsparkle_Sparkle.so` shows `sparkle_cache_get` / `sparkle_eval_at` as `T` (defined) — the `.so` is still self-contained. - `lake build` clean; `lake exe test` 13/13. - The emitted arg is now e.g. `-Wl,--whole-archive /abs/.lake/build/c_src/libsparkle_barrier.a …`.

LoopProps: make loop_unfold a theorem, not an axiom (pure loopGo fixpoint)

goldilocksMulHW — multiply mod the Goldilocks prime 2^64-2^32+1 (the STARK field), as a multi-cycle circuit do FSM. Behavioural test checks cycle-by-cycle against the pure-data Goldilocks.mul; SynthesisChecks section emits #synthesizeVerilog. 64-bit field, the cheapest of the four field-mul HW modules in this wave.

secp256k1MulHW — multiply mod the secp256k1 field prime 2^256-2^32-977 (Bitcoin/Ethereum curve), multi-cycle circuit do. Behavioural check vs pure-data Secp256k1Field.mul + #synthesizeVerilog.

p256MulHW — multiply mod the NIST P-256 field prime, multi-cycle circuit do. Behavioural check vs pure-data P256Field.mul + #synthesizeVerilog.

ed25519FieldMulHW — multiply mod 2^255-19, the HW follow-up the Ed25519Field.lean docstring promised ("the HW engine ... follows in L.2.b"). Multi-cycle circuit do; behavioural check vs the pure-data Ed25519Field.mul + #synthesizeVerilog.

BLS signatures on the BLS12-381 pairing-friendly curve — the curve Ethereum 2.0 consensus (validator aggregate signatures), zk-rollup proof verification, and threshold-signature schemes are built on. Sparkle had no BLS / pairing IP before this. Pure-data (Nat/BitVec, no Signal) reference, layered like the existing Ed25519 tower: Fp — prime field mod the 381-bit BLS12-381 base prime Fp2 — Fp[u]/(u²+1) Fp6 — Fp2[v]/(v³-ξ), ξ = u+1 Fp12 — Fp6[w]/(w²-v) (the pairing target group GT) G1 — E(Fp): y² = x³ + 4 G2 — E'(Fp2): y² = x³ + 4(u+1) (the twist) Pairing — optimal ate: Miller loop + final exponentiation, transcribed line-for-line from Ethereum's py_ecc reference (loop parameter, linefunc, final exp). sign / verify / aggregate / verifyAggregate The test (`bls12381-test`) checks: Fp12 pairing bilinearity + non-degeneracy, sign→verify round-trip PASSES, tampered signature FAILS, and a 3-signature aggregate over one message verifies. Scope notes (documented in-file): - **hash-to-curve is a SIMPLE deterministic placeholder**, not the full RFC 9380 `hash_to_curve` suite — enough for the sign/verify/aggregate self-consistency the test exercises, but NOT interoperable with real Ethereum signatures. Marked clearly so nobody mistakes it for a production H2C. - **No HW module for the pairing** — 381-bit modmul + a Miller loop on an FPGA is a research project, explicitly out of scope. BLS's value here is the machine-checkable reference + IP-catalog presence; the tractable HW unit (field multiply) ships separately in this wave for the smaller curves.

lakefile.lean: 5 new lean_exe stanzas (bls12381-test, {goldilocks,secp256k1,p256,ed25519}-mul-hw-test), each supportInterpreter := true, pointed at Tests/Drivers/*Main.lean. Tests/AllTests.lean: 5 new imports + .main calls in the release-gate runner. ip-tests.yml crypto/crypto-hw groups already cover the Crypto dir; the new exes ride along there.

junjihashimoto and others added 30 commits May 30, 2026 23:16

Merge pull request #34 from Verilean/poc/circuit-monad-v2

68445ea

Add Sparkle.Core.CircuitMonad to make circuit statements true monads.

Merge pull request #35 from Verilean/poc/circuit-monad-v2

c485c0a

Fix CI

Merge pull request #36 from Verilean/poc/circuit-monad-v2

c7fe64a

Fix the build of xeus-lean

Merge pull request #38 from Verilean/poc/circuit-monad-v2

7f236ef

Fix CI to deploy tutorials

Merge pull request #39 from Verilean/poc/circuit-monad-v2

fa80aa3

Fix CI

junjihashimoto and others added 28 commits July 1, 2026 11:15

Merge pull request #78 from Verilean/docs/hosted-links-verilator-ack

2d4cd1a

docs(readme): hosted site links + Verilator/xeus-lean/iverilog/Yosys ack

Merge pull request #79 from Verilean/feat/bus-hw-synth

b41a412

Bus IPs: add HW modules + synth checks for the 8 pure-data protocols

Merge pull request #80 from Verilean/feat/crypto-hw-synth-wave1

df8560e

Crypto IPs (wave 1): HW modules + synth checks for the byte/word FSM tier

Merge pull request #65 from mark-christiaens/divider_experiment

9c0e776

Prove that Divider divides

Merge pull request #81 from Verilean/feat/loop-pure-no-axiom

1f4c3f1

LoopProps: make loop_unfold a theorem, not an axiom (pure loopGo fixpoint)

Crypto/secp256k1: add field-multiply HW module + synth test

1de1453

secp256k1MulHW — multiply mod the secp256k1 field prime 2^256-2^32-977 (Bitcoin/Ethereum curve), multi-cycle circuit do. Behavioural check vs pure-data Secp256k1Field.mul + #synthesizeVerilog.

Crypto/P256: add field-multiply HW module + synth test

057a510

p256MulHW — multiply mod the NIST P-256 field prime, multi-cycle circuit do. Behavioural check vs pure-data P256Field.mul + #synthesizeVerilog.

junjihashimoto closed this Jul 1, 2026

junjihashimoto force-pushed the main branch from 1f4c3f1 to 8362d0e Compare July 1, 2026 21:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Crypto wave 2: BLS12-381 signature scheme + field-multiply HW#82

Crypto wave 2: BLS12-381 signature scheme + field-multiply HW#82
junjihashimoto wants to merge 985 commits into
mainfrom
feat/bls-crypto-wave2

junjihashimoto commented Jul 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

junjihashimoto commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Deliverable A — BLS12-381 (primary)

Deliverable B — field-multiply HW (4 curves)

Verification (local)

New Sparkle-side limitation found

CI wiring

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

junjihashimoto commented Jul 1, 2026 •

edited

Loading