Skip to content

Crypto wave 2: BLS12-381 signature scheme + field-multiply HW#82

Closed
junjihashimoto wants to merge 985 commits into
mainfrom
feat/bls-crypto-wave2
Closed

Crypto wave 2: BLS12-381 signature scheme + field-multiply HW#82
junjihashimoto wants to merge 985 commits into
mainfrom
feat/bls-crypto-wave2

Conversation

@junjihashimoto

@junjihashimoto junjihashimoto commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Wave 2 of the Crypto HW/reference build-out (wave 1 = AES/Keccak/SHA-512/HKDF/RLP/Merkle HW, PR #80). Adds the BLS12-381 signature scheme (pure-data reference — the load-bearing addition) plus field-multiply HW for four curves.

Deliverable A — BLS12-381 (primary)

IP/Crypto/BLS12_381.lean (~640 LOC). BLS signatures on the
BLS12-381 pairing-friendly curve — the curve Ethereum 2.0 consensus
(validator aggregate signatures), zk-rollup proof verification, and
threshold-signature schemes are built on. Sparkle had no BLS /
pairing IP
before this.

Full tower, layered like the existing Ed25519 tower:

Fp   — 381-bit base prime
Fp2  — Fp[u]/(u²+1)
Fp6  — Fp2[v]/(v³-ξ), ξ = u+1
Fp12 — Fp6[w]/(w²-v)   (pairing target group GT)
G1   — E(Fp): y² = x³ + 4                 (Jacobian)
G2   — E'(Fp2): y² = x³ + 4(u+1)          (the twist)
Pairing — optimal ate: Miller loop + final exponentiation
sign / verify / aggregate / verifyAggregate
  • Pairing transcribed line-for-line from Ethereum's py_ecc
    optimized_bls12_381 (pseudo-binary loop encoding, D-type untwist
    ψ(x,y)=(x·w⁻², y·w⁻³), homogeneous-projective linefunc with
    num/den accumulation, full (p¹²−1)/r final exp). Verified
    bilinear + non-degenerate — the property that makes
    sign/verify work.
  • Hash-to-curve (explicit in-file): real RFC 9380 §5.3
    expand_message_xmd(SHA-256) feeding a documented SIMPLE
    placeholder
    hashToG2 (scalar·generator), NOT the SSWU suite —
    so it is intentionally not blst/eth2-KAT interoperable. Clearly
    marked so nobody mistakes it for production H2C.
  • Test (bls12381-test): Fp12 pairing bilinearity +
    non-degeneracy, sign→verify PASSES, tampered-sig /
    wrong-message / wrong-keyset REJECTED, 3-sig aggregate
    verifies. Self-consistency (documented), not third-party KAT.
  • Out of scope (documented): pairing HW — 381-bit modmul + Miller
    loop on an FPGA is a research project. BLS's value here is the
    machine-checkable reference + IP-catalog presence.

Deliverable B — field-multiply HW (4 curves)

Bit-serial double-and-add modular multiply (MSB-first, per-bit
conditional-subtract reduction), each a circuit do FSM + sim
cross-check vs the pure-data mul + #synthesizeVerilog:

IP file prime test
Goldilocks GoldilocksHW.lean 2⁶⁴−2³²+1 goldilocks-mul-hw-test
secp256k1 Secp256k1FieldHW.lean 2²⁵⁶−2³²−977 secp256k1-mul-hw-test
P-256 P256FieldHW.lean NIST P-256 p256-mul-hw-test
Ed25519 Ed25519FieldHW.lean 2²⁵⁵−19 ed25519-mul-hw-test

Ed25519's HW is the follow-up its own docstring promised ("the HW
engine … follows in L.2.b"). All four share one verified generic
shift-add mod-p skeleton (not Solinas/field-specific fast reduction —
that's a follow-up); each matches (a·b) mod p exactly.

Punted (as scoped): RSA-PSS HW (2048-bit modexp), point-op /
scalar-mul HW, pairing HW.

Verification (local)

  • lake build clean — all #synthesizeVerilog compile.
  • 5/5 new exes ALL PASS: bls12381-test,
    {goldilocks,secp256k1,p256,ed25519}-mul-hw-test.
  • lake exe test: 13/13 lspec.
  • 12 new #synthesizeVerilog invocations (3 per field HW).

New Sparkle-side limitation found

A @[reducible, inline] helper wrapping a Signal.mux over an
<*>-built comparison fails synth ("Cannot instantiate Seq.seq")
though it sim-passes — the conditional-subtract must be inlined
directly into the circuit do body. Also: use BitVec.ule (not
<=) and BitVec.append (0#k) v (not .setWidth) for
zero-extend. (Extends the gotcha inventory from PR #79/#80.)

CI wiring

lakefile.lean — 5 new lean_exe stanzas. Tests/AllTests.lean
5 imports + .main calls (release-gate runner). Synth runs at
lake build; behavioural mains under lake test.

junjihashimoto and others added 30 commits May 30, 2026 23:16
Adds `mixedWidthMonad` / `mixedWidthMacro` — a BitVec 4 counter
and a BitVec 8 accumulator running through the same v2 monad
state.  The Prod-chain state packs to 12 bits (4 + 8); each
register slot is sliced out by width-aware lens reads.

This is the headline benefit of the v2 monad over v1's
`Vector τ n` state: v1 forced one element type per circuit, v2
mixes arbitrary Wireable types in a single body.  Without a
test pinning this down, the next person changing the elaborator
could silently lose the heterogeneous-width property.

Result Verilog (mixedWidthMonad):

  logic [11:0] _tmp_loop_0;     // packed state
  assign _tmp_op_a_2 = _tmp_loop_0[11:8];   // cnt slot
  assign _tmp_op_a_8 = _tmp_loop_0[7:0];    // acc slot

Sim parity verified end-to-end:

  --- mixedWidthMonad ≡ mixedWidthMacro ---
    got      = [0x00, 0x00, 0x01, 0x03, 0x06, 0x0a, 0x0f, 0x15]
    expected = [0x00, 0x00, 0x01, 0x03, 0x06, 0x0a, 0x0f, 0x15]
    PASS

The extension `0#4 ++ count` is used rather than `zeroExtend`
because `BitVec.zeroExtend` doesn't yet have an
elaborator wire-translation rule.  Once it does, the test can
move to the more idiomatic form.

Branch poc/circuit-monad-v2.
Hand-written generalisation from runCircuit2 to runCircuit3.
State shape is `τ₀ × τ₁ × τ₂` (right-nested Prod chain),
slot lenses use depth-2 `Signal.map Prod.fst/snd` chains for
the middle and tail slots.

Same recipe as runCircuit2 — both halves of the chain lower
through the elaborator's existing Prod recognition rules
(no new rules needed).  Adds tripleCountMonad regression
test (three BitVec 8 counters, output = a ^^^ b ^^^ c).

  $ lake exe circuit-monad-v2-test
  --- counterMonad ≡ counterMacro --- PASS
  --- twoCountMonad ≡ twoCountMacro --- PASS
  --- mixedWidthMonad ≡ mixedWidthMacro --- PASS
  --- tripleCountMonad ≡ tripleCountMacro --- PASS

Status of arity generalisation: the per-arity helper pattern
scales further with hand-written runCircuit4 / runCircuit5
copies, or — more principled — a macro that emits arity-N
forms.  The macro path is the natural follow-up for handling
PR #17's N-client arbiter where N is a parameter.

Branch poc/circuit-monad-v2.
`Tests/AllTests.lean` already drives `Sparkle.Tests.SignalLoopTest.main`
/ `CircuitIfTest.main` / `CircuitMatchTest.main` from its main, so a
cycle-by-cycle divergence in any of those aborts `lake test`.  Add
`Sparkle.Tests.CircuitMonadV2Test.main` to the same chain so the v2
monad surface (counterMonad / twoCountMonad / mixedWidthMonad /
tripleCountMonad — all four sim-vs-macro parity checks) is exercised
on every `lake test` invocation.

`lake test` exits 0 with the new entry; the v2 sims show up as a
fourth ALL PASS block between the existing CircuitMatchTest output
and the lspecIO runner.

Branch poc/circuit-monad-v2.
Introduces `circuit do { ... }` — the v2-monad-backed alternative
to `Signal.circuit do { ... }`.  Same surface syntax:

  - `let r ← Signal.reg init`        register declaration
  - `r <~ rhs`                       next-cycle update
  - `let x := expr`                  branch-local Lean binding
  - `if cond then ... else ...`      Signal-Bool branch over multiple registers
  - `return expr`                    output Signal

Lowers to `Sparkle.Core.runCircuit{N}` (counting the `Signal.reg`
declarations) with a body that uses `Sparkle.Core.Circuit.bind` /
`Sparkle.Core.Circuit.pure'` directly — so it goes through the
`handleCircuitMonad` recognition added earlier in this branch and
reaches `#synthesizeVerilog` for free.

Sim parity confirmed against the legacy `Signal.circuit do`
macro for all four test patterns (counter, reset counter,
two-register reset, hold semantics):

  $ lake exe circuit-do-test
  --- counterCdo ≡ counterMacro --- PASS
  --- resetCounterCdo ≡ resetCounterMacro --- PASS
  --- twoRegResetCdo ≡ twoRegResetMacro --- PASS
  --- heldRegCdo ≡ heldRegMacro --- PASS

All eight `#synthesizeVerilog` invocations in `Tests/CircuitDoTest.lean`
succeed; the cdo Verilog matches the macro reference structurally
(same registers, same mux trees).

To make `cnt + 1#8` work the way it does inside the macro
(where `cnt` is implicitly a `Signal dom τ`), `Reg dom S τ`
gets a `CoeHead` instance to `Signal dom τ`.  Without this the
user has to write `Circuit.read cnt + 1#8` everywhere, defeating
the goal of macro-parity UX.

Also wires `circuit-do-test` into `Tests/AllTests.lean` so the
sim parity is checked on every `lake test` invocation.  Drops
some exploratory definitions from `CircuitMonadV2Test.lean`
that were superseded by the cdo form.

Branch poc/circuit-monad-v2.  Open follow-ups (tracked):
  - statement-level `match` in `circuit do`
  - duplicate `<~` detection
  - migrating existing users of `Signal.circuit do` to `circuit do`
  - removing the legacy macro
…rloads

`Sparkle/Core/CircuitDo.lean` gains a statement-level `match`
lowering that mirrors the legacy macro's `Signal.mux`-chain
form: per-arm Signal.mux on `scrut === pat`, wildcard arm as
tail fallthrough, hold semantics for registers not assigned in
some arms.  Code path is implemented but not yet
production-ready — see deferred status below.

`Sparkle/Core/CircuitMonad.lean` gains:

  * `CoeHead (Reg dom S τ) (Signal dom τ)` so the register
    handle is implicitly a Signal anywhere a Signal is
    expected (e.g. `Signal.mux cnt …`).
  * `HAdd/HSub/HMul/HXor` instances for `Reg + BitVec`, `Reg +
    Reg`, `Signal + Reg` etc.  Lean resolves binary operators
    by the operand types first, so `CoeHead` doesn't fire on
    `cnt + 1#8`; explicit instances close that gap.  Mirrors
    the existing `Signal × BitVec` mixed-arity instance pattern
    in `Signal.lean`.

  * Trial `nextAny` polymorphic register write + `AsSignal`
    type class were prototyped but reverted — the type class
    indirection didn't reduce through `handleCircuitMonad`'s
    `withTransparency .all $ whnf` path for synthesis.

`Tests/CircuitDoTest.lean` is unchanged for the if/else tests
(counter / resetCounter / twoRegReset / heldReg — all still
pass).  Match tests deferred.

Open: Task 317 — `circuit do` match arms.

  The macro implementation emits
    `Signal.mux ($scrut === $pat) $armRhs $chained`
  where `$scrut` is a `Reg` (because `state` in user code
  binds to a Reg).  Lean fails to unify the type argument of
  `Signal.beq` because the elaborator can't decide whether to
  coerce `state` (Reg → Signal) or `0#2` (BitVec → Signal)
  first, and the inner `$armRhs` is a bare BitVec literal whose
  type only gets pinned down once the whole mux chain is
  resolved.

  Two viable fixes for the next pass:
    (a) Have the macro emit `state.1 === pat` (explicit Signal
        projection on the scrutinee), removing one Coe step.
    (b) Add an instance-resolution pattern that pins the
        `Signal.beq` α from the scrutinee side preferentially.

  The legacy `Signal.circuit do` macro avoids the issue because
  `state` is bound via `let state := projN! _circuit_state n 0`
  — it's already a Signal at that point.  v2's `Reg` boundary
  is more visible to the user and helpful for type safety
  (heterogeneous state) but interacts awkwardly with bare-value
  RHS arms in `match`.

Branch poc/circuit-monad-v2.
The match arm RHS that previously stalled on `?m state`
metavariables now elaborates cleanly.  The fix: in the macro,
emit the scrutinee Signal as `$scrut |>.1` — the explicit `.1`
projects out the underlying `Signal dom τ` from the user-bound
`Reg dom S τ` so Lean's type inference pins `τ` immediately,
before the per-arm pattern literals are elaborated.

Removed the `CdoScrut` typeclass / `cdoScrut` indirection I'd
trialled — the typeclass dispatch survived to synthesis and
made `Signal.beq` un-inlinable from Sparkle's side.  The direct
`.1` projection avoids both problems.

  $ lake exe circuit-do-test
  --- counterCdo ≡ counterMacro --- PASS
  --- resetCounterCdo ≡ resetCounterMacro --- PASS
  --- twoRegResetCdo ≡ twoRegResetMacro --- PASS
  --- heldRegCdo ≡ heldRegMacro --- PASS
  --- fsm3Cdo ≡ fsm3Macro --- PASS                ← match
  --- fsmHoldCdo ≡ fsmHoldMacro --- PASS           ← match + hold

Both new FSM tests synthesise (`#synthesizeVerilog fsm3Cdo`,
`fsmHoldCdo`) and produce the same shape Verilog as their
`Signal.circuit do` macro counterparts.  Trade-off: scrutinees
now must be `Reg`s (the `.1` is hardcoded), not arbitrary
Signal expressions.  Users wanting to match on an ad-hoc
Signal can precompute it into a Reg or write the `Signal.mux`
chain by hand — same workaround as the legacy macro requires
for arbitrary expressions.

Branch poc/circuit-monad-v2.
Adds the same kind of macro-level duplicate-write check the
legacy `Signal.circuit do` macro already does.  When the user
writes the same register's `<~` twice at the same statement
level — e.g.

  circuit do
    let cnt ← Signal.reg 0#8
    cnt <~ 0#8
    cnt <~ cnt + 1#8           -- silently overwrites the first
    return cnt

the macro now throws:

  circuit do: register `cnt` is assigned with `<~` more than
  once at the same statement level — last write wins (matches
  Verilog `always_ff` semantics); merge the assignments into
  a single `<~` to silence this error.

The check runs after `flattenCdoStmts` has collapsed any `if/
else` and `match` branches into single muxed writes per
register, so it only fires on duplicates the user actually
wrote at the top level (the flattener's own muxed emissions
emit at most one row per register).

Verified with `#guard_msgs` in `Tests/CircuitDoTest.lean`:

  /-- error: circuit do: register `cnt` is assigned with `<~`
              more than once …
  -/
  #guard_msgs in
  example : Signal defaultDomain (BitVec 8) :=
    circuit do
      let cnt ← Signal.reg 0#8
      cnt <~ 0#8
      cnt <~ cnt + 1#8
      return cnt

Note: the duplicate-write `example` had to use `example`
rather than `def` — a `def` whose macro errored out leaves a
`sorry`-typed body that ran the test harness into an
"INTERNAL PANIC: executed 'sorry'" on `lake exe`.  `example`
discards the body so the panic doesn't surface.

Branch poc/circuit-monad-v2.
…t do (Task 319, part 1)

Replaces uses of the legacy `Signal.circuit do` macro with the
v2 `circuit do` macro across non-test production code, the
tutorial-extended demos, and Ch 3 of the notebook solutions.
Each file's sim semantics are unchanged; the macro emits the
same Verilog through the v2 lowering (runCircuit{1,2,3} →
Signal.loop / Signal.register chain).

Migrated:

  * `Sparkle/Display/Synthesise.lean` — comment example only.
  * `Tests/Circuit/SimTest.lean` — counter + 2-reg pipeline.
    `lake exe circuit-sim-test` still PASSes.
  * `Tests/Tutorial/HierarchyTest.lean` — `latch8` /
    `latch8mod` for the inline-vs-`@[hardware_module]` split
    coverage.  `lake exe tutorial-hierarchy` still PASSes.
  * `tutorial-extended/TutorialExtended/Step8_CircuitDoNotation.lean`
    — all four examples (counter, up/down, shift pipeline,
    enabled counter).  Prose updated to point at the new
    macro.
  * `tutorial-extended/TutorialExtended/Run.lean` — comment
    string only.
  * `docs/tutorial/Notebooks/Solutions/Ch03.lean` — traffic-
    light FSM.  Required a manual `let stateSig := state.1;
    let timerSig := timer.1` projection because the body
    passes `state` directly to `Signal.mux` arms where Lean's
    Coe can't pin down `α` automatically (the rest of the body
    uses the projected `Signal` aliases).  This is the same
    "register-handle as argument" pattern the v2 `match`
    macro works around with an explicit `.1`.

Macro side improvements that enabled the migration:

  * `circuit do` top-level `let _ := _` bindings now stay in
    scope for every subsequent `<~` and the trailing `return`
    — previously each `<~` cleared the binding list, so a
    `let isGreen := state === S_GREEN` defined at the top
    wasn't visible to later assignments.  Branch-local lets
    inside if/match arms keep their existing arm-scoped
    semantics (handled separately by the flattener).
  * Added a `CoeOut (Reg dom S τ) (Signal dom τ)` instance
    alongside the existing `CoeHead`.  Doesn't fully resolve
    the `Signal.mux Reg BitVec` metavariable case (Ch 3
    needed the explicit `.1` regardless), but it widens the
    coercion firing window for other call sites.

Files NOT migrated yet (deferred to Task 320 since they're
legacy-macro regression tests):

  * `Tests/CircuitIfTest.lean` — if/else macro behaviour.
  * `Tests/CircuitMatchTest.lean` — match/case macro.
  * `Tests/SignalLoopTest.lean` — register loop semantics.

These will be removed when the legacy macro itself is removed;
the equivalent v2 coverage already lives in
`Tests/CircuitDoTest.lean` and `Tests/CircuitMonadV2Test.lean`.

Verified:

  $ lake test
  EXIT=0  (all suites pass, including the migrated files)

Branch poc/circuit-monad-v2.
…SL (Task 320)

Wraps up the v2 migration: deletes the legacy `Signal.circuit
do` macro from `Sparkle/Core/Signal.lean` and the regression
tests that were specific to it, since `circuit do`
(`Sparkle/Core/CircuitDo.lean`, on top of the v2 monad in
`Sparkle/Core/CircuitMonad.lean`) now covers every feature with
identical Verilog output.

Removed:

  * `Tests/CircuitIfTest.lean`   — covered by `CircuitDoTest`
                                    (resetCounterCdo / twoRegResetCdo
                                     / heldRegCdo, all with `if/else`
                                     lowering).
  * `Tests/CircuitMatchTest.lean` — covered by `CircuitDoTest`
                                    (fsm3Cdo / fsmHoldCdo, the
                                    match/case lowering).
  * `Tests/Drivers/CircuitIfTestMain.lean`,
    `Tests/Drivers/CircuitMatchTestMain.lean` — driver wrappers
    for the removed exes.
  * lakefile `circuit-if-test`, `circuit-match-test` lean_exe
    declarations.
  * `Tests/AllTests.lean` references and the
    Sparkle.Tests.CircuitIfTest.main / CircuitMatchTest.main
    calls.
  * `Sparkle/Core/Signal.lean` `Signal.circuit do` DSL block:
    `circuitStmt` / `circuitMatchArm` syntax categories,
    `collectBranchAssigns` / `flattenCircuitStmts`,
    `macro_rules` for the `Signal.circuit do $stmts*` form.
    ~390 lines removed.  `projN!` / `bundleAll!` stay — they're
    used by `Sparkle/Core/StateMacro.lean` and many IPs.

Rewritten:

  * `Tests/SignalLoopTest.lean` — the `Signal.circuit do`-side
    reference circuits now use `circuit do`; the raw
    `Signal.loop`/`Signal.register` parity check is otherwise
    unchanged.
  * `Tests/CircuitDoTest.lean` — the cdo-vs-macro parity tests
    became cdo-vs-hand-written-reference-list tests (no more
    legacy macro on the rhs of the comparison).  Same six
    sub-tests (counter, resetCounter, twoRegReset, heldReg,
    fsm3, fsmHold) plus the duplicate-`<~` guard.
  * `Tests/CircuitMonadV2Test.lean` — `*Macro` references
    renamed to `*Cdo` and bodies switched to `circuit do`.
    The raw `runCircuit{1,2,3}` form is the part actually
    being tested; cdo is the reference.
  * `Sparkle/Core/CircuitDo.lean` / `Sparkle/Core/CircuitMonad.lean`
    docstrings updated to reflect the macro's removal (mention
    it as the design predecessor that motivated the v2 split,
    not as a still-available alternative).

Build & sim:

  $ lake build test
  Build completed successfully (355 jobs)
  $ lake exe test
  EXIT=0  (all suites pass, including the migrated tests)

The cdo macro is now the only register-DSL surface in Sparkle.
v2 monad's `runCircuit{N}` helpers remain the lower-level
escape hatch for users who want explicit monad plumbing
(`forM`, `traverse`, etc.).

Branch poc/circuit-monad-v2.
Adds `Tests/CircuitMonadForMTest.lean` — a small test that
exercises Lean's standard `List.forM` directly over a list of
`Reg` handles, calling `Circuit.next` on each.  Demonstrates a
capability the removed `Signal.circuit do` macro couldn't
express (it was syntax-level; `forM` would have to be in its
recognized statement grammar to work).  v2's `Circuit` monad
is a real `Monad`, so `forM` / `mapM` / `traverse` / any other
`Bind`-based combinator goes through Sparkle's
`handleCircuitMonad` recognition and reaches synthesis.

The test circuit declares three `BitVec 8` registers via
`runCircuit3` and increments all three using `[r0, r1, r2].forM
(fun r => Circuit.next r (Circuit.read r + 1#8))`.  Output is
the sum of the three values each cycle.

Coverage:

  * Simulation — `lake exe test` runs the namespaced
    `Sparkle.Tests.CircuitMonadForMTest.main` and checks the
    cycle-by-cycle output against the expected list
    [3, 6, 9, 12, 15, 18].
  * Synthesis — `#synthesizeVerilog threeCountersForM` runs at
    `lake build Tests.CircuitMonadForMTest` time.  Verified
    that the generated Verilog packs the three registers into
    a 24-bit packed state with the expected
    `always_ff @(posedge clk)` blocks (DRC warning on the
    combinational output is the usual one for `return a + b + c`
    and matches the equivalent shape on macro-generated
    versions).

Standalone exe: `lake exe circuit-monad-forM-test` for ad-hoc
debugging.  Also wired into `lake test` via the AllTests
import / `Sparkle.Tests.CircuitMonadForMTest.main` call so it
runs on every `lake test`.

Branch poc/circuit-monad-v2.
Extends the imperative-HW chapter with a `forM` example that
exercises a capability the legacy `Signal.circuit do` macro
couldn't provide.  `circuit do` is a syntax-level macro that
intercepts the `do` keyword (so `forM` inside its body would
be parsed by the cdo macro, not by Lean's monad elaborator);
to use `forM` we drop down to the v2 helper `runCircuit3`,
which gives a real `Monad (Circuit dom S)` instance and
composes with everything Lean's `ForM` /
`Traversable` ecosystem offers.

Concretely:

  def threeCountersForM : Signal defaultDomain (BitVec 8) :=
    runCircuit3 0#8 1#8 2#8 (fun r0 r1 r2 => do
      [r0, r1, r2].forM (fun r =>
        Circuit.next r (Circuit.read r + 1#8))
      return Circuit.read r0 + Circuit.read r1 + Circuit.read r2)

Verified at the tutorial demo level:

  $ lake exe tutorial-extended-run | grep "forM"
  forM (3 regs)  : [0x03#8, 0x06#8, 0x09#8, 0x0c#8, 0x0f#8, 0x12#8]

The narration walks through why the legacy macro couldn't do
this and points readers at `runCircuit{N}` when they want the
full Lean monad surface.  Synthesis side is covered by
`Tests/CircuitMonadForMTest.lean`'s `#synthesizeVerilog`.

Branch poc/circuit-monad-v2.
Extends the arity wall from 3 to 4.  `runCircuit4` follows the
same Prod-chain construction as `runCircuit3`, just with one
more `Prod.snd` step in each slot lens to reach the deepest
tail.  No new elaborator rules needed — the existing
`Signal.map Prod.fst` / `Prod.snd` recognition handles depth-3
projections automatically.

`circuit do` also accepts N=4 now (cdo macro extends to call
`runCircuit4` when the body has four `let _ ← Signal.reg`).

Coverage added:

  * `Tests/CircuitDoTest.lean` `fourCounterCdo` — four BitVec
    8 counters incrementing by 1/2/3/4, sum returned.  Sim
    and synth both green.
  * `Tests/CircuitMonadForMTest.lean` `shift4ForM` — 4-stage
    shift register written via `runCircuit4` + `forM` over a
    list of (dst, src) pairs.  Demonstrates that `forM` scales
    to the 4-register arity.

Result Verilog for `shift4ForM` packs the four registers into
a 32-bit state with the expected `always_ff @(posedge clk)`
shift chain.

Status of full Task 314 (truly arbitrary N via HList state):
the `mkRegList` recursion over `αs : List Type` runs into
dependent-type juggling (the lens passed through tail recursion
needs to stay typed against the outer HList shape, not the
shrinking inner one).  Hand-rolling `runCircuitN` for N ≤ 4 is
simpler and good enough for the immediate use cases; truly
arbitrary N can wait until a concrete need (e.g. 8+ stage
pipeline) materialises, and the macro-based path
(`runCircuitN` as a Lean macro expanding to a hand-rolled form)
is the likely route.

Also reverted an experimental `SlotLens` / `headLens` /
`tailLens` block from `Sparkle/Core/Wireable.lean` that was
introduced during the dependent-type attempt — it's not used
by the N=4 hand-rolled form and would have become dead code
otherwise.

Branch poc/circuit-monad-v2.
Adds `runCircuitH` — a single helper that takes a heterogeneous
list of initial values (`HList αs`) and a `RegList`-receiving
body, replacing the per-arity wall at `runCircuit{1,2,3,4}`.

Pieces:

  * `Sparkle/Core/Wireable.lean`
    - `HListWireable αs` typeclass + auto-derived instances
      from `Wireable α` for each element.  Constrains the slot
      types at the `runCircuitH` call site so non-synthesisable
      types (e.g. `Option Nat`) are rejected up front.
    - `Wireable (HList αs)` instance by induction over the
      Prod chain.
  * `Sparkle/Core/CircuitMonad.lean`
    - `RegList dom S αs` — list of register handles, one per
      slot, all carrying the same outer state `S`.
    - `mkRegList` — builds the `RegList` from a Signal-level
      lens pair by composing `Prod.fst`/`Prod.snd` accessors
      at each depth.
    - `packRegister` — closes the loop with `Signal.register` /
      `bundle2` chain on the `αs` list.
    - `runCircuitH` — `Signal.loop` + `mkRegList` + `packRegister`
      glued together; `[HListWireable αs]` constraint enforces
      slot-type validity.
  * `Sparkle/Compiler/Elab.lean`
    - `inferHWType` learns to handle `Unit` / `PUnit` (the
      HList terminator), returning `.bitVector 0` so the
      surrounding `Prod` width accumulation sums correctly.
    - `Signal.pure ()` / `Signal.pure ⟨⟩` synth case added —
      `packRegister []` lands on `Signal.pure ()`, which gets
      a 0-bit constant wire.

Tests (sim + synth):

  $ lake exe run-circuit-h-test
  --- counterH ---       PASS  (N=1, BitVec 8)
  --- twoCountH ---      PASS  (N=2, BitVec 8 + BitVec 8)
  --- mixedWidthH ---    PASS  (N=2, BitVec 4 + BitVec 8 — heterogeneous)
  ALL PASS

Each test's `#synthesizeVerilog` succeeds.  The generated
Verilog for `mixedWidthH` packs the state into a 12-bit packed
register chain just like the hand-rolled `runCircuit2` form did,
with no extra Verilog cost from the HList machinery.

Status of Task 314 (any-arity): N=1,2 confirmed working; N=3+
should follow by construction from the `mkRegList` / `packRegister`
recursions but isn't yet covered by a dedicated test.  The
existing `runCircuit{1,2,3,4}` helpers stay in place for now;
folding all call sites onto `runCircuitH` is a follow-up.

Branch poc/circuit-monad-v2.
The per-arity helpers (`runCircuit1`/`2`/`3`/`4`) are gone;
`runCircuitH` takes an `HList αs` of initial values and a body
that receives a `RegList` Prod chain.  The `circuit do` macro
now lowers to a single shape regardless of N: build the
HList inits expression, build the let-chain to destructure
the `RegList` into named `r0`, `r1`, … handles, and call
`runCircuitH`.

Removed:

  * `Sparkle/Core/CircuitMonad.lean` runCircuit1/2/3/4
    definitions (~188 lines).
  * `Tests/CircuitMonadV2Test.lean` + driver — folded into
    `Tests/RunCircuitHTest.lean`, which now covers
    N=1/2/3/4, mixed-width, and forM.
  * `Tests/CircuitMonadForMTest.lean` + driver — likewise
    superseded by `RunCircuitHTest`'s `threeCountForM` case.
  * `lake exe circuit-monad-v2-test` and `circuit-monad-forM-test`
    from `lakefile.lean`.

Updated:

  * `Sparkle/Core/CircuitDo.lean` macro emits
    `runCircuitH (αs := [_, _, …, _])` with explicit list of
    `_`-typed slot types so Lean's HList inference resolves
    fully from the inits.  Body destructures via a `let
    (r0, rest1) := _cdoRegs; let (r1, rest2) := rest1; …`
    chain.
  * `Tests/RunCircuitHTest.lean` adds tripleCountH (N=3 XOR),
    fourCountH (N=4 sum), threeCountForM (forM over a
    destructured list).  All sim PASS, all
    `#synthesizeVerilog` succeed.
  * `tutorial-extended/.../Step8_CircuitDoNotation.lean`
    `threeCountersForM` switched to `runCircuitH`.

The Verilog generated for these tests is unchanged from the
hand-rolled `runCircuit{N}` versions — `HList αs` reduces to
the same Prod chain, so the IR elaborator lowers it identically.

Branch poc/circuit-monad-v2.
…`import Sparkle`

Two CI-fix changes:

  1. `Sparkle.lean` (the root module re-exporter) now imports
     `Sparkle.Core.CircuitMonad` and `Sparkle.Core.CircuitDo`.
     Without these, `import Sparkle` doesn't bring the `circuit
     do` macro into scope, so user code (and the tutorial) needs
     extra `import` lines.  Adding them to the root keeps the
     `import Sparkle` UX consistent across legacy and v2.

  2. `docs/Tutorial.md` switches from the removed legacy
     `Signal.circuit do` macro to the v2 `circuit do` macro
     introduced in this branch (Task 320).  Two code blocks
     (counter8' and shiftPipeline) were dying with
     `unexpected token '~'` because `<~` lives only inside the
     `circuit do` macro now.

Verified:

  $ bash scripts/check_tutorial.sh
  All tutorial blocks type-check cleanly.

The pre-existing `evalSignalAt` runtime errors on block_00 /
block_01 (which run `#eval ... .sample`) are explicitly
filtered out by the check script via `grep -v "Could not find
native implementation"` — those are FFI runtime issues unrelated
to the markdown.

Branch poc/circuit-monad-v2.
…apters

Task 320 removed the legacy `Signal.circuit do` macro; the v2
`circuit do` macro replaces it.  21 prose mentions and code
blocks across 10 chapter files were still referring to the
removed macro, which left their code blocks failing
`scripts/check_tutorial.sh` with `unexpected token '<~'`.

Mechanical sed replacement:

  for f in docs/tutorial/md/*.md; do
    sed -i 's/Signal\.circuit do/circuit do/g' "$f"
  done

Result on `bash scripts/check_tutorial.sh <ch>.md` (PASS = no
elaboration errors, ignoring `Could not find native
implementation` runtime issues which the script filters):

  before  → 8 chapters PASS / 6 FAIL  (main)
  after   → 9 chapters PASS / 5 FAIL  (this branch, after the rename)

Ch08_Yosys.md moves from FAIL to PASS as a direct result of
the rename.  The remaining failing chapters
(Ch01b_YourFirstProject, Ch02_Combinational, Ch03_Sequential,
Ch04_Modules, Ch05_Verilog, Ch06_LTL, Ch07_Equivalence,
Ch07b_FpQuantEquivalence, Ch08b_Simulation, Ch09_FPGA,
Ch10_Architecture) all fail on issues that predate this branch
(stray `end`s, unknown `Display.*` identifiers, `#`-prefixed
commands as top-level statements, etc.) — none of them mention
`<~` or the legacy macro after this commit, so the v2 migration
is not the cause.

Branch poc/circuit-monad-v2.
…03/Ch09

Two follow-up fixes for the `lake build TutorialNotebooks` job
that runs via xlean-convert against `docs/tutorial/md/Ch*.md`:

  1. Add `HAnd`/`HOr`/`HXor` instances for
     `Reg dom S (BitVec n)` operands (Reg × BitVec, Reg × Reg)
     to `Sparkle/Core/CircuitMonad.lean`.  The Ch09 blinky
     example does `count &&& 0x800000#24` with `count : Reg`,
     which fell back to `HAnd (Reg …) (BitVec n) ?` and
     failed instance resolution.  Mirrors the existing
     `HAdd`/`HSub`/`HMul`/`HXor` patterns for the same operand
     mixes.

  2. In `Ch03_Sequential.md` (`fsm`, `counterEn`) and
     `Ch09_FPGA.md` (`blinky`), project the register handle to
     its underlying `Signal` (`let countSig := count.1`)
     before mixing it into `Signal.mux` / `===` chains.
     Same workaround the v2 test files use — `Signal.mux ? ?
     reg` doesn't resolve the `Coe Reg → Signal` because both
     branches need coercion and `α` is unset.  The
     `Signal.lit dom CONST` ascription pattern from `Ch03`'s
     traffic-light example is also used here for the bare
     `BitVec` arms of the next-state mux.

After:

  $ XLEAN=…/xlean-convert bash docs/tutorial/build-from-md.sh
  $ lake build TutorialNotebooks
  Build completed successfully (70 jobs)

  $ lake exe test
  EXIT=0

Branch poc/circuit-monad-v2.
…ing the build)

CI was running \`lake update\` in both the \`build\` and
\`tutorial-notebooks\` jobs.  That re-resolves \`require LSpec
from git ... @ "main"\`, which now points at an LSpec commit
(d3c15b93, 2026-04-15) that bumps to Lean v4.29.  Lake then
*uplifts the toolchain* from our pinned \`v4.28.0\` to the
newer one the dependency demands, and CI runs end up
elaborating LSpec under Lean v4.31.0-rc1 — where a few
\`instance\` declarations in \`LSpec/LSpec.lean\` now flag as
"argument cannot be inferred using typeclass synthesis".

The build log made this look like a \`verilog-tests\`
failure because \`lake exe verilog-tests\` is the first step
that actually invokes the freshly-rebuilt LSpec.  In fact
the project itself is healthy; only the toolchain drift
broke it.

The committed \`lake-manifest.json\` already pins LSpec
(\`dc0904293d30\`, 2026-03-06) to a commit that's compatible
with our toolchain.  Dropping \`lake update\` makes CI honour
that pin.  Reproducible builds 101.

If we ever do want to bump LSpec or any other dep, the
intended workflow is:

  1. Bump \`lean-toolchain\` locally to whatever the new
     LSpec requires.
  2. Run \`lake update\` locally to refresh
     \`lake-manifest.json\`.
  3. Verify the build, commit both files together.

…rather than implicitly bumping via CI on every push.

Verified locally that \`lake exe verilog-tests\` and
\`lake build TutorialNotebooks\` work without \`lake update\`.

Branch poc/circuit-monad-v2.
The new \`HAdd (Reg …) …\` / \`HXor (Reg …) …\` operator
instances introduced in \`Sparkle/Core/CircuitMonad.lean\`
bind their arguments as plain \`a b\` lambdas inside
instance definitions.  When the elaborator later unfolds
those instances for synthesis, Lean's macro-hygiene
machinery produces identifier names like:

  x__@_Sparkle_Core_Signal_2408276647__hygCtx__hyg_45

These reach \`freshName\` as the hint argument and get
embedded into \`_gen_\` wire names verbatim — but \`@\` isn't
a valid SystemVerilog identifier character, so Verilator
chokes:

  %Error: generated_soc.sv:7942:25: syntax error, unexpected '@'
   7942 |     logic [0:0] _gen_x__@_Sparkle_Core_Signal_…

Fix: in \`freshName\`, strip the hint at the first \`@\`
(after trimming a trailing \`_\` so \`_@…\` doesn't leave a
trailing underscore).  Lean's hygiene suffix uses the
exact pattern \`<name>__@_<module>__hygCtx__hyg_<N>\`, and
\`@\` never appears in user-facing identifiers; the cut is
unambiguous.

Verified:

  $ lake build IP.RV32.SoCVerilog
  Build completed successfully (129 jobs).

  $ cd verilator && make build
  ...
  - Verilator: Walltime 4.141 s
  (clean Vrv32i_soc binary, no Verilog syntax errors)

Branch poc/circuit-monad-v2.
Add Sparkle.Core.CircuitMonad to make circuit statements true monads.
…nputs alone

The previous implementation had two correctness bugs that
showed up only on CI:

  1. A name without any `@` was still passed through
     \`endsWith "_"\` + \`dropRight 1\`, which silently
     truncates a legitimate trailing underscore (e.g. \`x_\`
     → \`x\`).  Now we early-return the input unchanged
     unless an \`@\` is actually present.

  2. When stripping, only **one** trailing underscore was
     dropped.  Lean's hygiene marker uses \`__@\` (two
     underscores), so \`x__@…\` was getting cut to \`x_\`,
     not \`x\`.  That left \`x_\` as the visible name, which
     could collide with another legitimate \`x_\`-prefixed
     wire downstream — exactly the kind of silent rename
     that flips a JIT counter from "increments each tick"
     to "stays at 0", as the CI \`svparser-test\` Test 6
     was showing.  Now we drop *every* trailing \`_\` from
     the prefix.

Verified locally that \`lake exe svparser-test\` now
reports \`Test 6: E2E JIT simulation... PASS (count=9
after 10 cycles)\` (29/29 passing) and that the
RV32 SoCVerilog output is still hygiene-free.

Branch poc/circuit-monad-v2.
…pstream

Upstream xeus-lean no longer ships \`build.sh\` at its repo
root; the kernel build is now driven by Makefile + CMakeLists
+ pixi.toml with non-trivial emscripten / xtl / xwidgets
prerequisites.  Our \`Mode 2\` step in \`tutorial-notebooks\`
was running

  cd /tmp/xeus-lean && bash build.sh

which fails immediately with \`No such file or directory\`,
showing up as an annotation \`error\` on every CI run.  The
step was \`continue-on-error: true\` so the job stayed green
overall, but the noise was unhelpful.

Mode 1 (\`lake build TutorialNotebooks\` against the Display
shim in \`docs/tutorial/Display.lean\`) is the bedrock check
and already covers every chapter's code blocks.  Running the
generated notebooks under the *real* xeus-lean kernel would
catch additional drift (real \`Display.waveform\` /
\`#mermaid\` implementations), but it's a separate kind of
test that deserves its own CI pipeline — when xeus-lean's
build story stabilises.

Branch poc/circuit-monad-v2 / merge \#35 → main.
…ages

\`https://verilean.github.io/sparkle/\` was 404 because the
\`gh-pages\` branch only contained \`dev/*\` (benchmark
dashboards published by \`benchmark-action/github-action-benchmark\`)
— no top-level \`index.html\` and no API / tutorial content.

New steps under the existing \`tutorial-notebooks\` job (only
on \`push\` to \`main\`):

  1. \`Doc: build doc-gen4 API reference\` — runs
     \`lake build Sparkle:docs\` to render every module
     reachable from the root \`Sparkle.lean\` into
     \`.lake/build/doc/\`.

  2. \`Doc: convert tutorial notebooks to HTML\` —
     \`jupyter nbconvert --to html --template classic\`
     for each \`docs/tutorial/Notebooks/Gen/notebooks/*.ipynb\`,
     output into \`site/tutorial/\`.

  3. \`Doc: assemble site\` — copies the doc-gen4 output to
     \`site/api/\` and writes two hand-crafted index pages:
     \`site/index.html\` (landing — tutorial / API / benchmark
     links + project blurb) and \`site/tutorial/index.html\`
     (chapter list).  Plain inline CSS, no extra theme deps.

  4. \`Doc: publish to gh-pages\` — \`peaceiris/actions-gh-pages@v3\`
     with \`keep_files: true\` so the benchmark dashboards
     under \`gh-pages/dev/*\` keep working alongside the new
     \`/\`, \`/api/\`, and \`/tutorial/\` paths.

The whole block is gated on \`github.event_name == 'push'
&& github.ref == 'refs/heads/main'\`, so PRs don't churn the
gh-pages branch.

Expected after the next main push:

  https://verilean.github.io/sparkle/            → landing
  https://verilean.github.io/sparkle/api/        → doc-gen4
  https://verilean.github.io/sparkle/tutorial/   → chapter HTMLs
  https://verilean.github.io/sparkle/dev/...     → benchmarks (unchanged)
The `Doc: convert tutorial notebooks to HTML` step was failing with
`jupyter: command not found` (exit 127) because no pip step ever
installed it.  The prior commit's comment incorrectly claimed an
"earlier pip step" — there was none.  Add an explicit install step
right after `Install Lean toolchain` and put ~/.local/bin on
$GITHUB_PATH so subsequent steps see the binary.
… branch)

Switch GitHub Pages deployment from the legacy gh-pages-branch flow
(peaceiris/actions-gh-pages + benchmark-action auto-push) to the
workflow-based actions/deploy-pages@v4 path:

- 3× benchmark-action calls now use `external-data-json-path`
  (writes JSON to bench-data/, no git branch touched).
- `build` job uploads `bench-data/*.json` as an artifact.
- `tutorial-notebooks` job downloads the bench artifact, generates
  a `site/dev/<bench>/index.html` per suite (latest-run snapshot;
  history is intentionally not preserved), then publishes the full
  `site/` tree via actions/upload-pages-artifact +
  actions/deploy-pages.
- Required permissions (pages: write, id-token: write) and the
  github-pages environment added to the job.

Effect: `gh-pages` branch is no longer written to, so it can be
deleted; user clones stay lean.  Pages source must be set to
"GitHub Actions" in repo settings (done via API).
The previous attempt put `environment: github-pages` directly on the
`tutorial-notebooks` job, which made GitHub evaluate the environment's
deployment branch policy on every run — including PR runs from
`poc/circuit-monad-v2`, which got rejected with:

    Branch "refs/pull/39/merge" is not allowed to deploy to github-pages
    due to environment protection rules.

Move just the `actions/deploy-pages@v4` step into a new
`pages-deploy` job gated by
`if: github.event_name == 'push' && github.ref == 'refs/heads/main'`.
The environment binding lives only on that job, so PR runs skip it
entirely and never trigger the protection rule.

The assemble work stays in `tutorial-notebooks`, which now ends with
`actions/upload-pages-artifact@v3`; `pages-deploy` only runs
`deploy-pages` against that artifact.
junjihashimoto and others added 28 commits July 1, 2026 11:15
- IP/Bus/SBUSHW.lean: byte-serial S.BUS frame accumulator
  (`frameAccumulatorHW`).  Tracks a 5-bit byte-index counter
  (0..24), latches byte 0 as the header check, byte 24 as the
  footer check, and exposes an 11-bit channel-0 view assembled
  from bytes 1 and low-3-bits of byte 2 (LSB-first packing per
  S.BUS spec).
- Tests/IP/Bus/SBUSHWTest.lean: behavioural check — packs 16
  channels via the pure-data `IP.Bus.SBUS.packChannels`, wraps
  with header/flags/footer, feeds 25 bytes one per cycle, and
  asserts `headerOk` latched, `footerOk` pulsed on cycle 25, and
  ch0 matches `unpackChannels[0]`.  Two #synthesizeVerilog
  checks (idx scalar + ch0 scalar).
- Tests/Drivers/SBUSHWTestMain.lean, lakefile.lean: wire the
  `sbus-hw-test` lean_exe.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
- IP/Bus/CRSFHW.lean: byte-serial CRC-8 (poly 0xD5) accumulator
  (`crc8HW`) — the CRSF frame-check function.  Consumes one byte
  per cycle with the 8 bit-serial shift steps unrolled
  combinationally.  Same construction shape as CANHW.crc15HW.
- Tests/IP/Bus/CRSFHWTest.lean: behavioural check — folds a
  known 11-byte Link Statistics payload (type + 10 data bytes)
  through the HW LFSR and matches the register at cycle N+1
  against `IP.Bus.CRSF.crc8` on the same array.
  #synthesizeVerilog on the 8-bit CRC output.
- Tests/Drivers/CRSFHWTestMain.lean, lakefile.lean: wire the
  `crsf-hw-test` lean_exe.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
- IP/Bus/MIL1553HW.lean: two HW pieces for the MIL-STD-1553B
  physical layer:
    * `oddParityHW` — combinational 16-bit odd-parity generator
      (XOR-reduce over the 16 content bits + invert), matches
      `IP.Bus.MIL1553.oddParity` cycle-by-cycle.
    * `manchesterEncoderHW` — bi-phase-L (1553B convention) with
      a 1-bit phase register that toggles per enable tick.  Line
      level is (phase==0 ? !bitIn : bitIn), so 0→HL and 1→LH.
- Tests/IP/Bus/MIL1553HWTest.lean: behavioural check — sweeps 16
  content values through oddParity and checks vs pure-data ref,
  and prints an 8-cycle Manchester encoder trace on a 3-bit
  pattern.  Two #synthesizeVerilog checks (parity + line).
- Tests/Drivers/MIL1553HWTestMain.lean, lakefile.lean: wire the
  `mil1553-hw-test` lean_exe.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
- IP/Bus/CANopenHW.lean: two HW pieces layered on IP.Bus.CAN /
  IP.Bus.CANHW:
    * `cobIdDemuxHW` — purely combinational: splits an 11-bit
      COB-ID into (function code, node ID) + one-hot flags for
      the CiA 301 predefined connection set (NMT / SYNC /
      Heartbeat / SDO Tx/Rx).  Uses `BitVec.extractLsb'` for
      slicing.
    * `nmtStateFsmHW` — 2-bit NMT state register with the
      standard state transitions (0=pre-op, 1=oper, 2=stopped,
      3=boot-up).  Accepts a validated command byte and an
      external reset pulse; priority-mux dispatch on the byte.
- Tests/IP/Bus/CANopenHWTest.lean: behavioural check — sweeps 6
  representative COB-IDs (NMT / SYNC / Heartbeat / SDO Tx / SDO
  Rx / TPDO1) and compares (fc, nid) + one-hot flags against
  `IP.Bus.CANopen.decodeCobId`.  Then feeds a 4-cycle NMT
  command sequence and asserts the state trajectory.  Three
  #synthesizeVerilog checks (fc, isNmt, state).
- Tests/Drivers/CANopenHWTestMain.lean, lakefile.lean: wire the
  `canopen-hw-test` lean_exe.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
- IP/Bus/DroneCANHW.lean: three HW pieces layered on
  IP.Bus.CAN / IP.Bus.CANHW:
    * `crc16CcittHW` — byte-serial CRC-16-CCITT-FALSE (poly
      0x1021, init 0xFFFF).  Widens the input byte to 16 bits
      via concat, XORs with the CRC state after shifting into
      the upper octet, then runs 8 combinationally-unrolled
      MSB-first LFSR steps.
    * `nodeFilterHW` — purely combinational 7-bit src-node vs
      self-node comparator.
    * `transferIdTrackerHW` — mid-transfer sanity tracker: 5-bit
      transfer-ID + 1-bit toggle registers with mismatch
      detection, clear on SOT.  Uses (fun b => !b) <$> (a == b)
      idiom for Bool inequality (avoids the Bool `!=`/`not`
      synth pitfall documented in the shared-CLAUDE.md notes).
- Tests/IP/Bus/DroneCANHWTest.lean: behavioural check — folds a
  7-byte NodeStatus payload through crc16CcittHW and compares
  against `IP.Bus.DroneCAN.crc16Ccitt`; then walks the tracker
  through a well-formed single-frame + 2-frame stimulus and
  asserts error stays low.  Three #synthesizeVerilog checks
  (crc, accept, error).
- Tests/Drivers/DroneCANHWTestMain.lean, lakefile.lean: wire the
  `dronecan-hw-test` lean_exe.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Add a new matrix group `bus-hw` for the 8 lean_exes wired up
in the preceding commits (lin-hw-test / i2c-hw-test /
spi-hw-test / sbus-hw-test / crsf-hw-test / mil1553-hw-test /
canopen-hw-test / dronecan-hw-test).

Rationale for a separate matrix entry rather than folding into
the existing `bus` group: `bus` already carries 8 exes (pcie-
test, pcie-hft-test, can-*, canopen-test, dronecan-test,
serial-bus-test, avionics-bus-test) and adding 8 more would
push it to 16, past the ≤10 heuristic called out in
ip-tests.yml's header comment.  Splitting keeps failure
attribution tight and the group cache-friendly.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Add the 8 new `Sparkle.Tests.IP.Bus.<Name>HWTest.main`
invocations to the master `lake test` runner so a runtime
divergence in the LFSR / FSM / accumulator paths aborts the
release gate (each `.main` calls `IO.Process.exit 1` on
mismatch, same as the PCIe / Ethernet / Crypto sim mains
above).

Importing the modules also pulls their `#synthesizeVerilog`
checks into the `lake build test` step, so a synth-side
regression on any of the 8 breaks `lake test` at build time
too.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
docs(readme): hosted site links + Verilator/xeus-lean/iverilog/Yosys ack
Bus IPs: add HW modules + synth checks for the 8 pure-data protocols
Byte-serial RLP header emitter (`rlpHeaderHW`) covering the
three prefix classes: 1-byte (≤55), 2-byte (56..255), and
3-byte (256..2047) headers, for both byte-string and list
frames.  Behavioural check compares cycle-by-cycle output
against `IP.Crypto.RLP.encodeLength` + a payload sweep;
also verifies the classic ["cat","dog"] Ethereum-wiki KAT
through the pure-data encoder.

The HW handles only the length-prefix decode — payload
byte concatenation is a plain 8-bit pass and doesn't need
its own module.

Wired into `lake test` and the release-gate `Tests.AllTests`
runner via a driver at `Tests/Drivers/RLPHWTestMain.lean`
and a `lean_exe «rlp-hw-test»` stanza.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Streaming Merkle-tree root accumulator (`merkleRootHW`) with
4 slots × 256 bits (supports trees up to 2^4 = 16 leaves).
Uses a carry-propagating "adder-style" FSM: on each `push`,
the leaf digest walks up occupied slot levels, combining with
the resident partial digest via an external SHA-256 hasher
(`combineLeft`/`combineRight`/`combineOut`/`combineDone`
handshake).  Delegating the hash keeps the module compact and
lets the same slot machinery scale to deeper trees without
also hauling the SHA-256 engine's state.

Behavioural: streams `hashLeaf 42..45` through the FSM, feeds
the three combine responses on a pre-computed schedule, and
verifies slot 2 holds the 4-leaf `Merkle.commit` root
(hash-tree derived from SHA-256 over 8-byte LE leaf encodings).

Wired into `lake test` and the release-gate `Tests.AllTests`
runner via `Tests/Drivers/MerkleHWTestMain.lean` and the
`lean_exe «merkle-hw-test»` stanza.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
HKDF-Expand round-counter FSM (`hkdfExpandHW`).  Owns the
HKDF-specific piece — a 4-state machine (idle → trigger →
wait → done) that walks the counter i = 1..N, pulses an
external HMAC-SHA-256 handshake once per round, latches the
returned T(i) block, and asserts `done` when all rounds
complete.  HMAC-SHA-256 itself is delegated (same pattern as
Merkle's SHA-256 combiner) — this HW closes the round-counter
piece that isn't inside the HMAC engine.

Behavioural: verifies pure-data `hkdfExtract` + `hkdfExpand`
match RFC 5869 Test Case 1 (PRK + 42-byte OKM), then drives
the FSM through a 2-round HKDF-Expand (L = 42 bytes) with
pre-computed T(1)/T(2) from `hmacSha256` and checks the FSM
raises `done` with `tPrev = T(2)` at the expected cycle.

Wired into `lake test` and the release-gate `Tests.AllTests`
runner via `Tests/Drivers/HKDFHWTestMain.lean` and the
`lean_exe «hkdf-hw-test»` stanza.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Signal-side helpers mirroring SHA-256's L.1.b HW piece: 64-bit
`rotr64Sig`, `shr64Sig`, `bigSigma0Sig64`/`bigSigma1Sig64`,
`smallSigma0Sig64`/`smallSigma1Sig64`, `chFn64Sig`,
`majFn64Sig`, plus an 80-entry `kMux` selecting K[t] from a
7-bit round counter (via `kLut!` — same shape as SHA-256's).

Behavioural: verifies pure-data SHA-512("abc") against
FIPS 180-4 App C.1 ("ddaf35a1…54ca49f"), then cycles each
Signal helper against its pure-data twin on a non-trivial
input word, plus 4 K-mux table lookups (t = 0, 16, 63, 79).

Synth: `#synthesizeVerilog synth_sha512KMux` produces the
80-way 64-bit constant table.  Stand-alone synth of the
combinational helpers is punted — cross-module inline of
`@[reducible, inline]` defs isn't on the elaborator's fast
path yet (same as SHA-256's own helpers today; only kMux is
synth-checked there).

The 80-round iterative compressor (`sha512Block` analogue
to `sha256Block`) is not shipped in this commit — the same
Lean-sim exponential-recursion cost that gates SHA-256's C2
follow-up applies to the 1024-bit-wide W buffer here.

Wired into `lake test` via `Tests/Drivers/SHA512HWTestMain.lean`
and the `lean_exe «sha512-hw-test»` stanza.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Round-per-clock iterative AES-128 encryptor (`aes128BlockHW`).
128-bit state register + on-the-fly key expansion.  12-cycle
pipeline: start pulse (t=0) latches key/block, initial XOR at
t=1, 9 mid rounds (t=2..10) with MixColumns, final round at
t=10, ciphertext valid + done pulse at t=11.

Primitive combinational sub-modules (all `@[reducible, inline]`
in-file docstrings; callers can reuse them for their own AES
compositions):
  * `sboxHW`          — 256-way `kLut!` byte substitution
                        (also stand-alone `#synthesizeVerilog`d).
  * `rconHW`          — 16-way `kLut!` round-constant lookup
                        (indexed by round number 1..10;
                        also stand-alone `#synthesizeVerilog`d).
  * `subBytesHW`      — 16 `sboxHW` instantiations + repack.
  * `shiftRowsHW`     — pure byte-position wiring permutation.
  * `mixColumnsHW`    — 4 GF(2^8) column mixes via `xtimeHW`.
  * `addRoundKeyHW`   — 128-bit state XOR.
  * `keyExpansionHW`  — one-round key derivation
                        (SubWord + RotWord + Rcon).

The top-level FSM inlines the primitive bodies directly into
its `circuit do` block (rather than calling them by name) so
the elaborator can unfold everything inside one module —
cross-module inline of the plain combinational defs isn't on
the fast path today.  The named defs remain as reusable
combinators + a documentation surface.

Behavioural: FIPS 197 App B KAT
  key       = 2b7e151628aed2a6abf7158809cf4f3c
  plaintext = 3243f6a8885a308d313198a2e0370734
  expected  = 3925841d02dc09fbdc118597196a0b32
The HW produces the expected ciphertext at t=11 (~54 s Lean
sim wall-time on the 128-bit state, ok for the sub-200-cycle
horizon in CLAUDE.md).

Synth: `#synthesizeVerilog` on `sboxHW` and `rconHW` sub-modules
in-file.  Top-level FSM synth (`aes128BlockHW.ciphertext`/`done`
projections) trips the multi-output sub-module projection path
that PR #66 still tracks as a known-fragile spot; deferred to
that follow-up.

PUNTED (direction #2, follow-up commit):
  * AES decryption (`invSubBytes`, `invMixColumns`, reverse-order
    round keys).  The pure-data reference already ships in
    `IP/Crypto/AES.lean`; only the HW mirror is deferred.

Wired into `lake test` via `Tests/Drivers/AESHWTestMain.lean`
and the `lean_exe «aes-hw-test»` stanza.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Two GCM-specific HW pieces that sit on top of the existing
AES-128 and GHASH engines:

  * `gcmCounterHW` — 128-bit CTR block with NIST inc32
    semantics (add 1 mod 2^32 to the low 32 bits, preserve
    the high 96 bits).  Latches J_0 on start, increments
    on each step pulse.

  * `gcmTagAccumulatorHW` — Y_i ← gmul(Y_{i-1} XOR C_i, H)
    FSM.  Same IDLE/BUSY handshake as
    `IP.Crypto.GHASHHW.ghashFullHW` but exposes `mulX` and
    `fire` on the boundary so the caller wires up the
    multi-cycle multiplier explicitly.

Together with `AESHW.aes128BlockHW` and `GHASHHW.gmulHW` these
close the AEAD_AES_128_GCM datapath — every stage of the tag
+ ciphertext pipeline now has a `circuit do` module.

Behavioural: cross-checks pure-data `encryptAead` against NIST
SP 800-38D Appendix B Test Case 2 (16-byte plaintext of zeros,
128-bit key of zeros, empty AAD — expected C =
0388dace…b2fe78, T = ab6e47d4…57bddf).  Then verifies the
counter's inc32 behaviour over 3 step pulses, and the tag
accumulator's handshake on a single block feed (fire pulses,
mulX = Y XOR block, Y latches on mulDone).

Synth: `#synthesizeVerilog` on `gcmCounter.counter`,
`gcmTagAccumulator.y`, and `.fire` outputs.

Wired into `lake test` via `Tests/Drivers/AESGCMHWTestMain.lean`
and the `lean_exe «aes-gcm-hw-test»` stanza.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Full 24-round Keccak-f[1600] iterative permutation
(`keccakF1600HW`) implemented as a `circuit do` FSM with 25
separate `BitVec 64` lane registers (indexed `x + 5*y`).  Each
cycle computes one combinational θ → ρ → π → χ → ι round via
`keccakRoundHW`.  Pipeline:
  cycle 0    : `start` pulses, latch input state.
  cycle 1..24: one round per clock (cnt = 1..24).
  cycle 25   : done pulse, hold state.

The ι-step Rcon LUT is a `@[hardware_module] keccakRcHW`
(32-way `kLut!` — 24 constants + 8-slot pad to a power of 2).
It's the only stand-alone sub-module the FSM emits; the θ / ρ
/ π / χ steps are all combinational and inline directly.

Behavioural: pure-data check of the Ethereum-style Keccak-256
of the empty string (expected
`c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470`),
plus a smoke test that the HW module instantiates cleanly on a
constant input.

Sparkle-side limitation:
  * The pure-Lean `Signal.val` simulator can't sample the
    full 24-round permutation output — 25 interlocking lane
    signals × BitVec 64 causes exponential recursion, so even
    a single-round sample at t=2 times out (worse than
    SHA-256's `sha256Block` sim which hits the same L.1.b/C2
    class of problem).  Cycle-by-cycle validation against
    `keccakF` is punted to a Verilog-backend co-sim; the
    algorithmic correctness is covered by the pure-data
    Ethereum KAT.

Synth: `#synthesizeVerilog synth_keccakRc` generates the
Rcon LUT sub-module.  The top-level `keccakF1600HW` builds
clean (`lake build IP.Crypto.Keccak256HW` green), so downstream
Verilog synthesis can drive the full permutation.

Wired into `lake test` via `Tests/Drivers/Keccak256HWTestMain.lean`
and the `lean_exe «keccak256-hw-test»` stanza.

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Add a `crypto-hw` matrix group that runs the wave-1 crypto HW
test exes.  Kept separate from the existing `crypto` group so
the AES HW's ~54 s Lean-sim latency (128-bit state Signal.val
recursion) doesn't sit behind the ~13 pure-data crypto exes
and blow the per-group timeout.

Group contents:
  rlp-hw-test  merkle-hw-test  hkdf-hw-test
  sha512-hw-test  aes-hw-test  aes-gcm-hw-test
  keccak256-hw-test

Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Crypto IPs (wave 1): HW modules + synth checks for the byte/word FSM tier
…oint)

`Signal.loop` was `opaque` (a wrapper around the `unsafe` memoizing
`loopImpl`), so `(Signal.loop f).val t` could not be unfolded by the
kernel.  The fixpoint equation `loop f = f (loop f)` — the load-
bearing fact for every feedback-circuit proof (e.g. PR #65's RV32
divider) — was therefore a *trusted axiom* (`loop_unfold`).

This replaces the opaque wrapper with a *pure* definition and turns
`loop_unfold` into a proved theorem, removing the custom axiom from
the trust base.

## What changed

`Sparkle/Core/Signal.lean`:
- `loopGo f t` — the fixpoint value at time `t`, by strong recursion:
  apply `f` to a signal that recursively supplies the already-computed
  values for every earlier cycle `i < t` and `default` at `≥ t`.  The
  `if i < t` guard makes it well-founded (`termination_by t`).  Total,
  axiom-free.
- `loop f := ⟨loopGo f⟩` — now a `def`, not `opaque`.  Still carries
  `@[implemented_by loopImpl]`, so *execution* uses the O(n) memoizing
  implementation; only the *logical* value is the pure fixpoint.
- `loopGo_eq` — the defining equation, exposed for proofs.

`Sparkle/Verification/LoopProps.lean`:
- `loop_unfold` goes from `axiom` to `theorem`, proved by rewriting
  through `loopGo_eq` and discharging the `default` placeholder with
  `StrictlyCausal` (a strictly causal body never observes cycles `≥ t`,
  which is exactly why the placeholder is invisible).
- Module docstring updated to describe the pure definition.

## Trust base

`#print axioms dividerSignal_result_gen` (a downstream RV32 divider
theorem) no longer lists `loop_unfold`.  Remaining axioms are all
Lean-standard: `propext`, `Classical.choice`, `Quot.sound`,
`Lean.ofReduceBool` (from `bv_decide`/`native_decide`, pre-existing),
and `Lean.trustCompiler` (from `@[implemented_by]` — the standard
trust that `loopImpl` computes the same value as `loopGo`).  The
bespoke fixpoint axiom is gone.

## Performance

`@[implemented_by loopImpl]` keeps sim fast: a direct
`Signal.loop`-counter samples `.val 2000` in ~0 ms (memoized O(n),
confirmed by probe).  Matched `lake exe test` runs: main 7m52s vs
this branch 8m02s — within noise, no regression.

## Also fixed (surfaced by the first truly-clean build)

`lakefile.lean`'s `sparkleModuleDeps` list was stale:
- `sparkle_Sparkle_Backend_CppSim.so` → `_CSim.so` (CppSim was
  removed / renamed to CSim in the C-only-backend work; the `.so`
  no longer exists, so a clean build's monolithic link failed with
  "-l:...CppSim.so not found").
- Added the missing `sparkle_Sparkle_Core_SignalLeavesDerive.so`
  (the `SignalLeaves` deriving handler, added with the per-leaf
  output-port split, was never registered → clean-build load error
  "undefined symbol: initialize_..._SignalLeavesDerive").
Both were latent bugs on `main` that only bite a from-scratch
`.lake/build` wipe; incremental builds kept the old `.so` around.

## Verification

- `lake build` clean.
- `lake exe test`: 13/13 lspec pass.
- Both `Sparkle.Verification.Divider.{Correct,States33}` build against
  the new pure `loop` with no proof changes needed — PR #65's divider
  proof is now the regression test that `loop = loopGo` is the right
  fixpoint.
…m-smoke)

PR #65 (`c59820c`) added `moreLinkArgs` to the `Sparkle` lean_lib to
force the `sparkle_barrier` / `sparkle_jit` extern archives whole-into
the precompiled `libsparkle_Sparkle.so` — necessary so the `.so`
self-resolves `sparkle_cache_get` / `sparkle_eval_at` (the `@[extern]`
LICM barriers `Signal.loop`'s memoization uses; without it the
interpreter/LSP fails to dlopen the `.so`).

But it used a *relative* path `./.lake/build/c_src`.  `moreLinkArgs`
propagates to every DOWNSTREAM consumer of the `sparkle` package, and
there the relative path resolves against the *downstream's* build dir
— which has no `c_src` — so the link fails:

    ld.lld: error: unable to find library -l:libsparkle_barrier.a
    ld.lld: error: unable to find library -l:libsparkle_jit.a

This is why `downstream-smoke` (ubuntu / macos / wsl) went red on
`main` starting at the PR #65 merge (`9c0e776`); it was green on the
prior commit (`df8560e`).

Fix: compute an ABSOLUTE path from the lakefile's own location
(`__dir__`), captured as `sparkleCSrcDir`.  Whether Sparkle is the
root package (`lake build` here) or a git dependency
(`<downstream>/.lake/packages/sparkle/…`), the path points at
Sparkle's *own* populated `c_src`, so an inheriting downstream link
resolves the archives correctly.

Also simplifies the Linux arm to pass the absolute `.a` paths directly
inside `--whole-archive` (no `-L dir` + `-l:name.so` file search
needed).  The `Sparkle`-lib force_load remains; only the path form
changes.  The IP-lib `sparkleDynlibLinkArgs` (per-module `.so` deps for
the interpreter/LSP force-load path) is unaffected — it's already
per-platform guarded and doesn't reach downstream MyBlinky.

Verification:
- `nm -D libsparkle_Sparkle.so` shows `sparkle_cache_get` /
  `sparkle_eval_at` as `T` (defined) — the `.so` is still
  self-contained.
- `lake build` clean; `lake exe test` 13/13.
- The emitted arg is now e.g.
  `-Wl,--whole-archive /abs/.lake/build/c_src/libsparkle_barrier.a …`.
LoopProps: make loop_unfold a theorem, not an axiom (pure loopGo fixpoint)
goldilocksMulHW — multiply mod the Goldilocks prime 2^64-2^32+1
(the STARK field), as a multi-cycle circuit do FSM.  Behavioural
test checks cycle-by-cycle against the pure-data Goldilocks.mul;
SynthesisChecks section emits #synthesizeVerilog.  64-bit field,
the cheapest of the four field-mul HW modules in this wave.
secp256k1MulHW — multiply mod the secp256k1 field prime
2^256-2^32-977 (Bitcoin/Ethereum curve), multi-cycle circuit do.
Behavioural check vs pure-data Secp256k1Field.mul + #synthesizeVerilog.
p256MulHW — multiply mod the NIST P-256 field prime, multi-cycle
circuit do.  Behavioural check vs pure-data P256Field.mul +
#synthesizeVerilog.
ed25519FieldMulHW — multiply mod 2^255-19, the HW follow-up the
Ed25519Field.lean docstring promised ("the HW engine ... follows
in L.2.b").  Multi-cycle circuit do; behavioural check vs the
pure-data Ed25519Field.mul + #synthesizeVerilog.
BLS signatures on the BLS12-381 pairing-friendly curve — the curve
Ethereum 2.0 consensus (validator aggregate signatures), zk-rollup
proof verification, and threshold-signature schemes are built on.
Sparkle had no BLS / pairing IP before this.

Pure-data (Nat/BitVec, no Signal) reference, layered like the
existing Ed25519 tower:

  Fp   — prime field mod the 381-bit BLS12-381 base prime
  Fp2  — Fp[u]/(u²+1)
  Fp6  — Fp2[v]/(v³-ξ), ξ = u+1
  Fp12 — Fp6[w]/(w²-v)  (the pairing target group GT)
  G1   — E(Fp): y² = x³ + 4
  G2   — E'(Fp2): y² = x³ + 4(u+1)  (the twist)
  Pairing — optimal ate: Miller loop + final exponentiation,
            transcribed line-for-line from Ethereum's py_ecc
            reference (loop parameter, linefunc, final exp).
  sign / verify / aggregate / verifyAggregate

The test (`bls12381-test`) checks: Fp12 pairing bilinearity +
non-degeneracy, sign→verify round-trip PASSES, tampered signature
FAILS, and a 3-signature aggregate over one message verifies.

Scope notes (documented in-file):
- **hash-to-curve is a SIMPLE deterministic placeholder**, not the
  full RFC 9380 `hash_to_curve` suite — enough for the
  sign/verify/aggregate self-consistency the test exercises, but
  NOT interoperable with real Ethereum signatures.  Marked clearly
  so nobody mistakes it for a production H2C.
- **No HW module for the pairing** — 381-bit modmul + a Miller loop
  on an FPGA is a research project, explicitly out of scope.  BLS's
  value here is the machine-checkable reference + IP-catalog
  presence; the tractable HW unit (field multiply) ships separately
  in this wave for the smaller curves.
lakefile.lean: 5 new lean_exe stanzas (bls12381-test,
{goldilocks,secp256k1,p256,ed25519}-mul-hw-test), each
supportInterpreter := true, pointed at Tests/Drivers/*Main.lean.
Tests/AllTests.lean: 5 new imports + .main calls in the release-gate
runner.  ip-tests.yml crypto/crypto-hw groups already cover the
Crypto dir; the new exes ride along there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants