Crypto wave 2: BLS12-381 signature scheme + field-multiply HW#82
Closed
junjihashimoto wants to merge 985 commits into
Closed
Crypto wave 2: BLS12-381 signature scheme + field-multiply HW#82junjihashimoto wants to merge 985 commits into
junjihashimoto wants to merge 985 commits into
Conversation
Adds `mixedWidthMonad` / `mixedWidthMacro` — a BitVec 4 counter
and a BitVec 8 accumulator running through the same v2 monad
state. The Prod-chain state packs to 12 bits (4 + 8); each
register slot is sliced out by width-aware lens reads.
This is the headline benefit of the v2 monad over v1's
`Vector τ n` state: v1 forced one element type per circuit, v2
mixes arbitrary Wireable types in a single body. Without a
test pinning this down, the next person changing the elaborator
could silently lose the heterogeneous-width property.
Result Verilog (mixedWidthMonad):
logic [11:0] _tmp_loop_0; // packed state
assign _tmp_op_a_2 = _tmp_loop_0[11:8]; // cnt slot
assign _tmp_op_a_8 = _tmp_loop_0[7:0]; // acc slot
Sim parity verified end-to-end:
--- mixedWidthMonad ≡ mixedWidthMacro ---
got = [0x00, 0x00, 0x01, 0x03, 0x06, 0x0a, 0x0f, 0x15]
expected = [0x00, 0x00, 0x01, 0x03, 0x06, 0x0a, 0x0f, 0x15]
PASS
The extension `0#4 ++ count` is used rather than `zeroExtend`
because `BitVec.zeroExtend` doesn't yet have an
elaborator wire-translation rule. Once it does, the test can
move to the more idiomatic form.
Branch poc/circuit-monad-v2.
Hand-written generalisation from runCircuit2 to runCircuit3. State shape is `τ₀ × τ₁ × τ₂` (right-nested Prod chain), slot lenses use depth-2 `Signal.map Prod.fst/snd` chains for the middle and tail slots. Same recipe as runCircuit2 — both halves of the chain lower through the elaborator's existing Prod recognition rules (no new rules needed). Adds tripleCountMonad regression test (three BitVec 8 counters, output = a ^^^ b ^^^ c). $ lake exe circuit-monad-v2-test --- counterMonad ≡ counterMacro --- PASS --- twoCountMonad ≡ twoCountMacro --- PASS --- mixedWidthMonad ≡ mixedWidthMacro --- PASS --- tripleCountMonad ≡ tripleCountMacro --- PASS Status of arity generalisation: the per-arity helper pattern scales further with hand-written runCircuit4 / runCircuit5 copies, or — more principled — a macro that emits arity-N forms. The macro path is the natural follow-up for handling PR #17's N-client arbiter where N is a parameter. Branch poc/circuit-monad-v2.
`Tests/AllTests.lean` already drives `Sparkle.Tests.SignalLoopTest.main` / `CircuitIfTest.main` / `CircuitMatchTest.main` from its main, so a cycle-by-cycle divergence in any of those aborts `lake test`. Add `Sparkle.Tests.CircuitMonadV2Test.main` to the same chain so the v2 monad surface (counterMonad / twoCountMonad / mixedWidthMonad / tripleCountMonad — all four sim-vs-macro parity checks) is exercised on every `lake test` invocation. `lake test` exits 0 with the new entry; the v2 sims show up as a fourth ALL PASS block between the existing CircuitMatchTest output and the lspecIO runner. Branch poc/circuit-monad-v2.
Introduces `circuit do { ... }` — the v2-monad-backed alternative
to `Signal.circuit do { ... }`. Same surface syntax:
- `let r ← Signal.reg init` register declaration
- `r <~ rhs` next-cycle update
- `let x := expr` branch-local Lean binding
- `if cond then ... else ...` Signal-Bool branch over multiple registers
- `return expr` output Signal
Lowers to `Sparkle.Core.runCircuit{N}` (counting the `Signal.reg`
declarations) with a body that uses `Sparkle.Core.Circuit.bind` /
`Sparkle.Core.Circuit.pure'` directly — so it goes through the
`handleCircuitMonad` recognition added earlier in this branch and
reaches `#synthesizeVerilog` for free.
Sim parity confirmed against the legacy `Signal.circuit do`
macro for all four test patterns (counter, reset counter,
two-register reset, hold semantics):
$ lake exe circuit-do-test
--- counterCdo ≡ counterMacro --- PASS
--- resetCounterCdo ≡ resetCounterMacro --- PASS
--- twoRegResetCdo ≡ twoRegResetMacro --- PASS
--- heldRegCdo ≡ heldRegMacro --- PASS
All eight `#synthesizeVerilog` invocations in `Tests/CircuitDoTest.lean`
succeed; the cdo Verilog matches the macro reference structurally
(same registers, same mux trees).
To make `cnt + 1#8` work the way it does inside the macro
(where `cnt` is implicitly a `Signal dom τ`), `Reg dom S τ`
gets a `CoeHead` instance to `Signal dom τ`. Without this the
user has to write `Circuit.read cnt + 1#8` everywhere, defeating
the goal of macro-parity UX.
Also wires `circuit-do-test` into `Tests/AllTests.lean` so the
sim parity is checked on every `lake test` invocation. Drops
some exploratory definitions from `CircuitMonadV2Test.lean`
that were superseded by the cdo form.
Branch poc/circuit-monad-v2. Open follow-ups (tracked):
- statement-level `match` in `circuit do`
- duplicate `<~` detection
- migrating existing users of `Signal.circuit do` to `circuit do`
- removing the legacy macro
…rloads
`Sparkle/Core/CircuitDo.lean` gains a statement-level `match`
lowering that mirrors the legacy macro's `Signal.mux`-chain
form: per-arm Signal.mux on `scrut === pat`, wildcard arm as
tail fallthrough, hold semantics for registers not assigned in
some arms. Code path is implemented but not yet
production-ready — see deferred status below.
`Sparkle/Core/CircuitMonad.lean` gains:
* `CoeHead (Reg dom S τ) (Signal dom τ)` so the register
handle is implicitly a Signal anywhere a Signal is
expected (e.g. `Signal.mux cnt …`).
* `HAdd/HSub/HMul/HXor` instances for `Reg + BitVec`, `Reg +
Reg`, `Signal + Reg` etc. Lean resolves binary operators
by the operand types first, so `CoeHead` doesn't fire on
`cnt + 1#8`; explicit instances close that gap. Mirrors
the existing `Signal × BitVec` mixed-arity instance pattern
in `Signal.lean`.
* Trial `nextAny` polymorphic register write + `AsSignal`
type class were prototyped but reverted — the type class
indirection didn't reduce through `handleCircuitMonad`'s
`withTransparency .all $ whnf` path for synthesis.
`Tests/CircuitDoTest.lean` is unchanged for the if/else tests
(counter / resetCounter / twoRegReset / heldReg — all still
pass). Match tests deferred.
Open: Task 317 — `circuit do` match arms.
The macro implementation emits
`Signal.mux ($scrut === $pat) $armRhs $chained`
where `$scrut` is a `Reg` (because `state` in user code
binds to a Reg). Lean fails to unify the type argument of
`Signal.beq` because the elaborator can't decide whether to
coerce `state` (Reg → Signal) or `0#2` (BitVec → Signal)
first, and the inner `$armRhs` is a bare BitVec literal whose
type only gets pinned down once the whole mux chain is
resolved.
Two viable fixes for the next pass:
(a) Have the macro emit `state.1 === pat` (explicit Signal
projection on the scrutinee), removing one Coe step.
(b) Add an instance-resolution pattern that pins the
`Signal.beq` α from the scrutinee side preferentially.
The legacy `Signal.circuit do` macro avoids the issue because
`state` is bound via `let state := projN! _circuit_state n 0`
— it's already a Signal at that point. v2's `Reg` boundary
is more visible to the user and helpful for type safety
(heterogeneous state) but interacts awkwardly with bare-value
RHS arms in `match`.
Branch poc/circuit-monad-v2.
The match arm RHS that previously stalled on `?m state` metavariables now elaborates cleanly. The fix: in the macro, emit the scrutinee Signal as `$scrut |>.1` — the explicit `.1` projects out the underlying `Signal dom τ` from the user-bound `Reg dom S τ` so Lean's type inference pins `τ` immediately, before the per-arm pattern literals are elaborated. Removed the `CdoScrut` typeclass / `cdoScrut` indirection I'd trialled — the typeclass dispatch survived to synthesis and made `Signal.beq` un-inlinable from Sparkle's side. The direct `.1` projection avoids both problems. $ lake exe circuit-do-test --- counterCdo ≡ counterMacro --- PASS --- resetCounterCdo ≡ resetCounterMacro --- PASS --- twoRegResetCdo ≡ twoRegResetMacro --- PASS --- heldRegCdo ≡ heldRegMacro --- PASS --- fsm3Cdo ≡ fsm3Macro --- PASS ← match --- fsmHoldCdo ≡ fsmHoldMacro --- PASS ← match + hold Both new FSM tests synthesise (`#synthesizeVerilog fsm3Cdo`, `fsmHoldCdo`) and produce the same shape Verilog as their `Signal.circuit do` macro counterparts. Trade-off: scrutinees now must be `Reg`s (the `.1` is hardcoded), not arbitrary Signal expressions. Users wanting to match on an ad-hoc Signal can precompute it into a Reg or write the `Signal.mux` chain by hand — same workaround as the legacy macro requires for arbitrary expressions. Branch poc/circuit-monad-v2.
Adds the same kind of macro-level duplicate-write check the
legacy `Signal.circuit do` macro already does. When the user
writes the same register's `<~` twice at the same statement
level — e.g.
circuit do
let cnt ← Signal.reg 0#8
cnt <~ 0#8
cnt <~ cnt + 1#8 -- silently overwrites the first
return cnt
the macro now throws:
circuit do: register `cnt` is assigned with `<~` more than
once at the same statement level — last write wins (matches
Verilog `always_ff` semantics); merge the assignments into
a single `<~` to silence this error.
The check runs after `flattenCdoStmts` has collapsed any `if/
else` and `match` branches into single muxed writes per
register, so it only fires on duplicates the user actually
wrote at the top level (the flattener's own muxed emissions
emit at most one row per register).
Verified with `#guard_msgs` in `Tests/CircuitDoTest.lean`:
/-- error: circuit do: register `cnt` is assigned with `<~`
more than once …
-/
#guard_msgs in
example : Signal defaultDomain (BitVec 8) :=
circuit do
let cnt ← Signal.reg 0#8
cnt <~ 0#8
cnt <~ cnt + 1#8
return cnt
Note: the duplicate-write `example` had to use `example`
rather than `def` — a `def` whose macro errored out leaves a
`sorry`-typed body that ran the test harness into an
"INTERNAL PANIC: executed 'sorry'" on `lake exe`. `example`
discards the body so the panic doesn't surface.
Branch poc/circuit-monad-v2.
…t do (Task 319, part 1)
Replaces uses of the legacy `Signal.circuit do` macro with the
v2 `circuit do` macro across non-test production code, the
tutorial-extended demos, and Ch 3 of the notebook solutions.
Each file's sim semantics are unchanged; the macro emits the
same Verilog through the v2 lowering (runCircuit{1,2,3} →
Signal.loop / Signal.register chain).
Migrated:
* `Sparkle/Display/Synthesise.lean` — comment example only.
* `Tests/Circuit/SimTest.lean` — counter + 2-reg pipeline.
`lake exe circuit-sim-test` still PASSes.
* `Tests/Tutorial/HierarchyTest.lean` — `latch8` /
`latch8mod` for the inline-vs-`@[hardware_module]` split
coverage. `lake exe tutorial-hierarchy` still PASSes.
* `tutorial-extended/TutorialExtended/Step8_CircuitDoNotation.lean`
— all four examples (counter, up/down, shift pipeline,
enabled counter). Prose updated to point at the new
macro.
* `tutorial-extended/TutorialExtended/Run.lean` — comment
string only.
* `docs/tutorial/Notebooks/Solutions/Ch03.lean` — traffic-
light FSM. Required a manual `let stateSig := state.1;
let timerSig := timer.1` projection because the body
passes `state` directly to `Signal.mux` arms where Lean's
Coe can't pin down `α` automatically (the rest of the body
uses the projected `Signal` aliases). This is the same
"register-handle as argument" pattern the v2 `match`
macro works around with an explicit `.1`.
Macro side improvements that enabled the migration:
* `circuit do` top-level `let _ := _` bindings now stay in
scope for every subsequent `<~` and the trailing `return`
— previously each `<~` cleared the binding list, so a
`let isGreen := state === S_GREEN` defined at the top
wasn't visible to later assignments. Branch-local lets
inside if/match arms keep their existing arm-scoped
semantics (handled separately by the flattener).
* Added a `CoeOut (Reg dom S τ) (Signal dom τ)` instance
alongside the existing `CoeHead`. Doesn't fully resolve
the `Signal.mux Reg BitVec` metavariable case (Ch 3
needed the explicit `.1` regardless), but it widens the
coercion firing window for other call sites.
Files NOT migrated yet (deferred to Task 320 since they're
legacy-macro regression tests):
* `Tests/CircuitIfTest.lean` — if/else macro behaviour.
* `Tests/CircuitMatchTest.lean` — match/case macro.
* `Tests/SignalLoopTest.lean` — register loop semantics.
These will be removed when the legacy macro itself is removed;
the equivalent v2 coverage already lives in
`Tests/CircuitDoTest.lean` and `Tests/CircuitMonadV2Test.lean`.
Verified:
$ lake test
EXIT=0 (all suites pass, including the migrated files)
Branch poc/circuit-monad-v2.
…SL (Task 320)
Wraps up the v2 migration: deletes the legacy `Signal.circuit
do` macro from `Sparkle/Core/Signal.lean` and the regression
tests that were specific to it, since `circuit do`
(`Sparkle/Core/CircuitDo.lean`, on top of the v2 monad in
`Sparkle/Core/CircuitMonad.lean`) now covers every feature with
identical Verilog output.
Removed:
* `Tests/CircuitIfTest.lean` — covered by `CircuitDoTest`
(resetCounterCdo / twoRegResetCdo
/ heldRegCdo, all with `if/else`
lowering).
* `Tests/CircuitMatchTest.lean` — covered by `CircuitDoTest`
(fsm3Cdo / fsmHoldCdo, the
match/case lowering).
* `Tests/Drivers/CircuitIfTestMain.lean`,
`Tests/Drivers/CircuitMatchTestMain.lean` — driver wrappers
for the removed exes.
* lakefile `circuit-if-test`, `circuit-match-test` lean_exe
declarations.
* `Tests/AllTests.lean` references and the
Sparkle.Tests.CircuitIfTest.main / CircuitMatchTest.main
calls.
* `Sparkle/Core/Signal.lean` `Signal.circuit do` DSL block:
`circuitStmt` / `circuitMatchArm` syntax categories,
`collectBranchAssigns` / `flattenCircuitStmts`,
`macro_rules` for the `Signal.circuit do $stmts*` form.
~390 lines removed. `projN!` / `bundleAll!` stay — they're
used by `Sparkle/Core/StateMacro.lean` and many IPs.
Rewritten:
* `Tests/SignalLoopTest.lean` — the `Signal.circuit do`-side
reference circuits now use `circuit do`; the raw
`Signal.loop`/`Signal.register` parity check is otherwise
unchanged.
* `Tests/CircuitDoTest.lean` — the cdo-vs-macro parity tests
became cdo-vs-hand-written-reference-list tests (no more
legacy macro on the rhs of the comparison). Same six
sub-tests (counter, resetCounter, twoRegReset, heldReg,
fsm3, fsmHold) plus the duplicate-`<~` guard.
* `Tests/CircuitMonadV2Test.lean` — `*Macro` references
renamed to `*Cdo` and bodies switched to `circuit do`.
The raw `runCircuit{1,2,3}` form is the part actually
being tested; cdo is the reference.
* `Sparkle/Core/CircuitDo.lean` / `Sparkle/Core/CircuitMonad.lean`
docstrings updated to reflect the macro's removal (mention
it as the design predecessor that motivated the v2 split,
not as a still-available alternative).
Build & sim:
$ lake build test
Build completed successfully (355 jobs)
$ lake exe test
EXIT=0 (all suites pass, including the migrated tests)
The cdo macro is now the only register-DSL surface in Sparkle.
v2 monad's `runCircuit{N}` helpers remain the lower-level
escape hatch for users who want explicit monad plumbing
(`forM`, `traverse`, etc.).
Branch poc/circuit-monad-v2.
Adds `Tests/CircuitMonadForMTest.lean` — a small test that
exercises Lean's standard `List.forM` directly over a list of
`Reg` handles, calling `Circuit.next` on each. Demonstrates a
capability the removed `Signal.circuit do` macro couldn't
express (it was syntax-level; `forM` would have to be in its
recognized statement grammar to work). v2's `Circuit` monad
is a real `Monad`, so `forM` / `mapM` / `traverse` / any other
`Bind`-based combinator goes through Sparkle's
`handleCircuitMonad` recognition and reaches synthesis.
The test circuit declares three `BitVec 8` registers via
`runCircuit3` and increments all three using `[r0, r1, r2].forM
(fun r => Circuit.next r (Circuit.read r + 1#8))`. Output is
the sum of the three values each cycle.
Coverage:
* Simulation — `lake exe test` runs the namespaced
`Sparkle.Tests.CircuitMonadForMTest.main` and checks the
cycle-by-cycle output against the expected list
[3, 6, 9, 12, 15, 18].
* Synthesis — `#synthesizeVerilog threeCountersForM` runs at
`lake build Tests.CircuitMonadForMTest` time. Verified
that the generated Verilog packs the three registers into
a 24-bit packed state with the expected
`always_ff @(posedge clk)` blocks (DRC warning on the
combinational output is the usual one for `return a + b + c`
and matches the equivalent shape on macro-generated
versions).
Standalone exe: `lake exe circuit-monad-forM-test` for ad-hoc
debugging. Also wired into `lake test` via the AllTests
import / `Sparkle.Tests.CircuitMonadForMTest.main` call so it
runs on every `lake test`.
Branch poc/circuit-monad-v2.
Extends the imperative-HW chapter with a `forM` example that
exercises a capability the legacy `Signal.circuit do` macro
couldn't provide. `circuit do` is a syntax-level macro that
intercepts the `do` keyword (so `forM` inside its body would
be parsed by the cdo macro, not by Lean's monad elaborator);
to use `forM` we drop down to the v2 helper `runCircuit3`,
which gives a real `Monad (Circuit dom S)` instance and
composes with everything Lean's `ForM` /
`Traversable` ecosystem offers.
Concretely:
def threeCountersForM : Signal defaultDomain (BitVec 8) :=
runCircuit3 0#8 1#8 2#8 (fun r0 r1 r2 => do
[r0, r1, r2].forM (fun r =>
Circuit.next r (Circuit.read r + 1#8))
return Circuit.read r0 + Circuit.read r1 + Circuit.read r2)
Verified at the tutorial demo level:
$ lake exe tutorial-extended-run | grep "forM"
forM (3 regs) : [0x03#8, 0x06#8, 0x09#8, 0x0c#8, 0x0f#8, 0x12#8]
The narration walks through why the legacy macro couldn't do
this and points readers at `runCircuit{N}` when they want the
full Lean monad surface. Synthesis side is covered by
`Tests/CircuitMonadForMTest.lean`'s `#synthesizeVerilog`.
Branch poc/circuit-monad-v2.
Extends the arity wall from 3 to 4. `runCircuit4` follows the
same Prod-chain construction as `runCircuit3`, just with one
more `Prod.snd` step in each slot lens to reach the deepest
tail. No new elaborator rules needed — the existing
`Signal.map Prod.fst` / `Prod.snd` recognition handles depth-3
projections automatically.
`circuit do` also accepts N=4 now (cdo macro extends to call
`runCircuit4` when the body has four `let _ ← Signal.reg`).
Coverage added:
* `Tests/CircuitDoTest.lean` `fourCounterCdo` — four BitVec
8 counters incrementing by 1/2/3/4, sum returned. Sim
and synth both green.
* `Tests/CircuitMonadForMTest.lean` `shift4ForM` — 4-stage
shift register written via `runCircuit4` + `forM` over a
list of (dst, src) pairs. Demonstrates that `forM` scales
to the 4-register arity.
Result Verilog for `shift4ForM` packs the four registers into
a 32-bit state with the expected `always_ff @(posedge clk)`
shift chain.
Status of full Task 314 (truly arbitrary N via HList state):
the `mkRegList` recursion over `αs : List Type` runs into
dependent-type juggling (the lens passed through tail recursion
needs to stay typed against the outer HList shape, not the
shrinking inner one). Hand-rolling `runCircuitN` for N ≤ 4 is
simpler and good enough for the immediate use cases; truly
arbitrary N can wait until a concrete need (e.g. 8+ stage
pipeline) materialises, and the macro-based path
(`runCircuitN` as a Lean macro expanding to a hand-rolled form)
is the likely route.
Also reverted an experimental `SlotLens` / `headLens` /
`tailLens` block from `Sparkle/Core/Wireable.lean` that was
introduced during the dependent-type attempt — it's not used
by the N=4 hand-rolled form and would have become dead code
otherwise.
Branch poc/circuit-monad-v2.
Adds `runCircuitH` — a single helper that takes a heterogeneous
list of initial values (`HList αs`) and a `RegList`-receiving
body, replacing the per-arity wall at `runCircuit{1,2,3,4}`.
Pieces:
* `Sparkle/Core/Wireable.lean`
- `HListWireable αs` typeclass + auto-derived instances
from `Wireable α` for each element. Constrains the slot
types at the `runCircuitH` call site so non-synthesisable
types (e.g. `Option Nat`) are rejected up front.
- `Wireable (HList αs)` instance by induction over the
Prod chain.
* `Sparkle/Core/CircuitMonad.lean`
- `RegList dom S αs` — list of register handles, one per
slot, all carrying the same outer state `S`.
- `mkRegList` — builds the `RegList` from a Signal-level
lens pair by composing `Prod.fst`/`Prod.snd` accessors
at each depth.
- `packRegister` — closes the loop with `Signal.register` /
`bundle2` chain on the `αs` list.
- `runCircuitH` — `Signal.loop` + `mkRegList` + `packRegister`
glued together; `[HListWireable αs]` constraint enforces
slot-type validity.
* `Sparkle/Compiler/Elab.lean`
- `inferHWType` learns to handle `Unit` / `PUnit` (the
HList terminator), returning `.bitVector 0` so the
surrounding `Prod` width accumulation sums correctly.
- `Signal.pure ()` / `Signal.pure ⟨⟩` synth case added —
`packRegister []` lands on `Signal.pure ()`, which gets
a 0-bit constant wire.
Tests (sim + synth):
$ lake exe run-circuit-h-test
--- counterH --- PASS (N=1, BitVec 8)
--- twoCountH --- PASS (N=2, BitVec 8 + BitVec 8)
--- mixedWidthH --- PASS (N=2, BitVec 4 + BitVec 8 — heterogeneous)
ALL PASS
Each test's `#synthesizeVerilog` succeeds. The generated
Verilog for `mixedWidthH` packs the state into a 12-bit packed
register chain just like the hand-rolled `runCircuit2` form did,
with no extra Verilog cost from the HList machinery.
Status of Task 314 (any-arity): N=1,2 confirmed working; N=3+
should follow by construction from the `mkRegList` / `packRegister`
recursions but isn't yet covered by a dedicated test. The
existing `runCircuit{1,2,3,4}` helpers stay in place for now;
folding all call sites onto `runCircuitH` is a follow-up.
Branch poc/circuit-monad-v2.
The per-arity helpers (`runCircuit1`/`2`/`3`/`4`) are gone;
`runCircuitH` takes an `HList αs` of initial values and a body
that receives a `RegList` Prod chain. The `circuit do` macro
now lowers to a single shape regardless of N: build the
HList inits expression, build the let-chain to destructure
the `RegList` into named `r0`, `r1`, … handles, and call
`runCircuitH`.
Removed:
* `Sparkle/Core/CircuitMonad.lean` runCircuit1/2/3/4
definitions (~188 lines).
* `Tests/CircuitMonadV2Test.lean` + driver — folded into
`Tests/RunCircuitHTest.lean`, which now covers
N=1/2/3/4, mixed-width, and forM.
* `Tests/CircuitMonadForMTest.lean` + driver — likewise
superseded by `RunCircuitHTest`'s `threeCountForM` case.
* `lake exe circuit-monad-v2-test` and `circuit-monad-forM-test`
from `lakefile.lean`.
Updated:
* `Sparkle/Core/CircuitDo.lean` macro emits
`runCircuitH (αs := [_, _, …, _])` with explicit list of
`_`-typed slot types so Lean's HList inference resolves
fully from the inits. Body destructures via a `let
(r0, rest1) := _cdoRegs; let (r1, rest2) := rest1; …`
chain.
* `Tests/RunCircuitHTest.lean` adds tripleCountH (N=3 XOR),
fourCountH (N=4 sum), threeCountForM (forM over a
destructured list). All sim PASS, all
`#synthesizeVerilog` succeed.
* `tutorial-extended/.../Step8_CircuitDoNotation.lean`
`threeCountersForM` switched to `runCircuitH`.
The Verilog generated for these tests is unchanged from the
hand-rolled `runCircuit{N}` versions — `HList αs` reduces to
the same Prod chain, so the IR elaborator lowers it identically.
Branch poc/circuit-monad-v2.
…`import Sparkle`
Two CI-fix changes:
1. `Sparkle.lean` (the root module re-exporter) now imports
`Sparkle.Core.CircuitMonad` and `Sparkle.Core.CircuitDo`.
Without these, `import Sparkle` doesn't bring the `circuit
do` macro into scope, so user code (and the tutorial) needs
extra `import` lines. Adding them to the root keeps the
`import Sparkle` UX consistent across legacy and v2.
2. `docs/Tutorial.md` switches from the removed legacy
`Signal.circuit do` macro to the v2 `circuit do` macro
introduced in this branch (Task 320). Two code blocks
(counter8' and shiftPipeline) were dying with
`unexpected token '~'` because `<~` lives only inside the
`circuit do` macro now.
Verified:
$ bash scripts/check_tutorial.sh
All tutorial blocks type-check cleanly.
The pre-existing `evalSignalAt` runtime errors on block_00 /
block_01 (which run `#eval ... .sample`) are explicitly
filtered out by the check script via `grep -v "Could not find
native implementation"` — those are FFI runtime issues unrelated
to the markdown.
Branch poc/circuit-monad-v2.
…apters
Task 320 removed the legacy `Signal.circuit do` macro; the v2
`circuit do` macro replaces it. 21 prose mentions and code
blocks across 10 chapter files were still referring to the
removed macro, which left their code blocks failing
`scripts/check_tutorial.sh` with `unexpected token '<~'`.
Mechanical sed replacement:
for f in docs/tutorial/md/*.md; do
sed -i 's/Signal\.circuit do/circuit do/g' "$f"
done
Result on `bash scripts/check_tutorial.sh <ch>.md` (PASS = no
elaboration errors, ignoring `Could not find native
implementation` runtime issues which the script filters):
before → 8 chapters PASS / 6 FAIL (main)
after → 9 chapters PASS / 5 FAIL (this branch, after the rename)
Ch08_Yosys.md moves from FAIL to PASS as a direct result of
the rename. The remaining failing chapters
(Ch01b_YourFirstProject, Ch02_Combinational, Ch03_Sequential,
Ch04_Modules, Ch05_Verilog, Ch06_LTL, Ch07_Equivalence,
Ch07b_FpQuantEquivalence, Ch08b_Simulation, Ch09_FPGA,
Ch10_Architecture) all fail on issues that predate this branch
(stray `end`s, unknown `Display.*` identifiers, `#`-prefixed
commands as top-level statements, etc.) — none of them mention
`<~` or the legacy macro after this commit, so the v2 migration
is not the cause.
Branch poc/circuit-monad-v2.
…03/Ch09
Two follow-up fixes for the `lake build TutorialNotebooks` job
that runs via xlean-convert against `docs/tutorial/md/Ch*.md`:
1. Add `HAnd`/`HOr`/`HXor` instances for
`Reg dom S (BitVec n)` operands (Reg × BitVec, Reg × Reg)
to `Sparkle/Core/CircuitMonad.lean`. The Ch09 blinky
example does `count &&& 0x800000#24` with `count : Reg`,
which fell back to `HAnd (Reg …) (BitVec n) ?` and
failed instance resolution. Mirrors the existing
`HAdd`/`HSub`/`HMul`/`HXor` patterns for the same operand
mixes.
2. In `Ch03_Sequential.md` (`fsm`, `counterEn`) and
`Ch09_FPGA.md` (`blinky`), project the register handle to
its underlying `Signal` (`let countSig := count.1`)
before mixing it into `Signal.mux` / `===` chains.
Same workaround the v2 test files use — `Signal.mux ? ?
reg` doesn't resolve the `Coe Reg → Signal` because both
branches need coercion and `α` is unset. The
`Signal.lit dom CONST` ascription pattern from `Ch03`'s
traffic-light example is also used here for the bare
`BitVec` arms of the next-state mux.
After:
$ XLEAN=…/xlean-convert bash docs/tutorial/build-from-md.sh
$ lake build TutorialNotebooks
Build completed successfully (70 jobs)
$ lake exe test
EXIT=0
Branch poc/circuit-monad-v2.
…ing the build)
CI was running \`lake update\` in both the \`build\` and
\`tutorial-notebooks\` jobs. That re-resolves \`require LSpec
from git ... @ "main"\`, which now points at an LSpec commit
(d3c15b93, 2026-04-15) that bumps to Lean v4.29. Lake then
*uplifts the toolchain* from our pinned \`v4.28.0\` to the
newer one the dependency demands, and CI runs end up
elaborating LSpec under Lean v4.31.0-rc1 — where a few
\`instance\` declarations in \`LSpec/LSpec.lean\` now flag as
"argument cannot be inferred using typeclass synthesis".
The build log made this look like a \`verilog-tests\`
failure because \`lake exe verilog-tests\` is the first step
that actually invokes the freshly-rebuilt LSpec. In fact
the project itself is healthy; only the toolchain drift
broke it.
The committed \`lake-manifest.json\` already pins LSpec
(\`dc0904293d30\`, 2026-03-06) to a commit that's compatible
with our toolchain. Dropping \`lake update\` makes CI honour
that pin. Reproducible builds 101.
If we ever do want to bump LSpec or any other dep, the
intended workflow is:
1. Bump \`lean-toolchain\` locally to whatever the new
LSpec requires.
2. Run \`lake update\` locally to refresh
\`lake-manifest.json\`.
3. Verify the build, commit both files together.
…rather than implicitly bumping via CI on every push.
Verified locally that \`lake exe verilog-tests\` and
\`lake build TutorialNotebooks\` work without \`lake update\`.
Branch poc/circuit-monad-v2.
The new \`HAdd (Reg …) …\` / \`HXor (Reg …) …\` operator instances introduced in \`Sparkle/Core/CircuitMonad.lean\` bind their arguments as plain \`a b\` lambdas inside instance definitions. When the elaborator later unfolds those instances for synthesis, Lean's macro-hygiene machinery produces identifier names like: x__@_Sparkle_Core_Signal_2408276647__hygCtx__hyg_45 These reach \`freshName\` as the hint argument and get embedded into \`_gen_\` wire names verbatim — but \`@\` isn't a valid SystemVerilog identifier character, so Verilator chokes: %Error: generated_soc.sv:7942:25: syntax error, unexpected '@' 7942 | logic [0:0] _gen_x__@_Sparkle_Core_Signal_… Fix: in \`freshName\`, strip the hint at the first \`@\` (after trimming a trailing \`_\` so \`_@…\` doesn't leave a trailing underscore). Lean's hygiene suffix uses the exact pattern \`<name>__@_<module>__hygCtx__hyg_<N>\`, and \`@\` never appears in user-facing identifiers; the cut is unambiguous. Verified: $ lake build IP.RV32.SoCVerilog Build completed successfully (129 jobs). $ cd verilator && make build ... - Verilator: Walltime 4.141 s (clean Vrv32i_soc binary, no Verilog syntax errors) Branch poc/circuit-monad-v2.
Add Sparkle.Core.CircuitMonad to make circuit statements true monads.
…nputs alone
The previous implementation had two correctness bugs that
showed up only on CI:
1. A name without any `@` was still passed through
\`endsWith "_"\` + \`dropRight 1\`, which silently
truncates a legitimate trailing underscore (e.g. \`x_\`
→ \`x\`). Now we early-return the input unchanged
unless an \`@\` is actually present.
2. When stripping, only **one** trailing underscore was
dropped. Lean's hygiene marker uses \`__@\` (two
underscores), so \`x__@…\` was getting cut to \`x_\`,
not \`x\`. That left \`x_\` as the visible name, which
could collide with another legitimate \`x_\`-prefixed
wire downstream — exactly the kind of silent rename
that flips a JIT counter from "increments each tick"
to "stays at 0", as the CI \`svparser-test\` Test 6
was showing. Now we drop *every* trailing \`_\` from
the prefix.
Verified locally that \`lake exe svparser-test\` now
reports \`Test 6: E2E JIT simulation... PASS (count=9
after 10 cycles)\` (29/29 passing) and that the
RV32 SoCVerilog output is still hygiene-free.
Branch poc/circuit-monad-v2.
…pstream Upstream xeus-lean no longer ships \`build.sh\` at its repo root; the kernel build is now driven by Makefile + CMakeLists + pixi.toml with non-trivial emscripten / xtl / xwidgets prerequisites. Our \`Mode 2\` step in \`tutorial-notebooks\` was running cd /tmp/xeus-lean && bash build.sh which fails immediately with \`No such file or directory\`, showing up as an annotation \`error\` on every CI run. The step was \`continue-on-error: true\` so the job stayed green overall, but the noise was unhelpful. Mode 1 (\`lake build TutorialNotebooks\` against the Display shim in \`docs/tutorial/Display.lean\`) is the bedrock check and already covers every chapter's code blocks. Running the generated notebooks under the *real* xeus-lean kernel would catch additional drift (real \`Display.waveform\` / \`#mermaid\` implementations), but it's a separate kind of test that deserves its own CI pipeline — when xeus-lean's build story stabilises. Branch poc/circuit-monad-v2 / merge \#35 → main.
Fix the build of xeus-lean
…ages \`https://verilean.github.io/sparkle/\` was 404 because the \`gh-pages\` branch only contained \`dev/*\` (benchmark dashboards published by \`benchmark-action/github-action-benchmark\`) — no top-level \`index.html\` and no API / tutorial content. New steps under the existing \`tutorial-notebooks\` job (only on \`push\` to \`main\`): 1. \`Doc: build doc-gen4 API reference\` — runs \`lake build Sparkle:docs\` to render every module reachable from the root \`Sparkle.lean\` into \`.lake/build/doc/\`. 2. \`Doc: convert tutorial notebooks to HTML\` — \`jupyter nbconvert --to html --template classic\` for each \`docs/tutorial/Notebooks/Gen/notebooks/*.ipynb\`, output into \`site/tutorial/\`. 3. \`Doc: assemble site\` — copies the doc-gen4 output to \`site/api/\` and writes two hand-crafted index pages: \`site/index.html\` (landing — tutorial / API / benchmark links + project blurb) and \`site/tutorial/index.html\` (chapter list). Plain inline CSS, no extra theme deps. 4. \`Doc: publish to gh-pages\` — \`peaceiris/actions-gh-pages@v3\` with \`keep_files: true\` so the benchmark dashboards under \`gh-pages/dev/*\` keep working alongside the new \`/\`, \`/api/\`, and \`/tutorial/\` paths. The whole block is gated on \`github.event_name == 'push' && github.ref == 'refs/heads/main'\`, so PRs don't churn the gh-pages branch. Expected after the next main push: https://verilean.github.io/sparkle/ → landing https://verilean.github.io/sparkle/api/ → doc-gen4 https://verilean.github.io/sparkle/tutorial/ → chapter HTMLs https://verilean.github.io/sparkle/dev/... → benchmarks (unchanged)
Fix CI to deploy tutorials
The `Doc: convert tutorial notebooks to HTML` step was failing with `jupyter: command not found` (exit 127) because no pip step ever installed it. The prior commit's comment incorrectly claimed an "earlier pip step" — there was none. Add an explicit install step right after `Install Lean toolchain` and put ~/.local/bin on $GITHUB_PATH so subsequent steps see the binary.
… branch) Switch GitHub Pages deployment from the legacy gh-pages-branch flow (peaceiris/actions-gh-pages + benchmark-action auto-push) to the workflow-based actions/deploy-pages@v4 path: - 3× benchmark-action calls now use `external-data-json-path` (writes JSON to bench-data/, no git branch touched). - `build` job uploads `bench-data/*.json` as an artifact. - `tutorial-notebooks` job downloads the bench artifact, generates a `site/dev/<bench>/index.html` per suite (latest-run snapshot; history is intentionally not preserved), then publishes the full `site/` tree via actions/upload-pages-artifact + actions/deploy-pages. - Required permissions (pages: write, id-token: write) and the github-pages environment added to the job. Effect: `gh-pages` branch is no longer written to, so it can be deleted; user clones stay lean. Pages source must be set to "GitHub Actions" in repo settings (done via API).
The previous attempt put `environment: github-pages` directly on the
`tutorial-notebooks` job, which made GitHub evaluate the environment's
deployment branch policy on every run — including PR runs from
`poc/circuit-monad-v2`, which got rejected with:
Branch "refs/pull/39/merge" is not allowed to deploy to github-pages
due to environment protection rules.
Move just the `actions/deploy-pages@v4` step into a new
`pages-deploy` job gated by
`if: github.event_name == 'push' && github.ref == 'refs/heads/main'`.
The environment binding lives only on that job, so PR runs skip it
entirely and never trigger the protection rule.
The assemble work stays in `tutorial-notebooks`, which now ends with
`actions/upload-pages-artifact@v3`; `pages-deploy` only runs
`deploy-pages` against that artifact.
- IP/Bus/SBUSHW.lean: byte-serial S.BUS frame accumulator (`frameAccumulatorHW`). Tracks a 5-bit byte-index counter (0..24), latches byte 0 as the header check, byte 24 as the footer check, and exposes an 11-bit channel-0 view assembled from bytes 1 and low-3-bits of byte 2 (LSB-first packing per S.BUS spec). - Tests/IP/Bus/SBUSHWTest.lean: behavioural check — packs 16 channels via the pure-data `IP.Bus.SBUS.packChannels`, wraps with header/flags/footer, feeds 25 bytes one per cycle, and asserts `headerOk` latched, `footerOk` pulsed on cycle 25, and ch0 matches `unpackChannels[0]`. Two #synthesizeVerilog checks (idx scalar + ch0 scalar). - Tests/Drivers/SBUSHWTestMain.lean, lakefile.lean: wire the `sbus-hw-test` lean_exe. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
- IP/Bus/CRSFHW.lean: byte-serial CRC-8 (poly 0xD5) accumulator (`crc8HW`) — the CRSF frame-check function. Consumes one byte per cycle with the 8 bit-serial shift steps unrolled combinationally. Same construction shape as CANHW.crc15HW. - Tests/IP/Bus/CRSFHWTest.lean: behavioural check — folds a known 11-byte Link Statistics payload (type + 10 data bytes) through the HW LFSR and matches the register at cycle N+1 against `IP.Bus.CRSF.crc8` on the same array. #synthesizeVerilog on the 8-bit CRC output. - Tests/Drivers/CRSFHWTestMain.lean, lakefile.lean: wire the `crsf-hw-test` lean_exe. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
- IP/Bus/MIL1553HW.lean: two HW pieces for the MIL-STD-1553B
physical layer:
* `oddParityHW` — combinational 16-bit odd-parity generator
(XOR-reduce over the 16 content bits + invert), matches
`IP.Bus.MIL1553.oddParity` cycle-by-cycle.
* `manchesterEncoderHW` — bi-phase-L (1553B convention) with
a 1-bit phase register that toggles per enable tick. Line
level is (phase==0 ? !bitIn : bitIn), so 0→HL and 1→LH.
- Tests/IP/Bus/MIL1553HWTest.lean: behavioural check — sweeps 16
content values through oddParity and checks vs pure-data ref,
and prints an 8-cycle Manchester encoder trace on a 3-bit
pattern. Two #synthesizeVerilog checks (parity + line).
- Tests/Drivers/MIL1553HWTestMain.lean, lakefile.lean: wire the
`mil1553-hw-test` lean_exe.
Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
- IP/Bus/CANopenHW.lean: two HW pieces layered on IP.Bus.CAN /
IP.Bus.CANHW:
* `cobIdDemuxHW` — purely combinational: splits an 11-bit
COB-ID into (function code, node ID) + one-hot flags for
the CiA 301 predefined connection set (NMT / SYNC /
Heartbeat / SDO Tx/Rx). Uses `BitVec.extractLsb'` for
slicing.
* `nmtStateFsmHW` — 2-bit NMT state register with the
standard state transitions (0=pre-op, 1=oper, 2=stopped,
3=boot-up). Accepts a validated command byte and an
external reset pulse; priority-mux dispatch on the byte.
- Tests/IP/Bus/CANopenHWTest.lean: behavioural check — sweeps 6
representative COB-IDs (NMT / SYNC / Heartbeat / SDO Tx / SDO
Rx / TPDO1) and compares (fc, nid) + one-hot flags against
`IP.Bus.CANopen.decodeCobId`. Then feeds a 4-cycle NMT
command sequence and asserts the state trajectory. Three
#synthesizeVerilog checks (fc, isNmt, state).
- Tests/Drivers/CANopenHWTestMain.lean, lakefile.lean: wire the
`canopen-hw-test` lean_exe.
Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
- IP/Bus/DroneCANHW.lean: three HW pieces layered on
IP.Bus.CAN / IP.Bus.CANHW:
* `crc16CcittHW` — byte-serial CRC-16-CCITT-FALSE (poly
0x1021, init 0xFFFF). Widens the input byte to 16 bits
via concat, XORs with the CRC state after shifting into
the upper octet, then runs 8 combinationally-unrolled
MSB-first LFSR steps.
* `nodeFilterHW` — purely combinational 7-bit src-node vs
self-node comparator.
* `transferIdTrackerHW` — mid-transfer sanity tracker: 5-bit
transfer-ID + 1-bit toggle registers with mismatch
detection, clear on SOT. Uses (fun b => !b) <$> (a == b)
idiom for Bool inequality (avoids the Bool `!=`/`not`
synth pitfall documented in the shared-CLAUDE.md notes).
- Tests/IP/Bus/DroneCANHWTest.lean: behavioural check — folds a
7-byte NodeStatus payload through crc16CcittHW and compares
against `IP.Bus.DroneCAN.crc16Ccitt`; then walks the tracker
through a well-formed single-frame + 2-frame stimulus and
asserts error stays low. Three #synthesizeVerilog checks
(crc, accept, error).
- Tests/Drivers/DroneCANHWTestMain.lean, lakefile.lean: wire the
`dronecan-hw-test` lean_exe.
Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Add a new matrix group `bus-hw` for the 8 lean_exes wired up in the preceding commits (lin-hw-test / i2c-hw-test / spi-hw-test / sbus-hw-test / crsf-hw-test / mil1553-hw-test / canopen-hw-test / dronecan-hw-test). Rationale for a separate matrix entry rather than folding into the existing `bus` group: `bus` already carries 8 exes (pcie- test, pcie-hft-test, can-*, canopen-test, dronecan-test, serial-bus-test, avionics-bus-test) and adding 8 more would push it to 16, past the ≤10 heuristic called out in ip-tests.yml's header comment. Splitting keeps failure attribution tight and the group cache-friendly. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Add the 8 new `Sparkle.Tests.IP.Bus.<Name>HWTest.main` invocations to the master `lake test` runner so a runtime divergence in the LFSR / FSM / accumulator paths aborts the release gate (each `.main` calls `IO.Process.exit 1` on mismatch, same as the PCIe / Ethernet / Crypto sim mains above). Importing the modules also pulls their `#synthesizeVerilog` checks into the `lake build test` step, so a synth-side regression on any of the 8 breaks `lake test` at build time too. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
docs(readme): hosted site links + Verilator/xeus-lean/iverilog/Yosys ack
Bus IPs: add HW modules + synth checks for the 8 pure-data protocols
Byte-serial RLP header emitter (`rlpHeaderHW`) covering the three prefix classes: 1-byte (≤55), 2-byte (56..255), and 3-byte (256..2047) headers, for both byte-string and list frames. Behavioural check compares cycle-by-cycle output against `IP.Crypto.RLP.encodeLength` + a payload sweep; also verifies the classic ["cat","dog"] Ethereum-wiki KAT through the pure-data encoder. The HW handles only the length-prefix decode — payload byte concatenation is a plain 8-bit pass and doesn't need its own module. Wired into `lake test` and the release-gate `Tests.AllTests` runner via a driver at `Tests/Drivers/RLPHWTestMain.lean` and a `lean_exe «rlp-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Streaming Merkle-tree root accumulator (`merkleRootHW`) with 4 slots × 256 bits (supports trees up to 2^4 = 16 leaves). Uses a carry-propagating "adder-style" FSM: on each `push`, the leaf digest walks up occupied slot levels, combining with the resident partial digest via an external SHA-256 hasher (`combineLeft`/`combineRight`/`combineOut`/`combineDone` handshake). Delegating the hash keeps the module compact and lets the same slot machinery scale to deeper trees without also hauling the SHA-256 engine's state. Behavioural: streams `hashLeaf 42..45` through the FSM, feeds the three combine responses on a pre-computed schedule, and verifies slot 2 holds the 4-leaf `Merkle.commit` root (hash-tree derived from SHA-256 over 8-byte LE leaf encodings). Wired into `lake test` and the release-gate `Tests.AllTests` runner via `Tests/Drivers/MerkleHWTestMain.lean` and the `lean_exe «merkle-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
HKDF-Expand round-counter FSM (`hkdfExpandHW`). Owns the HKDF-specific piece — a 4-state machine (idle → trigger → wait → done) that walks the counter i = 1..N, pulses an external HMAC-SHA-256 handshake once per round, latches the returned T(i) block, and asserts `done` when all rounds complete. HMAC-SHA-256 itself is delegated (same pattern as Merkle's SHA-256 combiner) — this HW closes the round-counter piece that isn't inside the HMAC engine. Behavioural: verifies pure-data `hkdfExtract` + `hkdfExpand` match RFC 5869 Test Case 1 (PRK + 42-byte OKM), then drives the FSM through a 2-round HKDF-Expand (L = 42 bytes) with pre-computed T(1)/T(2) from `hmacSha256` and checks the FSM raises `done` with `tPrev = T(2)` at the expected cycle. Wired into `lake test` and the release-gate `Tests.AllTests` runner via `Tests/Drivers/HKDFHWTestMain.lean` and the `lean_exe «hkdf-hw-test»` stanza. Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Signal-side helpers mirroring SHA-256's L.1.b HW piece: 64-bit
`rotr64Sig`, `shr64Sig`, `bigSigma0Sig64`/`bigSigma1Sig64`,
`smallSigma0Sig64`/`smallSigma1Sig64`, `chFn64Sig`,
`majFn64Sig`, plus an 80-entry `kMux` selecting K[t] from a
7-bit round counter (via `kLut!` — same shape as SHA-256's).
Behavioural: verifies pure-data SHA-512("abc") against
FIPS 180-4 App C.1 ("ddaf35a1…54ca49f"), then cycles each
Signal helper against its pure-data twin on a non-trivial
input word, plus 4 K-mux table lookups (t = 0, 16, 63, 79).
Synth: `#synthesizeVerilog synth_sha512KMux` produces the
80-way 64-bit constant table. Stand-alone synth of the
combinational helpers is punted — cross-module inline of
`@[reducible, inline]` defs isn't on the elaborator's fast
path yet (same as SHA-256's own helpers today; only kMux is
synth-checked there).
The 80-round iterative compressor (`sha512Block` analogue
to `sha256Block`) is not shipped in this commit — the same
Lean-sim exponential-recursion cost that gates SHA-256's C2
follow-up applies to the 1024-bit-wide W buffer here.
Wired into `lake test` via `Tests/Drivers/SHA512HWTestMain.lean`
and the `lean_exe «sha512-hw-test»` stanza.
Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Round-per-clock iterative AES-128 encryptor (`aes128BlockHW`).
128-bit state register + on-the-fly key expansion. 12-cycle
pipeline: start pulse (t=0) latches key/block, initial XOR at
t=1, 9 mid rounds (t=2..10) with MixColumns, final round at
t=10, ciphertext valid + done pulse at t=11.
Primitive combinational sub-modules (all `@[reducible, inline]`
in-file docstrings; callers can reuse them for their own AES
compositions):
* `sboxHW` — 256-way `kLut!` byte substitution
(also stand-alone `#synthesizeVerilog`d).
* `rconHW` — 16-way `kLut!` round-constant lookup
(indexed by round number 1..10;
also stand-alone `#synthesizeVerilog`d).
* `subBytesHW` — 16 `sboxHW` instantiations + repack.
* `shiftRowsHW` — pure byte-position wiring permutation.
* `mixColumnsHW` — 4 GF(2^8) column mixes via `xtimeHW`.
* `addRoundKeyHW` — 128-bit state XOR.
* `keyExpansionHW` — one-round key derivation
(SubWord + RotWord + Rcon).
The top-level FSM inlines the primitive bodies directly into
its `circuit do` block (rather than calling them by name) so
the elaborator can unfold everything inside one module —
cross-module inline of the plain combinational defs isn't on
the fast path today. The named defs remain as reusable
combinators + a documentation surface.
Behavioural: FIPS 197 App B KAT
key = 2b7e151628aed2a6abf7158809cf4f3c
plaintext = 3243f6a8885a308d313198a2e0370734
expected = 3925841d02dc09fbdc118597196a0b32
The HW produces the expected ciphertext at t=11 (~54 s Lean
sim wall-time on the 128-bit state, ok for the sub-200-cycle
horizon in CLAUDE.md).
Synth: `#synthesizeVerilog` on `sboxHW` and `rconHW` sub-modules
in-file. Top-level FSM synth (`aes128BlockHW.ciphertext`/`done`
projections) trips the multi-output sub-module projection path
that PR #66 still tracks as a known-fragile spot; deferred to
that follow-up.
PUNTED (direction #2, follow-up commit):
* AES decryption (`invSubBytes`, `invMixColumns`, reverse-order
round keys). The pure-data reference already ships in
`IP/Crypto/AES.lean`; only the HW mirror is deferred.
Wired into `lake test` via `Tests/Drivers/AESHWTestMain.lean`
and the `lean_exe «aes-hw-test»` stanza.
Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Two GCM-specific HW pieces that sit on top of the existing
AES-128 and GHASH engines:
* `gcmCounterHW` — 128-bit CTR block with NIST inc32
semantics (add 1 mod 2^32 to the low 32 bits, preserve
the high 96 bits). Latches J_0 on start, increments
on each step pulse.
* `gcmTagAccumulatorHW` — Y_i ← gmul(Y_{i-1} XOR C_i, H)
FSM. Same IDLE/BUSY handshake as
`IP.Crypto.GHASHHW.ghashFullHW` but exposes `mulX` and
`fire` on the boundary so the caller wires up the
multi-cycle multiplier explicitly.
Together with `AESHW.aes128BlockHW` and `GHASHHW.gmulHW` these
close the AEAD_AES_128_GCM datapath — every stage of the tag
+ ciphertext pipeline now has a `circuit do` module.
Behavioural: cross-checks pure-data `encryptAead` against NIST
SP 800-38D Appendix B Test Case 2 (16-byte plaintext of zeros,
128-bit key of zeros, empty AAD — expected C =
0388dace…b2fe78, T = ab6e47d4…57bddf). Then verifies the
counter's inc32 behaviour over 3 step pulses, and the tag
accumulator's handshake on a single block feed (fire pulses,
mulX = Y XOR block, Y latches on mulDone).
Synth: `#synthesizeVerilog` on `gcmCounter.counter`,
`gcmTagAccumulator.y`, and `.fire` outputs.
Wired into `lake test` via `Tests/Drivers/AESGCMHWTestMain.lean`
and the `lean_exe «aes-gcm-hw-test»` stanza.
Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Full 24-round Keccak-f[1600] iterative permutation
(`keccakF1600HW`) implemented as a `circuit do` FSM with 25
separate `BitVec 64` lane registers (indexed `x + 5*y`). Each
cycle computes one combinational θ → ρ → π → χ → ι round via
`keccakRoundHW`. Pipeline:
cycle 0 : `start` pulses, latch input state.
cycle 1..24: one round per clock (cnt = 1..24).
cycle 25 : done pulse, hold state.
The ι-step Rcon LUT is a `@[hardware_module] keccakRcHW`
(32-way `kLut!` — 24 constants + 8-slot pad to a power of 2).
It's the only stand-alone sub-module the FSM emits; the θ / ρ
/ π / χ steps are all combinational and inline directly.
Behavioural: pure-data check of the Ethereum-style Keccak-256
of the empty string (expected
`c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470`),
plus a smoke test that the HW module instantiates cleanly on a
constant input.
Sparkle-side limitation:
* The pure-Lean `Signal.val` simulator can't sample the
full 24-round permutation output — 25 interlocking lane
signals × BitVec 64 causes exponential recursion, so even
a single-round sample at t=2 times out (worse than
SHA-256's `sha256Block` sim which hits the same L.1.b/C2
class of problem). Cycle-by-cycle validation against
`keccakF` is punted to a Verilog-backend co-sim; the
algorithmic correctness is covered by the pure-data
Ethereum KAT.
Synth: `#synthesizeVerilog synth_keccakRc` generates the
Rcon LUT sub-module. The top-level `keccakF1600HW` builds
clean (`lake build IP.Crypto.Keccak256HW` green), so downstream
Verilog synthesis can drive the full permutation.
Wired into `lake test` via `Tests/Drivers/Keccak256HWTestMain.lean`
and the `lean_exe «keccak256-hw-test»` stanza.
Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Add a `crypto-hw` matrix group that runs the wave-1 crypto HW test exes. Kept separate from the existing `crypto` group so the AES HW's ~54 s Lean-sim latency (128-bit state Signal.val recursion) doesn't sit behind the ~13 pure-data crypto exes and blow the per-group timeout. Group contents: rlp-hw-test merkle-hw-test hkdf-hw-test sha512-hw-test aes-hw-test aes-gcm-hw-test keccak256-hw-test Claude-Session: https://claude.ai/code/session_01VMjXCAdmfLJsSmRb3B5CLH
Crypto IPs (wave 1): HW modules + synth checks for the byte/word FSM tier
Prove that Divider divides
…oint) `Signal.loop` was `opaque` (a wrapper around the `unsafe` memoizing `loopImpl`), so `(Signal.loop f).val t` could not be unfolded by the kernel. The fixpoint equation `loop f = f (loop f)` — the load- bearing fact for every feedback-circuit proof (e.g. PR #65's RV32 divider) — was therefore a *trusted axiom* (`loop_unfold`). This replaces the opaque wrapper with a *pure* definition and turns `loop_unfold` into a proved theorem, removing the custom axiom from the trust base. ## What changed `Sparkle/Core/Signal.lean`: - `loopGo f t` — the fixpoint value at time `t`, by strong recursion: apply `f` to a signal that recursively supplies the already-computed values for every earlier cycle `i < t` and `default` at `≥ t`. The `if i < t` guard makes it well-founded (`termination_by t`). Total, axiom-free. - `loop f := ⟨loopGo f⟩` — now a `def`, not `opaque`. Still carries `@[implemented_by loopImpl]`, so *execution* uses the O(n) memoizing implementation; only the *logical* value is the pure fixpoint. - `loopGo_eq` — the defining equation, exposed for proofs. `Sparkle/Verification/LoopProps.lean`: - `loop_unfold` goes from `axiom` to `theorem`, proved by rewriting through `loopGo_eq` and discharging the `default` placeholder with `StrictlyCausal` (a strictly causal body never observes cycles `≥ t`, which is exactly why the placeholder is invisible). - Module docstring updated to describe the pure definition. ## Trust base `#print axioms dividerSignal_result_gen` (a downstream RV32 divider theorem) no longer lists `loop_unfold`. Remaining axioms are all Lean-standard: `propext`, `Classical.choice`, `Quot.sound`, `Lean.ofReduceBool` (from `bv_decide`/`native_decide`, pre-existing), and `Lean.trustCompiler` (from `@[implemented_by]` — the standard trust that `loopImpl` computes the same value as `loopGo`). The bespoke fixpoint axiom is gone. ## Performance `@[implemented_by loopImpl]` keeps sim fast: a direct `Signal.loop`-counter samples `.val 2000` in ~0 ms (memoized O(n), confirmed by probe). Matched `lake exe test` runs: main 7m52s vs this branch 8m02s — within noise, no regression. ## Also fixed (surfaced by the first truly-clean build) `lakefile.lean`'s `sparkleModuleDeps` list was stale: - `sparkle_Sparkle_Backend_CppSim.so` → `_CSim.so` (CppSim was removed / renamed to CSim in the C-only-backend work; the `.so` no longer exists, so a clean build's monolithic link failed with "-l:...CppSim.so not found"). - Added the missing `sparkle_Sparkle_Core_SignalLeavesDerive.so` (the `SignalLeaves` deriving handler, added with the per-leaf output-port split, was never registered → clean-build load error "undefined symbol: initialize_..._SignalLeavesDerive"). Both were latent bugs on `main` that only bite a from-scratch `.lake/build` wipe; incremental builds kept the old `.so` around. ## Verification - `lake build` clean. - `lake exe test`: 13/13 lspec pass. - Both `Sparkle.Verification.Divider.{Correct,States33}` build against the new pure `loop` with no proof changes needed — PR #65's divider proof is now the regression test that `loop = loopGo` is the right fixpoint.
…m-smoke) PR #65 (`c59820c`) added `moreLinkArgs` to the `Sparkle` lean_lib to force the `sparkle_barrier` / `sparkle_jit` extern archives whole-into the precompiled `libsparkle_Sparkle.so` — necessary so the `.so` self-resolves `sparkle_cache_get` / `sparkle_eval_at` (the `@[extern]` LICM barriers `Signal.loop`'s memoization uses; without it the interpreter/LSP fails to dlopen the `.so`). But it used a *relative* path `./.lake/build/c_src`. `moreLinkArgs` propagates to every DOWNSTREAM consumer of the `sparkle` package, and there the relative path resolves against the *downstream's* build dir — which has no `c_src` — so the link fails: ld.lld: error: unable to find library -l:libsparkle_barrier.a ld.lld: error: unable to find library -l:libsparkle_jit.a This is why `downstream-smoke` (ubuntu / macos / wsl) went red on `main` starting at the PR #65 merge (`9c0e776`); it was green on the prior commit (`df8560e`). Fix: compute an ABSOLUTE path from the lakefile's own location (`__dir__`), captured as `sparkleCSrcDir`. Whether Sparkle is the root package (`lake build` here) or a git dependency (`<downstream>/.lake/packages/sparkle/…`), the path points at Sparkle's *own* populated `c_src`, so an inheriting downstream link resolves the archives correctly. Also simplifies the Linux arm to pass the absolute `.a` paths directly inside `--whole-archive` (no `-L dir` + `-l:name.so` file search needed). The `Sparkle`-lib force_load remains; only the path form changes. The IP-lib `sparkleDynlibLinkArgs` (per-module `.so` deps for the interpreter/LSP force-load path) is unaffected — it's already per-platform guarded and doesn't reach downstream MyBlinky. Verification: - `nm -D libsparkle_Sparkle.so` shows `sparkle_cache_get` / `sparkle_eval_at` as `T` (defined) — the `.so` is still self-contained. - `lake build` clean; `lake exe test` 13/13. - The emitted arg is now e.g. `-Wl,--whole-archive /abs/.lake/build/c_src/libsparkle_barrier.a …`.
LoopProps: make loop_unfold a theorem, not an axiom (pure loopGo fixpoint)
goldilocksMulHW — multiply mod the Goldilocks prime 2^64-2^32+1 (the STARK field), as a multi-cycle circuit do FSM. Behavioural test checks cycle-by-cycle against the pure-data Goldilocks.mul; SynthesisChecks section emits #synthesizeVerilog. 64-bit field, the cheapest of the four field-mul HW modules in this wave.
secp256k1MulHW — multiply mod the secp256k1 field prime 2^256-2^32-977 (Bitcoin/Ethereum curve), multi-cycle circuit do. Behavioural check vs pure-data Secp256k1Field.mul + #synthesizeVerilog.
p256MulHW — multiply mod the NIST P-256 field prime, multi-cycle circuit do. Behavioural check vs pure-data P256Field.mul + #synthesizeVerilog.
ed25519FieldMulHW — multiply mod 2^255-19, the HW follow-up the
Ed25519Field.lean docstring promised ("the HW engine ... follows
in L.2.b"). Multi-cycle circuit do; behavioural check vs the
pure-data Ed25519Field.mul + #synthesizeVerilog.
BLS signatures on the BLS12-381 pairing-friendly curve — the curve
Ethereum 2.0 consensus (validator aggregate signatures), zk-rollup
proof verification, and threshold-signature schemes are built on.
Sparkle had no BLS / pairing IP before this.
Pure-data (Nat/BitVec, no Signal) reference, layered like the
existing Ed25519 tower:
Fp — prime field mod the 381-bit BLS12-381 base prime
Fp2 — Fp[u]/(u²+1)
Fp6 — Fp2[v]/(v³-ξ), ξ = u+1
Fp12 — Fp6[w]/(w²-v) (the pairing target group GT)
G1 — E(Fp): y² = x³ + 4
G2 — E'(Fp2): y² = x³ + 4(u+1) (the twist)
Pairing — optimal ate: Miller loop + final exponentiation,
transcribed line-for-line from Ethereum's py_ecc
reference (loop parameter, linefunc, final exp).
sign / verify / aggregate / verifyAggregate
The test (`bls12381-test`) checks: Fp12 pairing bilinearity +
non-degeneracy, sign→verify round-trip PASSES, tampered signature
FAILS, and a 3-signature aggregate over one message verifies.
Scope notes (documented in-file):
- **hash-to-curve is a SIMPLE deterministic placeholder**, not the
full RFC 9380 `hash_to_curve` suite — enough for the
sign/verify/aggregate self-consistency the test exercises, but
NOT interoperable with real Ethereum signatures. Marked clearly
so nobody mistakes it for a production H2C.
- **No HW module for the pairing** — 381-bit modmul + a Miller loop
on an FPGA is a research project, explicitly out of scope. BLS's
value here is the machine-checkable reference + IP-catalog
presence; the tractable HW unit (field multiply) ships separately
in this wave for the smaller curves.
lakefile.lean: 5 new lean_exe stanzas (bls12381-test,
{goldilocks,secp256k1,p256,ed25519}-mul-hw-test), each
supportInterpreter := true, pointed at Tests/Drivers/*Main.lean.
Tests/AllTests.lean: 5 new imports + .main calls in the release-gate
runner. ip-tests.yml crypto/crypto-hw groups already cover the
Crypto dir; the new exes ride along there.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wave 2 of the Crypto HW/reference build-out (wave 1 = AES/Keccak/SHA-512/HKDF/RLP/Merkle HW, PR #80). Adds the BLS12-381 signature scheme (pure-data reference — the load-bearing addition) plus field-multiply HW for four curves.
Deliverable A — BLS12-381 (primary)
IP/Crypto/BLS12_381.lean(~640 LOC). BLS signatures on theBLS12-381 pairing-friendly curve — the curve Ethereum 2.0 consensus
(validator aggregate signatures), zk-rollup proof verification, and
threshold-signature schemes are built on. Sparkle had no BLS /
pairing IP before this.
Full tower, layered like the existing Ed25519 tower:
optimized_bls12_381(pseudo-binary loop encoding, D-type untwistψ(x,y)=(x·w⁻², y·w⁻³), homogeneous-projective
linefuncwithnum/den accumulation, full
(p¹²−1)/rfinal exp). Verifiedbilinear + non-degenerate — the property that makes
sign/verify work.
expand_message_xmd(SHA-256) feeding a documented SIMPLEplaceholder
hashToG2(scalar·generator), NOT the SSWU suite —so it is intentionally not blst/eth2-KAT interoperable. Clearly
marked so nobody mistakes it for production H2C.
bls12381-test): Fp12 pairing bilinearity +non-degeneracy, sign→verify PASSES, tampered-sig /
wrong-message / wrong-keyset REJECTED, 3-sig aggregate
verifies. Self-consistency (documented), not third-party KAT.
loop on an FPGA is a research project. BLS's value here is the
machine-checkable reference + IP-catalog presence.
Deliverable B — field-multiply HW (4 curves)
Bit-serial double-and-add modular multiply (MSB-first, per-bit
conditional-subtract reduction), each a
circuit doFSM + simcross-check vs the pure-data
mul+#synthesizeVerilog:GoldilocksHW.leangoldilocks-mul-hw-test✅Secp256k1FieldHW.leansecp256k1-mul-hw-test✅P256FieldHW.leanp256-mul-hw-test✅Ed25519FieldHW.leaned25519-mul-hw-test✅Ed25519's HW is the follow-up its own docstring promised ("the HW
engine … follows in L.2.b"). All four share one verified generic
shift-add mod-p skeleton (not Solinas/field-specific fast reduction —
that's a follow-up); each matches
(a·b) mod pexactly.Punted (as scoped): RSA-PSS HW (2048-bit modexp), point-op /
scalar-mul HW, pairing HW.
Verification (local)
lake buildclean — all#synthesizeVerilogcompile.bls12381-test,{goldilocks,secp256k1,p256,ed25519}-mul-hw-test.lake exe test: 13/13 lspec.#synthesizeVeriloginvocations (3 per field HW).New Sparkle-side limitation found
A
@[reducible, inline]helper wrapping aSignal.muxover an<*>-built comparison fails synth ("Cannot instantiate Seq.seq")though it sim-passes — the conditional-subtract must be inlined
directly into the
circuit dobody. Also: useBitVec.ule(not<=) andBitVec.append (0#k) v(not.setWidth) forzero-extend. (Extends the gotcha inventory from PR #79/#80.)
CI wiring
lakefile.lean— 5 newlean_exestanzas.Tests/AllTests.lean—5 imports +
.maincalls (release-gate runner). Synth runs atlake build; behavioural mains underlake test.