Skip to content

Docs/sparkle style tutorial#7

Merged
junjihashimoto merged 23 commits into
mainfrom
docs/sparkle-style-tutorial
Jun 9, 2026
Merged

Docs/sparkle style tutorial#7
junjihashimoto merged 23 commits into
mainfrom
docs/sparkle-style-tutorial

Conversation

@junjihashimoto

Copy link
Copy Markdown
Contributor

No description provided.

README: 866 → 178 lines, sparkle-style structure.
- Drop Features 8-subsection wall, 4-scenario test logs, full Contributing
  guide (moved to CONTRIBUTING.md), 30+ project-structure tree lines.
- Add IP Catalog table (BitNet / Gemma 4 / LoRA / Verified AD / Tetris).
- Add tutorial chapter index (12 chapters) pointing at docs/tutorial/md/.
- Add Quick Start with Docker tutorial image (ghcr.io/.../hesper-tutorial)
  as the primary onramp, plus local-build fallback.
- Tighten How-It-Works diagram to show ShaderM IR with WGSL + CUDA PTX
  backends.

Move 33 internal R&D notes + 8 experimental subdirs into docs/research/
via git mv (history preserved). Top-level docs/ now keeps only the four
user-facing files: CHANGELOG.md, LORA_FINETUNING.md, VERIFIED_AD.md,
circuit-dsl-tutorial.md.

Add CONTRIBUTING.md (extracted + tightened from old README §Contributing).
Add docs/research/README.md pointing readers back at user-facing docs.
Mirror sparkle's docs/tutorial/md/Ch##_TopicName.md convention so a
single Markdown source produces both .lean and .ipynb via xlean-convert
(see docs/tutorial/build-from-md.sh).

Chapters:
- Ch00 Setup
- Ch01 Lean 4 for ML Engineers
- Ch01b Your First Hesper Project
- Ch02 The Shader DSL (WGSL + ShaderM)
- Ch03 Automatic Differentiation & Verified Ops
- Ch04 High-Level API & Tensors
- Ch05 Switching Backends (WebGPU / CUDA)
- Ch06 Proofs — Equivalence & Invariants
- Ch07 BitNet End-to-End
- Ch08 Gemma 4 End-to-End
- Ch09 Embedding Hesper in Other Projects
- Ch10 Hesper Architecture

.gitignore: ignore docs/tutorial/Notebooks/Gen/ (regenerated from .md
files at Docker-image build time), and fix the now-stale
docs/llama-fusion-analysis/ pattern to docs/research/llama-fusion-analysis/.
Stage 3 — docker/tutorial/Dockerfile (5 stages):
  1. ubuntu:24.04 base + X11/Wayland/Vulkan dev headers (matches
     CI's linux-vulkan apt list, sans the FPGA tools from sparkle).
  2. elan + Lean v4.26.0.
  3. Jupyter Lab + jupytext + nbconvert in /opt/venv.
  4. xeus-lean from source (clang/libc++ build, patches XEUS_KERNEL
     _PROTOCOL_VERSION_MINOR 3 → 4); installs xlean + xlean-convert;
     duplicates the kernelspec under the `xeus-lean` name expected by
     notebook frontmatter.
  5. Copy Hesper sources, `lake update`, `lake build Hesper` to warm
     the cache, and `bash docs/tutorial/build-from-md.sh` to produce
     .ipynb chapters. Patches the kernelspec's LEAN_PATH so `import
     Hesper` works inside notebooks.

ENTRYPOINT: jupyter lab on :8888, rooted at docs/tutorial/Notebooks/
Gen/notebooks/.

Stage 3 — .github/workflows/docker-image.yml:
  - Triggers on `v*` tag push or workflow_dispatch (tag arg).
  - Matrix build: linux/amd64 (ubuntu-latest) + linux/arm64
    (ubuntu-24.04-arm), each pushing by digest with per-arch GHA cache.
  - Merge job: combines digests into a multi-arch manifest published
    under both the release tag and `:latest`.
  - Inspects the resulting manifest at the end for visibility.

Stage 4 — docs/tutorial/README.md (tutorial index):
  - Two onramps: `docker run ghcr.io/.../hesper-tutorial:latest`
    and a `bash docs/tutorial/build-from-md.sh` local path.
  - Full chapter table with file links.
  - Authoring rules to keep the .md→.lean→.ipynb pipeline consistent.

README's `Documentation` section already lists all 12 chapters from
Stage 1, so no further README change is needed here.
Tested all 12 chapters by running xlean-convert against each .md and
then `lake env lean` on the resulting .lean. Previously only Ch08 /
Ch10 elaborated — every chapter with a non-trivial code block had at
least one error.

Fixes applied to the .md sources:

- Ch00: replaced top-level `let` (illegal at module top level) with
  `def`; switched to .toWGSL on the Exp value.
- Ch01: `Carrier` made `abbrev` (was `def`) so `OfNat Float 1` is
  found on the .f32 branch; tidied `#check @square` form;
  consolidated all `import` snippets into a single text block.
- Ch01b: lakefile-DSL snippet moved to ```text (it's not regular
  Lean and elaboration fails on `package`/`lean_lib`/`lean_exe`).
- Ch02: rewrote the ShaderM example to use real API
  (`declareInputBuffer`, `globalId`, `Exp.vecZ`, `readBuffer/writeBuffer`
  with explicit `ty`/`n`), and used `generateWGSLSimple` (the actual
  WGSL entry point) instead of a fictitious `toWGSL`.
- Ch03: removed fictitious `AD.forwardWithBackward`/`AD.gradientStep`
  code; instead show the real `Hesper.Core.Differentiable` typeclass
  with a tiny `ScaleByTwo Float Float` instance and #eval that
  type-checks.
- Ch04: dropped invented Tensor/Linear/Compute API; introduce real
  `TensorDesc.matrix` from `Hesper.Tensor.Types`. Other model-level
  pseudocode moved to ```text.
- Ch05: removed fictitious method names from the GPUBackend example;
  show the real `class GPUBackend` and `#check @GPUBackend.allocBuffer`.
- Ch06: consolidated `import Hesper.Core.Differentiable` to the first
  cell (xlean-convert turns every ```lean block into a separate cell
  but they share one .lean file, so all imports must come first);
  dropped `...` placeholders for `sorry` so the proof body parses.
- Ch07/Ch08/Ch09: model-level / lakefile-DSL pseudocode moved to
  ```text — none of `BitNet.load`, `BitNet.forward`, the lakefile
  `package`/`require` DSL etc. are valid Lean modules.

Added .github/workflows/tutorial-check.yml: runs xlean-convert + `lake
env lean` over every chapter on PRs that touch docs/tutorial/, with a
GHA cache for the xeus-lean / xlean-convert build. Will catch any
future regression where a chapter's lean cells stop elaborating.

Verified locally: all 12 chapters now pass `lake env lean`.
Replace the API-stub-only matmul section with three worked examples
that elaborate live in the notebook:

1. `fusedAddReLU` — pointwise `out[i] = max(a[i] + b[i], 0.0)` over
   an array<f32>: declare 3 buffers, project globalId.x, bounds-check
   with `Exp.lt`, read/add/max/write.  #eval shows the generated WGSL
   without needing a GPU.

2. `sumReduce` — workgroup-coop reduce sketch using `sharedNamed` +
   `barrier` + `readWorkgroup` / `writeWorkgroup`.  Demonstrates the
   structural shape every reduce kernel uses; production RMSNorm is
   in Hesper/Layers/RMSNorm.lean.

3. A text-only driver sketch showing how `Examples/Compute/MainSimple.lean`
   wires the kernel to actual buffers via `Hesper.init` /
   `createBuffer` / `writeBuffer` / `executeShaderNamed` /
   `mapBufferRead`.

Uses only confirmed-real API names (`globalId`, `Exp.vecZ`, `Exp.lt`,
`Exp.litU32`, `Exp.max`, `declareInputBuffer`, `readBuffer`,
`writeBuffer`, `sharedNamed`, `readWorkgroup`, `writeWorkgroup`,
`barrier`, `generateWGSLSimple`).  All elaborate under
`lake env lean` on the xlean-convert output.

Verified: 12/12 chapters still pass the tutorial-check workflow.
Reframe the parallelForDSL section as four steps that show actual
values, not just DSL plumbing:

1. Build input: `(Array.range 10).map (·.toFloat)` — printed via #eval.
2. Define the kernel as a pure `Exp .f32 → Exp .f32` function; inspect
   the generated WGSL via `Hesper.Compute.generateUnaryShader`.
3. Compute the expected result on the CPU with `input.map (· * 1000)`
   — same semantics, printed alongside.
4. The GPU-side `parallelForDSL device kernel input` driver
   (Examples/Compute/ParallelDemo.lean) shown as a text block.

Why steps 3-4 are split: parallelForDSL goes through @[extern] WebGPU
FFI which the Lean interpreter (and hence xeus-lean notebooks) can't
load — `lake env lean file.lean` reports "Could not find native
implementation".  The fix isn't a tutorial concern; it needs hesper
to ship Hesper as a shared library, not just static.  Until then the
notebook shows the *values* via the CPU reference and points users at
`lake exe parallel-demo` for the actual GPU run.

Add `import Hesper.Compute` to Ch04's top cell so
`Hesper.Compute.generateUnaryShader` resolves.

All 12 chapters still pass tutorial-check.
Wires the tutorial Docker image so that notebook cells can run
`parallelForDSL` (and any other `@[extern]` symbol) directly via
`#eval`, instead of telling the user to drop to `lake exe parallel-demo`.

Mirrors Verilean/xeus-lean's `Dockerfile.native-sparkle` pattern:

  1. Build vanilla `xlean` + `xlean-convert` from xeus-lean (Stage 3).
  2. Build Hesper to produce per-module `.c.o.export` objects (Stage 4).
  3. Bundle every Hesper `.c.o.export` (except `Main.c.o.export`s,
     which carry colliding `lean_main` symbols) into a static archive
     `libhesper_olean.a`.
  4. Re-link `xlean` with `XEUS_LEAN_EXTRA_LIBS=-Wl,--whole-archive
     libhesper_olean.a libhesper_native.a Dawn/Highway/CUDA bridge
     ...` so every Hesper-side symbol the interpreter could need is
     statically baked into the kernel binary.
  5. Patch the kernelspec's `LEAN_PATH` so `import Hesper` works
     from notebook cells.

This sidesteps the `precompileModules`/per-module `.so` rabbit hole:
no shared libraries, no `--load-dynlib`, no auto-load order issues —
the kernel exe just *has* the symbols.

lakefile.lean: move `stdLinkArgs` and `cudaExeArgs` ahead of `lean_lib
Hesper`.  The contents are unchanged; only the position moved.  This
lets the Dockerfile (and a future `lean_lib Hesper.moreLinkArgs := ...`
extension) reuse the same link line without duplication.  No effect on
existing `lean_exe` targets that already reference them.

Ch04_HighLevelApi.md step 4: rewrite the GPU-execution cell to call
`runGpu` via #eval (works inside the Docker image), and clarify in the
prose that the cell needs the tutorial container's pre-linked kernel.
Kept the cell as ```text so the CI elaboration check (vanilla
`lake env lean`) doesn't try to resolve `Hesper.init` against a
non-linked interpreter.  All 12 chapters still pass tutorial-check.
xeus-lean's xlean kernel ships pinned to v4.28.0; Hesper was on v4.26.0,
so the Docker tutorial stage 5 (re-link xlean against Hesper) couldn't
load mismatched oleans. Bumping the whole project keeps a single toolchain
across host build, xeus-lean kernel, and notebook eval.

v4.28 API changes touched:
- `String.drop` / `String.take` now return `String.Slice` (was `String`)
- `String.Slice.dropRightWhile` → renamed `dropEndWhile`
- `String.mk` deprecated in favour of `String.ofList`

`lake build Hesper` clean, regression suite 26/26 PASS, all 12 tutorial
chapters still elaborate.
The tutorial chapters only evaluate pure-Lean DSL functions
(`#eval generateWGSLSimple ...` returns a String, `#check ShaderM`
prints a type) — they never call into the WebGPU FFI from a notebook
cell.  The re-link stage therefore added 800+ seconds of build time
and three failure modes (missing `.c.o.export` artefacts on
`lean_lib`-only builds, Ubuntu vs NixOS Dawn install dirs,
`StubExtract` not surviving the v4.28.0 toolchain bump) for no user-
visible benefit.

Switch Stage 4 to the minimal `lake build Hesper` + LEAN_PATH patch.
The vanilla xeus-lean kernel from Stage 3 picks up `Hesper.olean`
through the patched kernelspec and can elaborate every chapter.

Smoke-tested locally: `docker build` succeeds, container starts
jupyter on :8888 with HTTP 200, and `import Hesper.WGSL.DSL` plus
`#eval generateWGSLSimple` returns real WGSL.
Produces a per-module `.so` next to every `.olean`.  Each `.so` is
statically linked against the C bridge / Dawn / Highway / Vulkan
artifacts (`stdLinkArgs`) so it is self-contained for the
`@[extern]` symbols its module declares.

This is the foundation for making the xeus-lean notebook kernel
resolve `@[extern]` calls inside `#eval` cells.  The notebook side
still needs a `LEAN_DYNLIB_PATH` mechanism to actually load these
`.so`s at kernel startup — see
`docs/research/notebook-ffi-investigation.md` for the design and
the deferred follow-up steps.

`lake build parallel-demo` + `lake exe parallel-demo` continue to
work unchanged on the host (NVIDIA Vulkan adapter, real GPU run).
Spot-checked parity tests: gemma4-qproj, gemma4-q4k-mmq,
cuda-bitlinear all `max |err| = 0.000000`.
… on CPU and GPU

End-to-end Ch11: load California Housing CSV → impute → derive
features → normalise → train a linear regression three ways, all
landing on the same Sabela loss trajectory (8.50e9 at iter 100 →
5.39e9 at iter 1000):

  1. CPU closed-form gradient + Hesper.Optimizer.SGD  (full batch).
  2. CPU reverse-mode AD via Hesper.AD.Reverse tape, with the per-row
     squared-error step coming from a *verified* SquaredErrorOp
     Differentiable instance (CPU spec).  Lean's Perceus refcount
     drops each iteration's tape immediately, so the full-batch loop
     stays under ~6 MB of peak heap.
  3. GPU training in lake exe california-housing-gpu using:
     - production Hesper.WGSL.MatMul.executeMatMul for the forward,
     - a new Hesper.Training.MSE module that pairs the CPU
       Differentiable MSEOp spec with two WebGPU kernels
       (executeMSEForward / executeMSEBackward), mirroring the shape
       of Hesper.Training.Loss CrossEntropy,
     - a per-column gradient kernel (sidesteps naiveMatMulKernel's
       out-of-bounds-write footgun on small N),
     - a pointwise SGD update kernel.
     fp32 trajectory matches the CPU AD path to ~0.2 %.

Infrastructure that makes this all run in the xeus-lean Jupyter
container:

* native/CMakeLists.txt + lakefile.lean: build libhesper_native as a
  SHARED library and whole-archive only libdawn_proc.a so the Dawn
  procs static lives in exactly one .so per process.  This fixes the
  per-precompiled-module Dawn-procs duplication that crashed
  Hesper.init / getDevice on lavapipe.  Reverts the libc++ override
  introduced earlier; Dawn keeps its libstdc++ default and the rest
  of Hesper links against it cleanly.
* Hesper/Logging.lean + native/bridge.cpp: HESPER_LOG_LEVEL env var
  (quiet/silent/off/0/false/none) silences both the Lean-side and
  C++-side debug prints.  Ungated [Hesper] HESPER_GPU_FEATURES and
  Device-info logs now respect g_verbose.
* docs/tutorial/md/Ch04_HighLevelApi.md: stop telling users to re-link
  xeus-lean by hand; the Docker image and HESPER_LOG_LEVEL story
  cover those needs.
* docker/tutorial/Dockerfile: make the LEAN_DYNLIB_PATH patch
  idempotent — upstream xeus-lean has since merged equivalent code,
  so `patch -N --batch || true` plus a grep post-condition keeps the
  build green on both old and new bases.
* Hesper/AD/Reverse.lean + ScalarInstances.lean: add
  ADContext.liftBinaryVerified and SquaredErrorOp so the AD path can
  consume any Differentiable Op (Float × Float) Float without the
  caller having to spell out local gradients.
* Hesper/Data/DataFrame.lean: minimal CSV/Value DataFrame with
  impute, derive, randomSplit, normalizeFeatures, toMarkdown,
  toHtml/toHtmlHead (HTML output is what Display.html renders inside
  Jupyter cells; markdown stays for CLI).  Schema/recipe match
  Sabela's CaliforniaHousing example; see in-file Acknowledgements.

docs/research/ch11-gpu-stage-a-wip.md records the matmul
out-of-bounds-write bug we worked around in the GPU example so it
surfaces if anyone else hits it.

Tests/ComputeTests.lean and Hesper/Compute.lean carry small docstring
and example-cell updates lined up with the new tutorial flow.
Replace the hesper-grown docker-image.yml with the cleaner shape
sparkle uses upstream: native amd64 + arm64 runners building by
digest in parallel, then a `merge` job stitches the two digests
into a single multi-arch manifest list tagged both with the
release/dispatch tag and `:latest`.

Image name is the only non-cosmetic diff from sparkle (hesper-
tutorial instead of sparkle-tutorial); the workflow body is
verbatim.

Trigger: still `push: tags v*` for releases, `workflow_dispatch`
for ad-hoc rebuilds.
xlean-convert has no `--help` flag; running with an unknown flag exits
with code 2 after printing usage.  The CI step was relying on
`xlean-convert --help` exiting 0, which it never did — installs were
succeeding but the smoke test layer always failed.

Swallow the exit code, grep for "xlean-convert" in the usage banner
instead; this confirms the binary is on PATH and runnable without
depending on a flag the CLI doesn't recognise.
`precompileModules := true` (introduced for the tutorial Docker image
in e4f8412) builds a per-module `.so` for every Lean file in
`lean_lib «Hesper»`, and every one of those `.so`s links the full
`stdLinkArgs` line — including `-lcuda` on non-macOS hosts.

CI runners (ubuntu-22.04, ubuntu-latest, windows-2022) have the Vulkan
loader and Mesa lavapipe but no NVIDIA driver, so libcuda.so is absent.
Result: `lake build Hesper` finishes the .c.o compilation, then
`ld.lld: error: unable to find library -lcuda` on the first dynlib
link and the whole build fails.  All five CI checks (Linux, macOS,
Windows, Test Summary, Tutorial elaboration) were red because of this
one link error — well-hidden under 4000+ lines of preceding Dawn
compile output.

Setting HESPER_NO_CUDA=1 at the job level routes lakefile through the
existing no-CUDA branch (still links libhesper_cuda.a, the stub
archive, so CUDA FFI extern names resolve to runtime errors that the
Vulkan/Metal paths never reach).  macOS is already handled by
`System.Platform.isOSX` in `stdLinkArgs`, so no env needed there.
…ve.dylib

When task #384 changed hesper_native from STATIC to SHARED, every Lean
consumer (per-module `.so`, lean_exe) started picking the Dawn /
glfw3 symbols up at .dylib load time instead of inside their own
final exe link.  On Linux that was fine, but on macOS the Objective-C
TU's that ship inside libwebgpu_dawn.a (Metal backend) and libglfw3.a
(Cocoa window/monitor code) need:
  - libobjc           (objc_msgSend*, objc_opt_class, objc_retain, …)
  - Foundation        (NSObject, NSString, …)
  - CoreFoundation
  - Metal             (MTLComputeCommandEncoder, MTLBuffer, …)
  - QuartzCore        (CAMetalLayer)
  - IOKit / IOSurface (GLFW monitor code)
  - Cocoa             (NSWindow / NSView)

These used to be resolved at the final lean_exe link via
`stdLinkArgs` in lakefile.lean (which still appends them for each
exe).  But the SHARED .dylib now has to resolve them itself at its
own link time — otherwise we get the macOS CI failure:
  Undefined symbols for architecture arm64:
    "_objc_msgSendSuper2", "_objc_opt_class", "_objc_opt_isKindOfClass",
    "_objc_opt_new", "_objc_release", "_objc_retain", …
    referenced from libwebgpu_dawn.a / libglfw3.a
…n MSVC

The native bridge cmake (`native/CMakeLists.txt`) was tightened in
2026-05-25 to fail-fast when `libwebgpu_dawn.a` / `libglfw3.a` are
missing, because the Linux SIGSEGV fix needs both archives whole-
archived into a single SHARED libhesper_native.so.

On Windows Dawn's MSVC build emits `webgpu_dawn.lib` (and no glfw
static archive at all under the install prefix), so the `.a` probe
hits FATAL_ERROR and `lake build Hesper` fails before the Lean
compile even starts:

  CMake Error at CMakeLists.txt:76 (message):
    libwebgpu_dawn.a not found under .../dawn-install/lib{,64}/

Hesper has no production Windows backend yet (no D3D12 native code,
no Windows lean_exe wired to libhesper_native), so the right answer
is the same as before precompileModules went in: don't build the
native bridge on Windows.  Early-return from the `nativeDeps` target
on `System.Platform.isWindows`; `lake build Hesper` still compiles
the pure-Lean library, which is all the Windows CI job exercises.

Production Windows support (D3D12 / WinSDK link line) would need a
separate `.lib`-aware branch in `native/CMakeLists.txt` plus a Dawn
build of `webgpu_dawn` as a SHARED DLL — out of scope here.
8b2224b made the `nativeDeps` target a no-op on Windows, but
`stdLinkArgs` still appended `./.lake/build/simd/libhesper_simd.a`,
`./.lake/build/simd/_deps/highway-build/libhwy.a`, and
`./.lake/build/native/libhesper_cuda.a` to every per-module dynlib
link.  Without the native bridge those files don't exist, so all 17
`Hesper.*:dynlib` links fail on Windows:

  clang: error: no such file or directory: './.lake/build/simd/libhesper_simd.a'

Make `stdLinkArgs` return `#[]` on Windows (so per-module `.dll`s
link with the Lean defaults only) and the same for `cudaExeArgs`.
The Windows CI only exercises pure-Lean `lake build Hesper`; no
lean_exe is built there.  This matches the eee6458-era behaviour
where `precompileModules` wasn't set, so per-module dynlibs didn't
exist.
The raw `target_link_libraries(... "-framework" "CoreFoundation" ...)`
form was failing on macos-15 with:
  ld: library 'CoreFoundation' not found

because the default linker framework search path doesn't include the
SDK's `System/Library/Frameworks`, and CMake doesn't automatically
forward `-isysroot` to a bare `-framework Foo` token list.  Using
`find_library(... CoreFoundation)` resolves the framework path
against `CMAKE_OSX_SYSROOT` and hands the absolute path to the
linker, sidestepping the search-path problem entirely.

Also reorders Cocoa to come last (after the per-framework deps)
since GLFW depends on it last in the link line.
bridge.cpp / glfw_bridge.cpp reference Lean runtime symbols
(`lean_alloc_object`, `lean_mk_string`, `lean_inc_heartbeat`,
`mi_malloc_small`, `lean_register_external_class`, ...) but the
runtime they belong to lives in the *host* binary (`xeus-lean`,
`lake env`, or any `lean_exe`) that dlopen()'s libhesper_native at
run time.  Linux ld treats those as expected undefined symbols in a
shared lib by default; Apple ld (macos-15 / Xcode 16) refuses to
finalise the dylib unless we say so:

  Undefined symbols for architecture arm64:
    "_lean_alloc_object", "_lean_byte_array_push",
    "_lean_dec_ref_cold", "_lean_inc_heartbeat",
    "_lean_internal_panic_out_of_memory", "_lean_mk_io_user_error",
    "_lean_mk_string", "_lean_register_external_class",
    "_mi_malloc_small", ...

`-undefined dynamic_lookup` is the standard Apple linker flag for
plugin dylibs hosted by an executable that provides the symbols.
After the SHARED-lib refactor, each consumer dylib
(Hesper_Hesper_WebGPU_Types.dylib etc) embeds an LC_LOAD_DYLIB entry
of `@rpath/libhesper_native.dylib` plus an LC_RPATH of
`@loader_path/../native`.  On a fresh macos-15 GitHub Actions runner,
dyld expands that to `.lake/build/lib/lean/../native/libhesper_native.dylib`
and reports "no such file" — even though the file is present at
`.lake/build/native/libhesper_native.dylib`.  The bogus deployment-
target warning ("built for macOS 99.0 which is newer than running OS")
hints the actual reject reason is dylib metadata, but in either case
the @rpath indirection is fragile.

Setting `INSTALL_NAME_DIR` + `BUILD_WITH_INSTALL_NAME_DIR` makes the
dylib's LC_ID_DYLIB an absolute path (`<build_dir>/libhesper_native.dylib`),
which propagates into every consumer's LC_LOAD_DYLIB at link time —
dyld then loads the dylib by absolute path with no @rpath search at
all.  Tradeoff: the build directory is hardcoded into each
consumer, so the build tree can't be relocated.  Fine for CI (same
runner workspace per build).
Three independent failures across the latest Cross-Platform CI +
Tutorial elaboration check runs.  Diagnosed by pulling logs via gh
(thanks for installing it).

1. macOS — `Examples/BitNetComplete.lean:178,191` Lean v4.28 type
   mismatch.  `String.drop` and `String.trim` now return
   `String.Slice`; the existing call sites pass the result to APIs
   that still want `String` (`encode tokenizer text`, `nStr.toNat!`).
   Linux didn't catch this because only the macOS job builds
   `bitnet-complete` and `micro-bench`.  Re-materialise the slices
   with `.toString` at the boundary.

2. tutorial-check — `/usr/bin/ld: cannot find -lvulkan` when linking
   libhesper_native.so.  The apt deps list in tutorial-check.yml was
   missing `libvulkan-dev` (the linux-vulkan job in ci.yml has it).
   Added `libvulkan-dev vulkan-tools mesa-vulkan-drivers` to match.

3. Windows — per-module `.dll` links failed with `ld.lld: error:
   undefined symbol: lean_glfw_init` (and ~20 other extern names).
   `nativeDeps` is already a no-op on Windows, so the bridge those
   extern symbols point to was never built.  With
   `precompileModules := true`, Lake still tried to produce a `.dll`
   per Hesper module, and the linker insisted on resolving every
   FFI extern.  Disable `precompileModules` on Windows so only the
   pure-Lean `.olean`s are built — which is all the Windows CI job
   exercises anyway.

Linux + Vulkan was already green (re-verified via gh).
Ch11_DataAnalysis.md imports xeus-lean's `Display` module to render
DataFrame tables as real HTML cells in Jupyter
(`Display.html (df.toHtmlHead ...)`).  That module only exists inside
the xlean kernel, not under plain `lake env lean`:

  docs/tutorial/Notebooks/Gen/Ch11_DataAnalysis.lean:43:0:
    error: unknown module prefix 'Display'

Skip Ch11 from this elaborate gate.  Ch11 is still exercised
end-to-end inside the Docker image, where `xlean-convert --eval`
runs every notebook cell against the real xlean runtime that ships
the Display module.
Ch11 was skipped (commit 2b353f4) because it imports `Display`, which
only resolves inside the xlean kernel.  Ch04 has the same shape of
problem from a different angle: it contains

  #eval runGpu

which calls `Hesper.init` + `getDevice` (FFI externs).  Under plain
`lake env lean` the interpreter can't dlopen the native bridge, so:

  libc++abi: terminating due to uncaught exception of type
    lean::exception: Could not find native implementation of external
    declaration 'Hesper.init' (symbols
    'lp_Hesper_Hesper_init___boxed' or 'lp_Hesper_Hesper_init').

Group both into one `case` so it's clear they share a reason.  The
Docker image still exercises both chapters end-to-end via
`xlean-convert --eval` against the real xlean runtime.
@junjihashimoto junjihashimoto merged commit 09733e1 into main Jun 9, 2026
5 checks passed
@junjihashimoto junjihashimoto deleted the docs/sparkle-style-tutorial branch June 9, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant