perf(content): render markdown with sparkdown-gfm wasm instead of marked #10
entry.html now renders via @momiji-rs/sparkdown-gfm (WASI-free wasm, CommonMark + GFM) instead of
marked. Benchmarked on the real docs corpus (Bun): ~75x faster on small pages, ~580x on large pages.
marked degrades super-linearly (a 27KB page took ~130ms with marked vs ~0.22ms with sparkdown-gfm), which matters because rendering entry.html is the SSG hot path. GFM (tables,
strikethrough, task lists, autolinks) renders at no extra cost vs plain CommonMark. Output is
CommonMark-strict: bare <h2> headings (Kura's anchor post-processor depends on this), language-* code
fences, and a literal {…} stays text. The wasm inits once per process via a top-level await; content.ts
is build/dev-only (imports node:fs) so it never reaches the worker bundle. marked dependency removed.
Verified: 297 @junejs/server tests pass incl. 7 new GFM-render regression tests; direct render of a
GFM doc through the pipeline confirmed correct.
Pull request overview
This PR switches June’s content-pipeline Markdown renderer for entry.html from marked to @momiji-rs/sparkdown-gfm (WASM), aiming to significantly speed up the june gen hot path while preserving key output contracts relied on by downstream HTML processing.
Changes:
- Replace marked rendering with sparkdown-gfm’s toHtmlSync() in packages/june/src/content.ts, initializing the WASM runtime via top-level await init().
- Add regression tests asserting required CommonMark+GFM output behaviors (tables, strikethrough, task lists, autolinks, bare headings, language-* fences, literal {…}).
- Update dependencies/lockfile and add a patch changeset documenting the renderer swap.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/june/src/content.ts | Swap renderer implementation to sparkdown-gfm and initialize the WASM runtime. |
| packages/june/test/content.test.ts | Add regression tests to lock in required HTML output contracts. |
| packages/june/package.json | Add @momiji-rs/sparkdown-gfm dependency and remove marked. |
| bun.lock | Lockfile updates reflecting dependency changes. |
| .changeset/sparkdown-gfm-renderer.md | Patch changeset describing the renderer swap and guarantees. |
// Instantiate the markdown→HTML wasm once, up front. loadEntry() renders synchronously (it runs from
// sync collection scanners and from dev request handlers), and toHtmlSync requires the wasm to be
// ready. This module is build/dev-only — it imports node:fs and is never in the frozen worker bundle —
// so a top-level await is safe here (it never reaches workerd). Init is ~0.3ms, once per process.
await init();
Adopted — deferred to an upstream fix. You're right: the top-level await init() makes importing @junejs/server perform wasm init as a side effect (index.ts re-exports collection/entry from ./content), so db/adapter-only paths pay it too.
The clean fix keeps the content API synchronous rather than turning it async: add a synchronous initSync() to @momiji-rs/sparkdown-gfm (a sync new WebAssembly.Module is fine for a ~290KB module on Node/Bun — the 4KB sync-compile cap is browser-main-thread only), then lazy-init on first render inside loadEntry — no top-level await, no import side effect, sync API preserved.
That initSync() is being added upstream now. Once it ships I'll drop the top-level await and switch content.ts to lazy sync init, then resolve this thread. Holding it open until then.
Fixed in 5d5c4e0. sparkdown-gfm 0.0.3 shipped the synchronous initSync(), so content.ts no longer uses a top-level await: it imports { toHtmlSync, initSync } and initializes the wasm lazily on the FIRST render inside loadEntry (sync, idempotent guard). Importing @junejs/server now has no side effect — db/adapter-only paths never trigger wasm init — and the content API stays synchronous. Verified: typecheck clean, 297 tests pass, GFM render correct, no top-level await remains.
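The lazy, synchronous, idempotent init described above can be sketched roughly as follows. This is a minimal illustration, not the code from 5d5c4e0: the `initSync` and `toHtmlSync` bodies here are local stand-ins that only borrow the names mentioned in this thread (the real package's signatures and behavior are not shown in the thread), and `ensureRenderer`, `wasmInitCount`, and the simplified `loadEntry` shape are hypothetical.

```typescript
// Stand-ins for the renderer package. These are NOT the real
// @momiji-rs/sparkdown-gfm functions, only fakes with the same names.
let wasmInitCount = 0; // hypothetical counter, used to observe when init runs
let wasmReady = false;

function initSync(): void {
  // A real implementation would compile and instantiate the wasm here.
  wasmInitCount += 1;
  wasmReady = true;
}

function toHtmlSync(markdown: string): string {
  if (!wasmReady) throw new Error("wasm not initialized");
  // Placeholder output; a real renderer would produce CommonMark + GFM HTML.
  return `<p>${markdown}</p>`;
}

// Idempotent guard: the first call initializes, later calls return immediately.
let rendererInitialized = false;

function ensureRenderer(): void {
  if (rendererInitialized) return;
  initSync();
  rendererInitialized = true;
}

// Hypothetical, heavily simplified loadEntry: stays synchronous and only
// touches the wasm when something is actually rendered.
function loadEntry(markdown: string): { html: string } {
  ensureRenderer();
  return { html: toHtmlSync(markdown) };
}
```

Loading this module runs no init at all; the counter only moves on the first `loadEntry()` call and stays at 1 afterwards, which is the property the review asked for (no import-time side effect, sync API preserved).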
Copilot review: the top-level await init() made importing @junejs/server (which re-exports collection/ entry from ./content) instantiate the wasm as a side effect, so db/adapter-only paths paid/risked it. sparkdown-gfm 0.0.3 adds a synchronous initSync(); use it to initialize lazily on the FIRST render inside loadEntry (sync, idempotent) instead. Importing the module now has no side effect; the content API stays synchronous. Verified: typecheck clean, 297 tests pass, GFM render correct, no top-level await.
// The markdown→HTML wasm is initialized LAZILY on the first render, synchronously. loadEntry() is sync
// (it runs from sync collection scanners and dev request handlers) and toHtmlSync needs the wasm ready;
// initSync() instantiates it without an await (valid on Node/Bun — this module is build/dev-only, imports
// node:fs, and never reaches the browser/worker bundle). Lazy + sync means importing this file has NO
// side effect: db/adapter-only code paths that never render markdown don't trigger (or risk) wasm init.
// Idempotent, ~0.3ms, once per process.
Good catch — adopted. Updated the PR description: the Implementation section now states the wasm initializes lazily and synchronously via sparkdown-gfm 0.0.3's initSync() inside loadEntry (no import-time side effect), and notes the initial top-level await init() was changed during review. Description-only change; the code already matches (5d5c4e0).
test("code fence keeps the language class", () => expect(html).toContain('<code class="language-ts">'));
test("headings stay bare (no injected id/class)", () => expect(html).toContain("<h2>Section</h2>"));
Partial — adopted for the code fence, deliberately kept strict for the heading (2730cda).
- Code fence: agreed, loosened to /<code class="[^"]*\blanguage-ts\b/ so additional classes/whitespace don't cause a false failure (the contract is just 'a language-* class is present').
- Heading: kept strict on purpose, because here 'bare' IS the contract: Kura's processHtml anchor regex is /<h([23])>/, which only matches an h2/h3 with NO attributes. A loose match (e.g. <h2[^>]*>) would wrongly PASS for a contract-breaking <h2 id=…> and silently break Kura's deep-link anchors. So I also added expect(html).not.toMatch(/<h2\s/) to assert no h2 carries attributes, making the no-attributes contract explicit rather than looser.
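To make the difference between the strict and loose assertions concrete, here is a small sketch that runs the patterns from this thread against sample strings. The sample HTML strings and the `results` object are hypothetical, and the code-fence pattern assumes the loosened form is `[^"]*` (zero or more non-quote characters before `language-ts`).

```typescript
// Patterns as quoted in the thread.
const kuraAnchorRe = /<h([23])>/; // Kura's processHtml anchor regex: h2/h3 with no attributes
const h2WithAttrsRe = /<h2\s/; // the added guard: this must NOT match the rendered HTML
const fenceRe = /<code class="[^"]*\blanguage-ts\b/; // loosened fence check (assumed form)

// The loose heading matcher, shown only as the counter-example.
const looseH2Re = /<h2[^>]*>/;

// Hypothetical renderer outputs.
const bareHeading = "<h2>Section</h2>";
const headingWithId = '<h2 id="section">Section</h2>'; // would break the contract

const results = {
  kuraMatchesBare: kuraAnchorRe.test(bareHeading),
  kuraMatchesWithId: kuraAnchorRe.test(headingWithId),
  looseMatchesWithId: looseH2Re.test(headingWithId),
  guardFlagsWithId: h2WithAttrsRe.test(headingWithId),
  fenceMatchesPlain: fenceRe.test('<code class="language-ts">'),
  fenceMatchesExtraClass: fenceRe.test('<code class="hljs language-ts">'),
  fenceMatchesTsx: fenceRe.test('<code class="language-tsx">'),
};

console.log(results);
```

Kura's regex matches the bare heading but not the one with an `id`, while the loose matcher accepts both, so a loose test would stay green exactly when the anchors stop working. The fence pattern tolerates an extra class but, because of the trailing `\b`, does not accept `language-tsx`.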
…NO attrs
Per review: the code-fence assertion was stricter than the contract (a language-* class is present) —
match it with a regex that tolerates additional classes. The heading assertion stays strict on purpose
('bare' IS the contract: Kura's processHtml anchor regex /<h([23])>/ only matches an h2/h3 with no
attributes), and now also asserts no h2 carries attributes — a loose match would wrongly pass a
contract-breaking <h2 id=…>.
…4 subpath)
@momiji-rs/sparkdown 0.0.4 merged the two packages: the CommonMark renderer is the root export and GFM is the ./gfm subpath. Switch the dependency from @momiji-rs/sparkdown-gfm to @momiji-rs/sparkdown@^0.0.4 and import from the /gfm subpath. Same initSync/toHtmlSync API and output. Verified: typecheck clean, 297 tests pass, GFM render correct.
Swaps June's content-pipeline markdown renderer from marked to @momiji-rs/sparkdown-gfm (WASI-free WebAssembly, CommonMark + GFM).
Why
entry.html is rendered for every content entry during june gen, the SSG hot path. marked degrades super-linearly on Bun (a 27KB page took ~130ms vs ~0.22ms):
→ ~75× faster (small), ~580× faster (large), ~5× faster than markdown-it. And GFM is free: sparkdown-gfm with tables/strikethrough/task-lists/autolinks benches identical to plain CommonMark (8 µs / 0.22 ms).
Output contracts (verified through the pipeline)
- <table>, <del>, task-list <input type=checkbox>, bare-URL <a href>.
- <h2> (no injected id/class): Kura's processHtml anchor regex depends on this.
- language-* code fences preserved.
- {…} stays literal text (the MDX expression footgun never applied to plain markdown, and stays gone).
Implementation
- Renderer swapped from marked to sparkdown-gfm's toHtmlSync() (content.ts). marked dependency removed.
- The wasm initializes lazily and synchronously via sparkdown-gfm 0.0.3's initSync() inside loadEntry (idempotent guard), not a top-level await. content.ts is build/dev-only (imports node:fs, never in the worker bundle), and importing it has no import-time side effect: db/adapter-only paths never trigger (or risk) wasm init. The content API stays synchronous. (An initial revision used a top-level await init(); changed to lazy initSync() during review to remove the import side effect. See thread.)
Verification
- 297 @junejs/server tests pass incl. 7 new GFM-render regression tests; direct render of a GFM doc through collection() confirmed correct; no top-level await remains.
Ships via a patch changeset.