
perf(content): render markdown with sparkdown-gfm wasm instead of marked #10

Merged
linyiru merged 4 commits into main from feat/sparkdown-gfm-renderer on Jun 25, 2026


linyiru commented Jun 25, 2026


Swaps June's content-pipeline markdown renderer from marked to @momiji-rs/sparkdown-gfm (WASI-free WebAssembly, CommonMark + GFM).

Why

entry.html is rendered for every content entry during june gen — the SSG hot path. marked degrades super-linearly on Bun:

| corpus | marked (gfm) | markdown-it | sparkdown-gfm |
|---|---:|---:|---:|
| small page (481 B) | 616 µs | 21 µs | 8 µs |
| large page (27 KB) | 130 ms | 1.10 ms | 0.22 ms |

~75× faster than marked (small), ~580× faster (large), and ~2.6× (small) to ~5× (large) faster than markdown-it. And GFM is free: sparkdown-gfm with tables/strikethrough/task-lists/autolinks benches the same as plain CommonMark (8 µs / 0.22 ms).
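The "super-linear" reading can be sanity-checked against the table with a two-point log-log slope: if time grows as size^k, then k = ln(t_large / t_small) / ln(size_large / size_small), and k > 1 means the per-byte cost rises with page size. A minimal sketch; `scalingExponent` is a hypothetical helper, it assumes 27 KB ≈ 27,000 bytes (the exact byte count isn't given), and with only two data points the result is a rough indicator, not a fitted model.

```typescript
// Two-point log-log slope: if time ≈ c · size^k, then
// k = ln(timeLarge / timeSmall) / ln(sizeLarge / sizeSmall).
// k > 1 means cost per byte grows with page size (super-linear).
function scalingExponent(
  sizeSmall: number, timeSmall: number,
  sizeLarge: number, timeLarge: number,
): number {
  return Math.log(timeLarge / timeSmall) / Math.log(sizeLarge / sizeSmall);
}

// Page sizes in bytes. Assumption: 27 KB ≈ 27_000 bytes (exact size not stated).
const SMALL_BYTES = 481;
const LARGE_BYTES = 27_000;

// Timings from the table, all converted to microseconds.
const exponents = {
  marked: scalingExponent(SMALL_BYTES, 616, LARGE_BYTES, 130_000), // ≈ 1.33
  markdownIt: scalingExponent(SMALL_BYTES, 21, LARGE_BYTES, 1_100), // ≈ 0.98
  sparkdownGfm: scalingExponent(SMALL_BYTES, 8, LARGE_BYTES, 220),  // ≈ 0.82
};

for (const [name, k] of Object.entries(exponents)) {
  console.log(`${name}: k ≈ ${k.toFixed(2)}`);
}
```

marked lands clearly above 1 while the other two sit at or below linear. sparkdown-gfm's k < 1 most likely just means fixed per-call overhead dominates the 481 B page.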

Output contracts (verified through the pipeline)

  • GFM renders: <table>, <del>, task-list <input type=checkbox>, bare-URL <a href>.
  • Bare <h2> (no injected id/class) — Kura's processHtml anchor regex depends on this.
  • language-* code fences preserved.
  • A bare {…} stays literal text (the MDX expression footgun never applied to plain markdown, and stays gone).

Implementation

  • Render swap at the single call site (content.ts). marked dependency removed.
  • The wasm initializes lazily and synchronously on the first render — via sparkdown-gfm 0.0.3's initSync() inside loadEntry (idempotent guard) — not a top-level await. content.ts is build/dev-only (imports node:fs, never in the worker bundle), and importing it has no import-time side effect: db/adapter-only paths never trigger (or risk) wasm init. The content API stays synchronous. (An initial revision used a top-level await init(); changed to lazy initSync() during review to remove the import side effect — see thread.)
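The lazy, idempotent init described above comes down to a flag checked on each render. A minimal sketch with the wasm calls injected so it runs standalone: `makeLazyRenderer` is a hypothetical helper, not code from this PR, and the commented wiring assumes `initSync()` takes no arguments, which is unverified against the package.

```typescript
// Hypothetical helper (not from the PR): wraps a synchronous init and a
// synchronous renderer so init runs at most once, on the first render,
// and never as a side effect of importing the module.
function makeLazyRenderer(
  initSync: () => void,
  render: (markdown: string) => string,
): (markdown: string) => string {
  let ready = false; // idempotent guard
  return (markdown) => {
    if (!ready) {
      initSync();   // if this throws, `ready` stays false and the next call retries
      ready = true;
    }
    return render(markdown);
  };
}

// Possible wiring in content.ts (assumes initSync() needs no arguments):
//   import { initSync, toHtmlSync } from "@momiji-rs/sparkdown-gfm";
//   const renderMarkdown = makeLazyRenderer(() => initSync(), toHtmlSync);
//   // inside loadEntry(): const html = renderMarkdown(body);

// Standalone demo with stand-ins for the wasm calls.
let initCalls = 0;
const renderMarkdown = makeLazyRenderer(
  () => { initCalls += 1; },
  (md) => `<p>${md}</p>`,
);
console.log(initCalls);           // 0: creating the renderer does not init
console.log(renderMarkdown("a")); // <p>a</p>
console.log(renderMarkdown("b")); // <p>b</p>
console.log(initCalls);           // 1: init ran exactly once
```

Setting the flag only after `initSync()` returns means a failed init is retried on the next render instead of leaving a renderer that calls into an uninitialized module. Since `loadEntry` runs synchronously on one thread, the plain boolean needs no locking.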

Verification

  • 297 @junejs/server tests pass (incl. 7 GFM-render regression tests).
  • typecheck clean; direct render of a GFM doc through collection() confirmed correct; no top-level await remains.

Ships via a patch changeset.

entry.html now renders via @momiji-rs/sparkdown-gfm (WASI-free wasm, CommonMark + GFM) instead of
marked. entry.html rendering is the SSG hot path. Benchmarked on the real docs corpus (Bun): ~75x faster on
small pages, ~580x on large pages, since marked degrades super-linearly (a 27KB page took ~130ms vs ~0.22ms). GFM (tables,
strikethrough, task lists, autolinks) renders at no extra cost vs plain CommonMark. Output is
CommonMark-strict: bare <h2> headings (Kura's anchor post-processor depends on this), language-* code
fences, and a literal {…} stays text. The wasm inits once per process via a top-level await; content.ts
is build/dev-only (imports node:fs) so it never reaches the worker bundle. marked dependency removed.

Verified: 297 @junejs/server tests pass incl. 7 new GFM-render regression tests; direct render of a
GFM doc through the pipeline confirmed correct.

Copilot AI left a comment


Pull request overview

This PR switches June’s content-pipeline Markdown renderer for entry.html from marked to @momiji-rs/sparkdown-gfm (WASM), aiming to significantly speed up the june gen hot path while preserving key output contracts relied on by downstream HTML processing.

Changes:

  • Replace marked rendering with sparkdown-gfm’s toHtmlSync() in packages/june/src/content.ts, initializing the WASM runtime via top-level await init().
  • Add regression tests asserting required CommonMark+GFM output behaviors (tables, strikethrough, task lists, autolinks, bare headings, language-* fences, literal {…}).
  • Update dependencies/lockfile and add a patch changeset documenting the renderer swap.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| packages/june/src/content.ts | Swap renderer implementation to sparkdown-gfm and initialize the WASM runtime. |
| packages/june/test/content.test.ts | Add regression tests to lock in required HTML output contracts. |
| packages/june/package.json | Add @momiji-rs/sparkdown-gfm dependency and remove marked. |
| bun.lock | Lockfile updates reflecting dependency changes. |
| .changeset/sparkdown-gfm-renderer.md | Patch changeset describing the renderer swap and guarantees. |


Comment thread packages/june/src/content.ts Outdated
Comment on lines +16 to +20
```typescript
// Instantiate the markdown→HTML wasm once, up front. loadEntry() renders synchronously (it runs from
// sync collection scanners and from dev request handlers), and toHtmlSync requires the wasm to be
// ready. This module is build/dev-only — it imports node:fs and is never in the frozen worker bundle —
// so a top-level await is safe here (it never reaches workerd). Init is ~0.3ms, once per process.
await init();
```

Contributor Author


Adopted — deferred to an upstream fix. You're right: the top-level await init() makes importing @junejs/server perform wasm init as a side effect (index.ts re-exports collection/entry from ./content), so db/adapter-only paths pay it too.

The clean fix keeps the content API synchronous rather than turning it async: add a synchronous initSync() to @momiji-rs/sparkdown-gfm (a sync new WebAssembly.Module is fine for a ~290KB module on Node/Bun — the 4KB sync-compile cap is browser-main-thread only), then lazy-init on first render inside loadEntry — no top-level await, no import side effect, sync API preserved.

That initSync() is being added upstream now. Once it ships I'll drop the top-level await and switch content.ts to lazy sync init, then resolve this thread. Holding it open until then.
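A synchronous `initSync()` is possible because the WebAssembly JS API has blocking constructors (`new WebAssembly.Module`, `new WebAssembly.Instance`) alongside the Promise-based `WebAssembly.instantiate`. A minimal sketch of synchronous instantiation; `instantiateSync` is a hypothetical name, and how sparkdown's `initSync()` actually obtains its ~290KB binary (file read vs. embedded bytes) is not shown in this thread.

```typescript
// Parameter types borrowed from the constructors themselves, so this compiles
// regardless of which typings package declares the WebAssembly namespace.
type WasmBytes = ConstructorParameters<typeof WebAssembly.Module>[0];
type WasmImports = ConstructorParameters<typeof WebAssembly.Instance>[1];

// Synchronous instantiation: no Promise, no await.
function instantiateSync(bytes: WasmBytes, imports: WasmImports = {}): WebAssembly.Instance {
  const compiled = new WebAssembly.Module(bytes);     // compile; throws CompileError on bad bytes
  return new WebAssembly.Instance(compiled, imports); // link + instantiate, also blocking
}

// The smallest valid module: the "\0asm" magic number followed by version 1.
const EMPTY_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

const instance = instantiateSync(EMPTY_MODULE);
console.log(Object.keys(instance.exports).length); // 0: an empty module exports nothing
```

In a browser, the same blocking compile of a large module on the main thread is what the size cap mentioned above restricts; on Node/Bun the comment above notes no such cap applies.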

Contributor Author


Fixed in 5d5c4e0. sparkdown-gfm 0.0.3 shipped the synchronous initSync(), so content.ts no longer uses a top-level await: it imports { toHtmlSync, initSync } and initializes the wasm lazily on the FIRST render inside loadEntry (sync, idempotent guard). Importing @junejs/server now has no side effect — db/adapter-only paths never trigger wasm init — and the content API stays synchronous. Verified: typecheck clean, 297 tests pass, GFM render correct, no top-level await remains.

Copilot review: the top-level await init() made importing @junejs/server (which re-exports collection/
entry from ./content) instantiate the wasm as a side effect, so db/adapter-only paths paid/risked it.
sparkdown-gfm 0.0.3 adds a synchronous initSync(); use it to initialize lazily on the FIRST render
inside loadEntry (sync, idempotent) instead. Importing the module now has no side effect; the content
API stays synchronous. Verified: typecheck clean, 297 tests pass, GFM render correct, no top-level await.

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines +16 to +21
```typescript
// The markdown→HTML wasm is initialized LAZILY on the first render, synchronously. loadEntry() is sync
// (it runs from sync collection scanners and dev request handlers) and toHtmlSync needs the wasm ready;
// initSync() instantiates it without an await (valid on Node/Bun — this module is build/dev-only, imports
// node:fs, and never reaches the browser/worker bundle). Lazy + sync means importing this file has NO
// side effect: db/adapter-only code paths that never render markdown don't trigger (or risk) wasm init.
// Idempotent, ~0.3ms, once per process.
```

Contributor Author


Good catch — adopted. Updated the PR description: the Implementation section now states the wasm initializes lazily and synchronously via sparkdown-gfm 0.0.3's initSync() inside loadEntry (no import-time side effect), and notes the initial top-level await init() was changed during review. Description-only change; the code already matches (5d5c4e0).

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Comment thread packages/june/test/content.test.ts Outdated
Comment on lines +222 to +223
```typescript
test("code fence keeps the language class", () => expect(html).toContain('<code class="language-ts">'));
test("headings stay bare (no injected id/class)", () => expect(html).toContain("<h2>Section</h2>"));
```

Contributor Author


Partial — adopted for the code fence, deliberately kept strict for the heading (2730cda).

  • Code fence: agreed, loosened to /<code class="[^"]*\blanguage-ts\b/ so additional classes/whitespace don't cause a false failure (the contract is just 'a language-* class is present').
  • Heading: kept strict on purpose, because here 'bare' IS the contract: Kura's processHtml anchor regex is /<h([23])>/, which only matches an h2/h3 with NO attributes. A loose match (e.g. <h2[^>]*>) would wrongly PASS for a contract-breaking <h2 id=…> and silently break Kura's deep-link anchors. So I also added expect(html).not.toMatch(/<h2\s/) to assert no h2 carries attributes, making the no-attributes contract explicit rather than looser.
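Both contracts can be sketched as standalone predicates. This uses plain `RegExp` rather than bun:test's `expect` so it runs anywhere, and `headingIsBare` is a hypothetical helper that combines the two heading assertions from the thread; the sample HTML strings are illustrative, not actual renderer output.

```typescript
// Contract 1 (code fences): a language-ts class is on <code>, possibly among others.
// The `*` matters: with a bare `[^"]`, exactly one character must precede `\b`,
// so even the plain `<code class="language-ts">` would fail to match.
const LANG_TS = /<code class="[^"]*\blanguage-ts\b/;

// Contract 2 (headings): Kura's anchor regex as quoted in this thread; it only
// matches <h2>/<h3> with no attributes at all.
const KURA_ANCHOR = /<h([23])>/;
const H2_WITH_ATTRS = /<h2\s/;

// True only if the heading renders bare AND no <h2> anywhere carries attributes.
function headingIsBare(html: string, text: string): boolean {
  return html.includes(`<h2>${text}</h2>`) && !H2_WITH_ATTRS.test(html);
}

console.log(LANG_TS.test('<code class="language-ts">'));                 // true
console.log(LANG_TS.test('<code class="hljs language-ts">'));            // true
console.log(KURA_ANCHOR.test("<h2>Section</h2>"));                       // true
console.log(KURA_ANCHOR.test('<h2 id="section">Section</h2>'));          // false
console.log(headingIsBare('<h2 id="section">Section</h2>', "Section")); // false
```

The last two lines show why the heading check stays strict: an `<h2 id=…>` slips past a loose `<h2[^>]*>` match but never produces a Kura anchor.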

…NO attrs

Per review: the code-fence assertion was stricter than the contract (a language-* class is present) —
match it with a regex that tolerates additional classes. The heading assertion stays strict on purpose
('bare' IS the contract: Kura's processHtml anchor regex /<h([23])>/ only matches an h2/h3 with no
attributes), and now also asserts no h2 carries attributes — a loose match would wrongly pass a
contract-breaking <h2 id=…>.

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.

…4 subpath)

@momiji-rs/sparkdown 0.0.4 merged the two packages: the CommonMark renderer is the root export and GFM
is the ./gfm subpath. Switch the dependency from @momiji-rs/sparkdown-gfm to @momiji-rs/sparkdown@^0.0.4
and import from the /gfm subpath. Same initSync/toHtmlSync API and output. Verified: typecheck clean,
297 tests pass, GFM render correct.
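Based on the commit message above, the import change in content.ts presumably looks like the fragment below (the exact diff is not shown in this thread; only the package names, the /gfm subpath, and the initSync/toHtmlSync names come from it).

```typescript
// Before: separate GFM package (sparkdown-gfm 0.0.3)
// import { initSync, toHtmlSync } from "@momiji-rs/sparkdown-gfm";

// After: merged package @momiji-rs/sparkdown@^0.0.4, GFM renderer on the ./gfm subpath
import { initSync, toHtmlSync } from "@momiji-rs/sparkdown/gfm";
```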

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.

linyiru merged commit 1470eaf into main on Jun 25, 2026
5 checks passed

2 participants