Commit 6089212

unamedkr and claude committed
wasm: fix generation hang — add ChatML template + async generate
Two bugs caused the WASM demo to hang after the model loads:

1. **Missing chat template**: the user prompt was sent raw ("hello?") without ChatML wrapping. SmolLM2-Instruct generates 0 tokens without the <|im_start|>user/assistant template — the same bug we fixed in the Python bindings (v0.8.3). Fix: the JS wraps the prompt in ChatML before calling wasm_generate.

2. **UI freeze**: wasm_generate is synchronous and blocks the main thread, so the browser cannot update the UI while inference runs. Fix: wrap the WASM call in setTimeout(50 ms) to yield one frame for the spinner.

Also fixed: free(result) → quant_free_string(result), for consistency with the cross-heap safety pattern, plus better empty-result handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 618845b commit 6089212

File tree

2 files changed: +19 −12 lines changed


wasm/index.html

Lines changed: 14 additions & 9 deletions
@@ -300,16 +300,21 @@ <h2>LLM in Your Browser — 189 KB</h2>
         addMessage('system', msg);
       };

-      // Call generation
-      const promptPtr = Module.allocateUTF8(text);
-      Module._wasm_generate(promptPtr, 0.7, 256);
-      Module._free(promptPtr);
+      // Wrap with ChatML template (instruct models need this to generate)
+      const chatPrompt = `<|im_start|>user\n${text}<|im_end|>\n<|im_start|>assistant\n`;

-      if (!output) {
-        assistantDiv.innerHTML = '<em style="color:#666">No output generated</em>';
-      }
-      generating = false;
-      document.getElementById('sendBtn').disabled = false;
+      // Run generation asynchronously so the UI doesn't freeze
+      setTimeout(() => {
+        const promptPtr = Module.allocateUTF8(chatPrompt);
+        Module._wasm_generate(promptPtr, 0.7, 256);
+        Module._free(promptPtr);
+
+        if (!output) {
+          assistantDiv.innerHTML = '<em style="color:#666">No output generated. Try a longer prompt.</em>';
+        }
+        generating = false;
+        document.getElementById('sendBtn').disabled = false;
+      }, 50); // yield to browser for one frame to show the spinner
     }
   </script>

wasm/quant_wasm.c

Lines changed: 5 additions & 3 deletions
@@ -119,14 +119,16 @@ int wasm_generate(const char* prompt, float temperature, int max_tokens) {

     double elapsed = emscripten_get_now() - t0;

-    if (result) {
+    if (result && result[0] != '\0') {
         /* Send full result (quant_ask doesn't use callback) */
         js_on_token(result);
         int n_tokens = (int)strlen(result) / 4; /* rough estimate */
         js_on_done(n_tokens, elapsed);
-        free(result);
+        quant_free_string(result);
     } else {
-        js_on_status("Generation failed");
+        if (result) quant_free_string(result);
+        js_on_done(0, elapsed);
+        js_on_status("No output — try a different prompt");
     }

     g_generating = 0;
