fix: escape inline HTML in markdown to eliminate XSS attack surface#484
Merged
Instead of relying solely on sanitize-html's allowlist to filter dangerous tags (which requires ongoing maintenance as new bypass vectors are found), escape all raw HTML tokens at the marked renderer level. This ensures any output resembling an HTML tag is rendered as inert text regardless of source. Also removes resource-loading media tags (img, audio, video, source, track) from the sanitizer allowlist as defense-in-depth.
aseemxs
approved these changes
May 7, 2026
ashishrp-aws
approved these changes
May 7, 2026
Summary
- Adds `html` and `image` custom renderers in `marked` that escape raw HTML tokens instead of passing them through to the DOM. This eliminates the entire class of inline-HTML XSS vectors at the parser level, regardless of which tags or attributes are involved.
- Removes resource-loading media tags (`img`, `audio`, `video`, `source`, `track`) from the sanitizer allowlist as defense-in-depth.

Motivation
After PRs #462, #466, and #470 were merged, VAPT verification found that `<img>` tags in attacker-controlled filenames still rendered as live HTML because `img` was on the sanitizer's allowlist. The allowlist approach requires ongoing maintenance: every missed tag/attribute combination is a potential bypass (e.g. `<input type="image" src=...>`, `style="background-image: url(...)"`).

This PR follows the principle stated by the security reviewer: "any output resembling an HTML tag should be escaped before reaching the browser, regardless of its source."

Legitimate formatting uses markdown syntax (**bold**, [link](url), `code`), which `marked` converts structurally. Raw HTML in the input is never intentional: it is either untrusted data (like filenames) or LLM output errors.

How the fix works
All raw HTML tokens in markdown output are escaped at the `marked` renderer level, before they ever reach the DOM.

For example, if the LLM outputs `<img src="https://attacker.com/exfil">` in its response:
- Before: `<img src="https://attacker.com/exfil">` reaches the DOM as a live HTML element, so `onerror`/`onload` handlers fire.
- After: the same string is escaped and reaches the DOM as a text node, so the browser shows `<img src="https://attacker.com/exfil">` as visible text.

This applies to all inline HTML regardless of tag: `<img>`, `<svg>`, `<script>`, `<input>`, `<div>`, etc. Markdown formatting (**bold**, [link](url), `code`) continues to render normally since those go through separate renderer paths.

Verified in browser
- `document.title` unchanged (no JS executed)

Before / After
Before fix: raw HTML renders as live DOM elements
Malicious filenames execute JavaScript (`onerror` fires), render SVG content, make outbound network requests, and disappear from visible text.

After fix: all HTML escaped to inert visible text
The same payloads display as safe, readable text with no code execution or network requests.
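
The renderer-level escaping described under "How the fix works" can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR's diff: `escapeHtml` and `escapingRenderer` are hypothetical names, the overrides take plain strings because the real `marked` renderer signatures differ between versions, and what the PR's `image` renderer actually emits is not shown on this page (the sketch assumes it falls back to the escaped markdown source).

```typescript
// Hypothetical sketch: names and shapes are illustrative, not taken from the PR diff.

// Escape the characters that are significant in HTML text and attribute values.
// `&` must be replaced first so already-produced entities are not double-escaped.
function escapeHtml(raw: string): string {
  return raw
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Assumed shape of the two renderer overrides named in the Summary.
const escapingRenderer = {
  // Raw HTML token: emit it as text instead of passing it through to the DOM.
  html: (rawHtml: string): string => escapeHtml(rawHtml),
  // Markdown image (assumption): emit the markdown source as text so no <img> is created.
  image: (href: string, text: string): string => escapeHtml(`![${text}](${href})`),
};

const payload = '<img src="https://attacker.com/exfil">';
const escaped = escapingRenderer.html(payload);
console.log(escaped);
// prints: &lt;img src=&quot;https://attacker.com/exfil&quot;&gt;
```

Because the escaping happens before sanitization, the result does not depend on which tags or attributes the allowlist happens to cover; the sanitizer only ever sees text.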
Verification results
- `<img src=x onerror=alert(1)>` executes JS?
- `<img src="https://attacker.com/exfil">` makes network request?
- `<svg onload=...>` executes JS?
- `<input type="image" src="...">` fetches URL?
- `<div style="background-image: url(...)">` fetches URL?
- **bold**, [link](url), `code` still render?

Test plan
- `<img>`, `<svg>`, `<script>`, and `<div>` with event handlers are all escaped
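
A check along the lines of the item above might look like the sketch below. It is a hedged illustration only: `renderRawHtmlToken` is a hypothetical stand-in for the renderer under test, `hasLiveTag` is an assumed helper, and the payload strings are illustrative examples rather than this PR's actual test fixtures.

```typescript
// Hypothetical stand-in for the renderer under test: escapes a raw HTML token.
function renderRawHtmlToken(raw: string): string {
  return raw
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Assumed helper: true if the output still contains something shaped like
// an opening or closing HTML tag.
function hasLiveTag(output: string): boolean {
  return /<\/?[a-zA-Z][^>]*>/.test(output);
}

// Illustrative payloads covering the tags named in the test plan.
const payloads: string[] = [
  "<img src=x onerror=alert(1)>",
  "<svg onload=alert(1)>",
  "<script>alert(1)</script>",
  "<div onclick=alert(1)>x</div>",
];

// Collect every payload whose rendered form still contains a live tag.
const failures = payloads.filter((p) => hasLiveTag(renderRawHtmlToken(p)));

console.log(
  failures.length === 0
    ? "all payloads escaped"
    : `unescaped payloads: ${failures.join(", ")}`
);
```

Asserting on the absence of any tag-shaped substring, rather than on specific tag names, keeps the check aligned with the PR's goal of not depending on a per-tag list.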