fix: escape inline HTML in markdown to eliminate XSS attack surface by chungjac · Pull Request #484 · aws/mynah-ui

chungjac · 2026-05-07T21:12:20Z

Summary

Adds html and image custom renderers in marked that escape raw HTML tokens instead of passing them through to the DOM. This eliminates the entire class of inline-HTML XSS vectors at the parser level, regardless of which tags or attributes are involved.
Removes resource-loading media tags (img, audio, video, source, track) from the sanitizer allowlist as defense-in-depth.
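
The allowlist change can be pictured as pruning resource-loading tags from whatever tag list the sanitizer is configured with. The sketch below is illustrative only: `RESOURCE_LOADING_TAGS` and `pruneAllowlist` are hypothetical names, and the sample allowlist is made up rather than copied from the project's sanitizer configuration.

```typescript
// Tags that make the browser fetch a resource as soon as they are parsed.
// The list mirrors the tags named in the summary above.
const RESOURCE_LOADING_TAGS: ReadonlySet<string> = new Set([
  'img',
  'audio',
  'video',
  'source',
  'track',
]);

// Hypothetical helper: returns a copy of the allowlist without those tags.
function pruneAllowlist(allowedTags: readonly string[]): string[] {
  return allowedTags.filter((tag) => !RESOURCE_LOADING_TAGS.has(tag.toLowerCase()));
}

// Made-up sample allowlist, for illustration only.
const sampleAllowlist = ['p', 'strong', 'em', 'a', 'code', 'img', 'video'];
const pruned = pruneAllowlist(sampleAllowlist);
console.log(pruned.join(', ')); // p, strong, em, a, code
```

Pruning is a second layer: even if an `<img>` somehow bypassed the renderer-level escaping, the sanitizer would no longer let it through.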

Motivation

After PRs #462, #466, and #470 were merged, VAPT verification found that <img> tags in attacker-controlled filenames still rendered as live HTML because img was on the sanitizer's allowlist. The allowlist approach requires ongoing maintenance: every missed tag/attribute combo is a potential bypass (e.g. <input type="image" src=...>, style="background-image: url(...)", etc.).

This PR follows the principle stated by the security reviewer: "any output resembling an HTML tag should be escaped before reaching the browser, regardless of its source."

Legitimate formatting uses markdown syntax (**bold**, [link](url), `code`), which marked converts structurally. Raw HTML in the input is never intentional — it's either untrusted data (like filenames) or LLM output errors.

How the fix works

All raw HTML tokens in markdown output are escaped at the marked renderer level, before they ever reach the DOM.

For example, if the LLM outputs <img src="https://attacker.com/exfil"> in its response:

| | Before fix | After fix |
| --- | --- | --- |
| What enters the DOM | `<img src="https://attacker.com/exfil">` (live HTML element) | `<img src="https://attacker.com/exfil">` (text node) |
| What the user sees | Nothing (or broken image icon) | `<img src="https://attacker.com/exfil">` as visible text |
| Network requests | Browser fetches the URL automatically | None |
| JS execution | `onerror`/`onload` handlers fire | None |

This applies to all inline HTML regardless of tag: <img>, <svg>, <script>, <input>, <div>, etc. Markdown formatting (**bold**, [link](url), `code`) continues to render normally since those go through separate renderer paths.
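
A minimal sketch of the escaping step described above. It does not import marked: `escapeHtml`, `RawHtmlToken`, `ImageToken`, and `escapingRenderer` are hypothetical names, and the token shapes are assumptions, since the arguments marked passes to custom renderer handlers differ between versions. The renderer code in this PR may look different.

```typescript
// Characters that are significant in HTML, mapped to their entity forms.
const HTML_ENTITIES: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Replace every significant character so the browser parses the result as text.
function escapeHtml(raw: string): string {
  return raw.replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch] ?? ch);
}

// Assumed token shapes (hypothetical, not marked's real types).
interface RawHtmlToken {
  text: string; // e.g. '<img src=x onerror=alert(1)>'
}
interface ImageToken {
  raw: string; // e.g. '![image](url)'
}

// Renderer-style overrides: both return escaped text instead of live markup.
const escapingRenderer = {
  html: (token: RawHtmlToken): string => escapeHtml(token.text),
  image: (token: ImageToken): string => escapeHtml(token.raw),
};

const out = escapingRenderer.html({ text: '<img src="https://attacker.com/exfil">' });
console.log(out); // &lt;img src=&quot;https://attacker.com/exfil&quot;&gt;
```

Because the result contains no literal `<` or `>`, the browser has nothing to parse as an element, so the payload ends up as a text node with no fetch and no event handlers.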

Verified in browser

Zero outbound network requests to attacker URLs
document.title unchanged (no JS executed)
All payloads render as inert, readable text

Before / After

Before fix — raw HTML renders as live DOM elements

Malicious filenames execute JavaScript (onerror fires), render SVG content, make outbound network requests, and disappear from visible text:

After fix — all HTML escaped to inert visible text

The same payloads display as safe, readable text with no code execution or network requests:

Verification results

| Check | Result |
| --- | --- |
| `<img src=x onerror=alert(1)>` executes JS? | No, rendered as text |
| `<img src="https://attacker.com/exfil">` makes network request? | No, zero outbound requests |
| `<svg onload=...>` executes JS? | No, rendered as text |
| `<input type="image" src="...">` fetches URL? | No, rendered as text |
| `<div style="background-image: url(...)">` fetches URL? | No, rendered as text |
| `**bold**`, `[link](url)`, `code` still render? | Yes, markdown formatting unaffected |

Test plan

All existing unit tests pass (17/17)
New tests verify that <img>, <svg>, <script>, <div> with event handlers, and ![image](url) are all escaped
Markdown formatting still renders correctly
Live browser verification: zero network requests to attacker URLs, no JS execution
UI snapshot tests need update (binary PNGs will change since images no longer render)
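
The new tests listed above could take roughly the following table-driven shape. This is a sketch, not the PR's test file: `escapeRawHtml` is a stand-in for the renderer's escaping step, `findUnescaped` is a hypothetical helper, and a plain `throw` is used in place of the project's test framework. The first two payloads are copied from the verification table; the rest are illustrative stand-ins for the ones the table elides.

```typescript
// Stand-in for the escaping step under test (hypothetical).
function escapeRawHtml(raw: string): string {
  return raw
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

const payloads: string[] = [
  '<img src=x onerror=alert(1)>',
  '<img src="https://attacker.com/exfil">',
  '<svg onload=alert(1)>', // illustrative
  '<script>alert(1)</script>', // illustrative
  '<div onclick=alert(1)>x</div>', // illustrative
];

// Returns the payloads whose escaped form still contains a raw angle bracket.
function findUnescaped(inputs: string[]): string[] {
  return inputs.filter((payload) => /[<>]/.test(escapeRawHtml(payload)));
}

const failures = findUnescaped(payloads);
if (failures.length > 0) {
  throw new Error(`unescaped payloads: ${failures.join(', ')}`);
}
console.log(`${payloads.length} payloads escaped`); // 5 payloads escaped
```

Checking for any surviving `<` or `>` keeps the assertion independent of which tag or attribute the payload uses, which matches the tag-agnostic intent of the fix.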

Instead of relying solely on sanitize-html's allowlist to filter dangerous tags (which requires ongoing maintenance as new bypass vectors are found), escape all raw HTML tokens at the marked renderer level. This ensures any output resembling an HTML tag is rendered as inert text regardless of source. Also removes resource-loading media tags (img, audio, video, source, track) from the sanitizer allowlist as defense-in-depth.

chungjac requested a review from a team as a code owner May 7, 2026 21:12

chungjac added 5 commits May 7, 2026 14:27

docs: add before/after screenshots for XSS fix demonstration

89e2e16

remove screenshots from repo

6123b23

fix: update e2e snapshots and code reference offset

c96d899

fix: correct code reference offset and update snapshot

6272e82

fix: update parse-markdown snapshots

09a418b

aseemxs approved these changes May 7, 2026

ashishrp-aws approved these changes May 7, 2026

chungjac merged commit 571d6c2 into aws:main May 7, 2026
4 checks passed

chungjac mentioned this pull request May 7, 2026

chore: bump @aws/mynah-ui to ^4.40.2 aws/language-servers#2719

Merged

chungjac merged 6 commits into aws:main from chungjac:fix/escape-inline-html-xss
