fix: escape backslash before pipe in markdown cell output#163
Conversation
Fix incomplete string sanitization in i18n-scan.mjs where pipe characters were escaped without first escaping backslashes. Input containing '\|' would produce '\|' after replacement, which renders as an escaped backslash followed by a cell boundary in Markdown. Extract escapeMarkdownCell() helper that escapes backslashes first, then pipes, and apply it at both call sites (lines 610 and 666). Fixes CodeQL alerts #9 and #10 (js/incomplete-sanitization).
lml2468
left a comment
There was a problem hiding this comment.
QA Review: ✅ LGTM
Diff verified:
- Correct escaping order: backslash first (
/\\\\/g), then pipe (/\\|/g) — matches CodeQL recommendation - Helper extracted and reused at both call sites (L611, L667)
- No unrelated changes, +6/-2 minimal
Waiting for CodeQL re-scan to confirm alerts #9 and #10 auto-close. I will follow up on alert status post-merge.
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #163 (octo-web)
Summary
Minimal, correct fix for CodeQL alerts #9 and #10 (js/incomplete-sanitization, CWE-20/80/116) in scripts/i18n-scan.mjs. Two markdown-table cell formatters previously escaped | without first escaping \, so an input containing \| was rendered as \\|, which markdown parses as an escaped backslash followed by an unescaped cell boundary — breaking the table. The fix introduces a shared escapeMarkdownCell() helper and applies it at both call sites.
Verification
Escape ordering is correct
function escapeMarkdownCell(text) {
return text.replace(/\\/g, "\\\\").replace(/\|/g, "\\|");
}Backslashes are escaped first, then pipes. This is the canonical order; reversing it would re-introduce the vulnerability because the pipe-escaping step itself emits a backslash that the second pass would then double, leaving the original input's backslash unescaped.
Trace for input a\|b:
\→\\:a\\|b|→\|:a\\\|b— markdown renders as\+|✓
Trace for input a|b:
- no backslashes:
a|b |→\|:a\|b— markdown renders as literal|✓
Trace for input a\b (no pipe):
\→\\:a\\b— markdown renders as literal\✓
Regex literals match intent
/\\/gmatches a single backslash character, replacement"\\\\"is two backslashes./\|/gmatches a pipe, replacement"\\|"is backslash + pipe.
Both correct.
Call sites
Helper is hoisted at L37, before its uses at L614 (renderMarkdownReport) and L670 (renderLocaleReport). Both formerly called the inline text.replace(/\|/g, "\\|") and now uniformly delegate to the helper. No other places in the file produce markdown table cells from untrusted text — gh api .../i18n-scan.mjs confirms these are the only two cell-rendering sites.
Scope discipline
- Diff is +6/-2 over a single file.
- No behavioral change beyond the escaping correction.
- No new dependencies, no API surface change, no tests required (this is an internal report-rendering script, no existing test scaffolding for it).
Findings
None blocking. The fix is exactly what CodeQL's incomplete-sanitization rule wants and matches the established pattern (escape the escape character before escaping the syntax character).
Nits (non-blocking)
- The helper could also escape leading/trailing whitespace and newlines, since
textmay include\nwhich would also break a table row. Not in scope for this PR (CodeQL did not flag it and current report inputs are unlikely to contain newlines), but worth a follow-up if the inputs ever come from arbitrary source code text.
Verdict
Approve. Clean, surgical CodeQL remediation.
Jerry-Xin
left a comment
There was a problem hiding this comment.
Summary: This PR is in scope for octo-web and correctly fixes the Markdown table escaping order in the i18n report generator.
💬 Non-blocking
- 🔵 Suggestion: scripts/i18n-scan.mjs could use a small regression test or fixture for
\|input if this script gains test coverage. The current change is simple and correct, but the exact escaping order is easy to regress.
✅ Highlights
- The relevance gate passes:
scripts/i18n-scan.mjsis project i18n tooling, and the PR fixes repository-owned functionality. - scripts/i18n-scan.mjs escapes backslashes before pipes, which preserves
\|as literal content inside Markdown table cells. - Both affected report renderers now use the shared helper at line 614 and line 670, reducing duplicated sanitization logic.
Verification:
node --check scripts/i18n-scan.mjspassed.pnpm i18n:checkcould not run in this checkout becausenode_modulesis missing andtypescriptcould not be resolved. This appears environmental, not caused by the PR.
Summary
Fix incomplete string sanitization in
scripts/i18n-scan.mjs(CodeQL alerts #9 and #10).Problem
Two places escape pipe characters
|for Markdown table cells but do not first escape backslashes. If input contains\|, the replacement produces\\|which Markdown parses as an escaped backslash + a cell boundary, breaking the table structure.Fix
Extract a shared
escapeMarkdownCell()helper that escapes backslashes before pipes:Applied at both call sites (L610, L666).
Verification
node --check scripts/i18n-scan.mjspasses (no syntax errors)Fixes CodeQL #9, #10 (js/incomplete-sanitization, CWE-20/CWE-80/CWE-116)