Skip to content

fix: escape backslash before pipe in markdown cell output#163

Open
lml2468 wants to merge 1 commit into
mainfrom
fix/codeql-incomplete-sanitization
Open

fix: escape backslash before pipe in markdown cell output#163
lml2468 wants to merge 1 commit into
mainfrom
fix/codeql-incomplete-sanitization

Conversation

@lml2468
Copy link
Copy Markdown
Contributor

@lml2468 lml2468 commented May 28, 2026

Summary

Fix incomplete string sanitization in scripts/i18n-scan.mjs (CodeQL alerts #9 and #10).

Problem

Two places escape pipe characters | for Markdown table cells but do not first escape backslashes. If input contains \|, the replacement produces \\| which Markdown parses as an escaped backslash + a cell boundary, breaking the table structure.

Fix

Extract a shared escapeMarkdownCell() helper that escapes backslashes before pipes:

function escapeMarkdownCell(text) {
  return text.replace(/\\\\/g, "\\\\\\\\").replace(/\\|/g, "\\\\|");
}

Applied at both call sites (L610, L666).

Verification

Fixes CodeQL #9, #10 (js/incomplete-sanitization, CWE-20/CWE-80/CWE-116)

Fix incomplete string sanitization in i18n-scan.mjs where pipe
characters were escaped without first escaping backslashes. Input
containing '\|' would produce '\|' after replacement, which renders
as an escaped backslash followed by a cell boundary in Markdown.

Extract escapeMarkdownCell() helper that escapes backslashes first,
then pipes, and apply it at both call sites (lines 610 and 666).

Fixes CodeQL alerts #9 and #10 (js/incomplete-sanitization).
@lml2468 lml2468 requested a review from a team as a code owner May 28, 2026 16:01
@github-actions github-actions Bot added the size/XS PR size: XS label May 28, 2026
Copy link
Copy Markdown
Contributor Author

@lml2468 lml2468 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Review: ✅ LGTM

Diff verified:

  • Correct escaping order: backslash first (/\\\\/g), then pipe (/\\|/g) — matches CodeQL recommendation
  • Helper extracted and reused at both call sites (L611, L667)
  • No unrelated changes, +6/-2 minimal

Waiting for CodeQL re-scan to confirm alerts #9 and #10 auto-close. I will follow up on alert status post-merge.

Copy link
Copy Markdown
Contributor

@yujiawei yujiawei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #163 (octo-web)

Summary

Minimal, correct fix for CodeQL alerts #9 and #10 (js/incomplete-sanitization, CWE-20/80/116) in scripts/i18n-scan.mjs. Two markdown-table cell formatters previously escaped | without first escaping \, so an input containing \| was rendered as \\|, which markdown parses as an escaped backslash followed by an unescaped cell boundary — breaking the table. The fix introduces a shared escapeMarkdownCell() helper and applies it at both call sites.

Verification

Escape ordering is correct

function escapeMarkdownCell(text) {
  return text.replace(/\\/g, "\\\\").replace(/\|/g, "\\|");
}

Backslashes are escaped first, then pipes. This is the canonical order; reversing it would re-introduce the vulnerability because the pipe-escaping step itself emits a backslash that the second pass would then double, leaving the original input's backslash unescaped.

Trace for input a\|b:

  1. \\\: a\\|b
  2. |\|: a\\\|b — markdown renders as \ + |

Trace for input a|b:

  1. no backslashes: a|b
  2. |\|: a\|b — markdown renders as literal |

Trace for input a\b (no pipe):

  1. \\\: a\\b — markdown renders as literal \

Regex literals match intent

  • /\\/g matches a single backslash character, replacement "\\\\" is two backslashes.
  • /\|/g matches a pipe, replacement "\\|" is backslash + pipe.

Both correct.

Call sites

Helper is hoisted at L37, before its uses at L614 (renderMarkdownReport) and L670 (renderLocaleReport). Both formerly called the inline text.replace(/\|/g, "\\|") and now uniformly delegate to the helper. No other places in the file produce markdown table cells from untrusted text — gh api .../i18n-scan.mjs confirms these are the only two cell-rendering sites.

Scope discipline

  • Diff is +6/-2 over a single file.
  • No behavioral change beyond the escaping correction.
  • No new dependencies, no API surface change, no tests required (this is an internal report-rendering script, no existing test scaffolding for it).

Findings

None blocking. The fix is exactly what CodeQL's incomplete-sanitization rule wants and matches the established pattern (escape the escape character before escaping the syntax character).

Nits (non-blocking)

  • The helper could also escape leading/trailing whitespace and newlines, since text may include \n which would also break a table row. Not in scope for this PR (CodeQL did not flag it and current report inputs are unlikely to contain newlines), but worth a follow-up if the inputs ever come from arbitrary source code text.

Verdict

Approve. Clean, surgical CodeQL remediation.

Copy link
Copy Markdown
Contributor

@Jerry-Xin Jerry-Xin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: This PR is in scope for octo-web and correctly fixes the Markdown table escaping order in the i18n report generator.

💬 Non-blocking

  • 🔵 Suggestion: scripts/i18n-scan.mjs could use a small regression test or fixture for \| input if this script gains test coverage. The current change is simple and correct, but the exact escaping order is easy to regress.

✅ Highlights

  • The relevance gate passes: scripts/i18n-scan.mjs is project i18n tooling, and the PR fixes repository-owned functionality.
  • scripts/i18n-scan.mjs escapes backslashes before pipes, which preserves \| as literal content inside Markdown table cells.
  • Both affected report renderers now use the shared helper at line 614 and line 670, reducing duplicated sanitization logic.

Verification:

  • node --check scripts/i18n-scan.mjs passed.
  • pnpm i18n:check could not run in this checkout because node_modules is missing and typescript could not be resolved. This appears environmental, not caused by the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XS PR size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Voice input ignores backend max_duration config — always uses hardcoded 300s default

3 participants