Converter P0+P1: tables, images, escape, h4-h6, hr, mark/sub/sup, code lang hints + Vitest harness + CI #1

Merged
dewierwan merged 1 commit into main from dewi/converter-p0 on May 2, 2026

@dewierwan

Owner

Summary

This PR aims to make paste-to-markdown handle real Notion / Airtable / Google Docs pastes without dropping content or silently corrupting it. Plan reference: ~/.claude/plans/great-now-what-else-sprightly-pony.md.

P0 — correctness and core coverage

  • Tables → GFM tables — biggest gap. Walks thead/tbody/tr/th/td, falls back to "first row is header" if no thead, escapes | in cells, recurses for inline formatting.
  • Escape Markdown special chars in text nodes — was silent corruption: "5 * 3 = 15" became italic, "John_Smith" became "JohnSmith", "[draft]" became a broken link. Now escapes \ ` * _ [ ] ~ in text nodes only (not inside <code> / <pre>).
  • Images → ![alt](src) — were dropped entirely. Title attribute included if present.
  • Headings h4–h6 — were falling through to default and emitting plain text.
  • <hr> → --- — was dropped.
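
To illustrate the escaping and table items above, here is a minimal sketch that works on plain strings rather than DOM nodes. The function names (`escapeMarkdownText`, `escapeCell`, `toGfmTable`) and their signatures are hypothetical and not taken from the PR; the real converter walks `thead`/`tbody`/`tr`/`th`/`td` and recurses for inline formatting, which this sketch assumes has already happened.

```javascript
// Hypothetical helpers; names and signatures are illustrative only.

// Backslash-escape the characters the PR lists: \ ` * _ [ ] ~
// Intended for text nodes only, never for <code> / <pre> content.
function escapeMarkdownText(text) {
  return text.replace(/[\\`*_\[\]~]/g, "\\$&");
}

// Escape pipes so cell content cannot end a GFM table cell early,
// and fold newlines into spaces because a table row must stay on one line.
function escapeCell(text) {
  return text.replace(/\|/g, "\\|").replace(/\s*\n\s*/g, " ").trim();
}

// rows: array of arrays of already-converted cell strings.
// The first row is treated as the header, mirroring the
// "first row is header" fallback used when there is no <thead>.
function toGfmTable(rows) {
  if (rows.length === 0) return "";
  const width = Math.max(...rows.map((r) => r.length));
  const line = (cells) => {
    // Pad short rows with empty cells so every row has the same width.
    const padded = Array.from({ length: width }, (_, i) => escapeCell(cells[i] ?? ""));
    return "| " + padded.join(" | ") + " |";
  };
  const [header, ...body] = rows;
  const separator = "| " + Array(width).fill("---").join(" | ") + " |";
  return [line(header), separator, ...body.map(line)].join("\n");
}
```

With these helpers, `escapeMarkdownText("5 * 3 = 15")` yields `5 \* 3 = 15`, so the asterisk can no longer open an emphasis run.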

P1 quick wins

  • <mark>, <sub>, <sup> emit HTML pass-through (renders anywhere Markdown allows raw HTML).
  • <pre> reads data-language (Notion) and class="language-xyz" (Prism convention) for fenced-code language hints.
  • <del> joins <s> / <strike> for strikethrough.
  • Final whitespace pass collapses runs of 3+ blank lines to two (Notion's nested <div>s used to produce big gaps).
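
The language-hint lookup and the whitespace pass can be sketched as follows. Both function names are hypothetical, and the sketch rests on two assumptions: `data-language` takes priority over `class="language-xyz"`, and "3+ blank lines to two" is read literally (four or more consecutive newlines become three). The PR's actual regex may count newline characters instead.

```javascript
// Hypothetical helper: el is any object with a DOM-style getAttribute().
function languageHint(el) {
  // Assumption: Notion's data-language wins over a Prism-style class.
  const fromData = el.getAttribute("data-language");
  if (fromData && fromData.trim()) return fromData.trim();

  // Prism convention: class="language-xyz", possibly among other classes.
  const cls = el.getAttribute("class") || "";
  const match = cls.match(/(?:^|\s)language-([\w+#-]+)/);
  return match ? match[1] : "";
}

// Hypothetical helper: N blank lines correspond to N + 1 consecutive newlines,
// so 3+ blank lines means 4+ newlines, and two blank lines means 3 newlines.
function collapseBlankLines(markdown) {
  return markdown.replace(/\n{4,}/g, "\n\n\n");
}
```

An empty string from `languageHint` would simply produce a fence with no language tag.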

P2 — testability + CI

  • Vitest + jsdom harness loads index.html itself as the artifact under test, so tests exercise the actual published page rather than a duplicate. End users still just open index.html — no install needed.
  • 34 tests across headings, inline formatting, escaping, images, tables, code blocks, lists, links, and whitespace cleanup.
  • Fixture auto-discovery — drop tests/fixtures/<source>/<scenario>.html paired with <scenario>.expected.md and the harness picks them up. Highest-leverage investment for regression safety as new sources are supported.
  • GitHub Actions CI — runs cspell, html-validate, and vitest on every PR.
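
The fixture pairing rule above can be sketched without touching the filesystem. `pairFixtures` is a hypothetical name; it assumes the caller supplies paths relative to tests/fixtures, each starting with a `<source>` directory, and it skips any `.html` file that has no matching `.expected.md`. How the real harness lists the directory and reports unpaired files is not stated in the PR.

```javascript
// Hypothetical pairing step. Input paths look like
// "notion/table.html" and "notion/table.expected.md".
function pairFixtures(paths) {
  const all = new Set(paths);
  const pairs = [];
  // Sort so that test order does not depend on directory listing order.
  for (const p of [...all].sort()) {
    if (!p.endsWith(".html")) continue;
    const stem = p.slice(0, -".html".length);
    const expected = stem + ".expected.md";
    if (!all.has(expected)) continue; // unpaired input: skipped in this sketch
    const [source, ...rest] = stem.split("/");
    pairs.push({ source, scenario: rest.join("/"), input: p, expected });
  }
  return pairs;
}
```

Each returned pair could then drive one generated test case that converts the `input` file and compares the result with the `expected` file.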

Out of scope for this PR (P3, design-coordinated)

  • "Paste as plain text" toggle
  • Source-detection pill
  • Don't-auto-clear input behaviour
  • Google Docs class-based formatting (currently relies on inline-style fallback)

Test plan

  • npm test — all 34 tests pass locally
  • npm run lint:spell — clean
  • npm run lint:html — clean
  • CI green on this PR
  • Live site smoke test: paste a Notion page with a table, image, h4, hr, and */_/[ in prose; confirm output is correct

🤖 Generated with Claude Code

P0 (correctness, conversion coverage):
- Tables → GFM table syntax (handles thead-or-first-row header, escapes pipes)
- Markdown special chars in text nodes are now escaped so "5 * 3 = 15" no longer renders as italic
- Images → ![alt](src) (with optional title)
- Headings extended to h4–h6
- <hr> → ---

P1 quick wins:
- <mark>, <sub>, <sup> emit HTML pass-through (renders everywhere Markdown allows raw HTML)
- <pre>/<code> reads data-language and class="language-xyz" for fenced-code language hints
- <del> joins <s>/<strike> for strikethrough
- Final whitespace pass collapses runs of 3+ blank lines to two

P2 (testability + CI):
- Vitest + jsdom harness loads index.html as the artifact under test, so tests exercise
  the actual published page rather than a duplicate. End users still just open index.html.
- 34 unit tests covering headings, inline formatting, escaping, images, tables, code blocks,
  lists, links, and whitespace cleanup.
- tests/fixtures/<source>/*.html ⇄ *.expected.md auto-discovery for real-world paste regression coverage
- GitHub Actions CI runs cspell, html-validate, and vitest on PRs

README updated with full feature list, privacy note, and contributor instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dewierwan

Owner Author

@claude review

@dewierwan dewierwan merged commit acdedee into main May 2, 2026
1 check passed