Improve Korean keep-all mixed-script breaks #124

Open
doyoon530 wants to merge 1 commit into chenglou:main from doyoon530:korean-keep-all-mixed-script
doyoon530 commented Apr 11, 2026

Summary

  • Improves wordBreak: keep-all for Korean-heavy mixed-script tokens.
  • Keeps Hangul + ASCII/digit product-like runs together, e.g. AI정보공학과, README카드생성기, api문서v2가이드, 2026학년도공지.
  • Stops keep-all continuation across path/query/key-value separators like /, ?, &, =, and :. These separators are treated as structural boundaries so URL/path/query/key-value text does not get folded into one Korean token.
  • Adds regression tests for compact Korean tokens, separator boundaries, punctuation, and ZWSP.
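The boundary rule described above can be sketched as a pairwise predicate. This is only an illustration under stated assumptions: `canKeepTogether`, `STRUCTURAL_SEPARATORS`, and the helper names are hypothetical and are not taken from the PR's diff, and the real change may classify characters differently (for example around punctuation).

```typescript
// Hypothetical sketch of the keep-all continuation rule; not the PR's actual code.
// Separators listed in the summary are treated as structural boundaries.
const STRUCTURAL_SEPARATORS = new Set(["/", "?", "&", "=", ":"]);
const ZWSP = "\u200B"; // zero-width space: an explicit break opportunity

function isHangul(ch: string): boolean {
  return /\p{Script=Hangul}/u.test(ch);
}

function isAsciiAlnum(ch: string): boolean {
  return /^[A-Za-z0-9]$/.test(ch);
}

// Returns true when keep-all may glue `prev` and `next` into one unbreakable run.
function canKeepTogether(prev: string, next: string): boolean {
  // An explicit ZWSP always permits a break.
  if (prev === ZWSP || next === ZWSP) return false;
  // Path/query/key-value separators stop the continuation on either side.
  if (STRUCTURAL_SEPARATORS.has(prev) || STRUCTURAL_SEPARATORS.has(next)) return false;
  // Hangul and ASCII letters/digits may join in any combination.
  const prevJoins = isHangul(prev) || isAsciiAlnum(prev);
  const nextJoins = isHangul(next) || isAsciiAlnum(next);
  return prevJoins && nextJoins;
}
```

Under this sketch, the `I` / `정` pair in AI정보공학과 joins, while any pair touching `=` or `/` does not.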

Before / After

Before, ASCII-leading Korean tokens could be split at the script boundary:

  • AI정보공학과 -> AI + 정보공학과
  • 2026학년도공지 -> 2026 + 학년도공지

After, compact Korean mixed-script tokens stay together under keep-all, while structured separators remain break boundaries:

  • AI정보공학과 stays as one text unit
  • key=value한글 remains split around =
  • docs/README한글가이드 remains split around /
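To make the "after" behavior concrete, here is a rough sketch that splits a string into unbreakable units. `keepAllUnits` and `BOUNDARY` are hypothetical names, not part of the PR, and the sketch assumes that only the listed separators, ZWSP, and whitespace create boundaries. The library's real segmentation is likely more involved.

```typescript
// Hypothetical sketch: split text into units that keep-all would not break inside.
// Separators are captured so they survive as their own units.
const BOUNDARY = /([\/?&=:]|\u200B|\s+)/u;

function keepAllUnits(text: string): string[] {
  return text
    .split(BOUNDARY)
    // Drop empty pieces, ZWSP, and whitespace; they only mark break opportunities.
    .filter((part) => part !== "" && part !== "\u200B" && !/^\s+$/.test(part));
}

console.log(keepAllUnits("AI정보공학과"));
console.log(keepAllUnits("key=value한글"));
console.log(keepAllUnits("docs/README한글가이드"));
```

In this sketch the first input comes back as a single unit, while the other two are split around `=` and `/` respectively, matching the examples above.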

Testing

  • bun test src/layout.test.ts
  • bun run check
  • bun test

doyoon530 force-pushed the korean-keep-all-mixed-script branch from 8051339 to c74447e on April 11, 2026 11:45
doyoon530 marked this pull request as ready for review on April 11, 2026 11:45