feat(mask): mask Aozora triggers in 4-space indented code blocks #75

@P4suta

Background

crates/afm-markdown/src/code_block_mask.rs masks Aozora Bunko (青空文庫) trigger
characters (|《》[]※〔〕「」) inside CommonMark fenced code
blocks so they flow through to <pre><code> literally instead of being
rewritten into PUA sentinels by aozora_pipeline. The module docstring
(code_block_mask.rs:30-41) marks indented code blocks (CommonMark
§4.4, lines indented by four or more spaces) as deliberately out of scope:

Indented code blocks … start and end based on paragraph context that
the lexer pre-pass would need a full mini-parser to reproduce
(blank-line boundaries, list-item interleaving, etc.). In every Aozora
Bunko source we've seen, code-shaped runs use fenced syntax; the pinned
test tests::indent_of_four_spaces_disables_the_fence codifies the
current behaviour. If a future corpus exhibits real-world 4-space
indented code blocks with Aozora trigger chars, this is the place to
extend.

This is by design, not a bug: no known real-world input hits it. This
issue is filed so the deferral is tracked rather than buried in a
docstring.

Trigger to act

A corpus document (or user report) where a 4-space indented code block
contains an Aozora trigger character and the trigger is wrongly rewritten
into a sentinel / annotation in the HTML output.

What to do (if/when triggered)

Extend the fence-state machine in mask_code_block_triggers to also
recognise CommonMark §4.4 indented code blocks. That requires tracking
paragraph-interruption context (blank-line boundaries, list-item
interleaving) that the current line-by-line pass deliberately avoids. Add
a regression fixture alongside the existing test
tests::indent_of_four_spaces_disables_the_fence.
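
To make the paragraph-context requirement concrete, here is a minimal sketch of the detection step. It is not the crate's code: the function name indented_code_lines and its return shape are hypothetical, and nothing here reflects the actual signature of mask_code_block_triggers. It assumes space-only indentation and treats every non-indented, non-blank line as paragraph text, so it ignores list items, block quotes, headings, tabs, and fenced blocks, all of which a real extension would have to handle.

```rust
/// Hypothetical helper (not part of afm-markdown): for each line of `src`,
/// reports whether the line is content of a CommonMark §4.4 indented code
/// block under the simplified rules described above.
fn indented_code_lines(src: &str) -> Vec<bool> {
    let mut flags = Vec::new();
    // True while the previous non-blank line was paragraph text.
    let mut in_paragraph = false;

    for line in src.lines() {
        // CommonMark blank line: nothing but spaces and tabs.
        let blank = line.chars().all(|c| c == ' ' || c == '\t');
        // Simplification: only literal spaces count as indentation.
        let indented = line.starts_with("    ");

        if blank {
            // A blank line ends any open paragraph. It may sit inside a code
            // block, but it holds no trigger characters, so it is not flagged.
            in_paragraph = false;
            flags.push(false);
        } else if indented && !in_paragraph {
            // An indented line that does not continue a paragraph is code.
            flags.push(true);
        } else {
            // Either ordinary text, or an indented line that merely continues
            // the current paragraph (indented code cannot interrupt one).
            in_paragraph = true;
            flags.push(false);
        }
    }
    flags
}
```

Under these assumptions, the input "para\n    lazy《x》\n\n    code《y》\nnext" flags only its fourth line: the second line is indented but continues the paragraph, so its triggers would still be rewritten, while the fourth follows a blank line and would be masked. That distinction is exactly the state the current line-by-line pass does not carry.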

References

  • crates/afm-markdown/src/code_block_mask.rs:30-41
  • crates/afm-markdown/src/code_block_mask.rs → tests::indent_of_four_spaces_disables_the_fence
  • CLAUDE.md → "HTML post-process edge case"

Labels

  • deferred: intentionally postponed / not-yet-done; backlog
  • enhancement: new feature or request
