Background
crates/afm-markdown/src/code_block_mask.rs masks 青空文庫 trigger
characters (|《》[]※〔〕「」) inside CommonMark fenced code
blocks so they flow through to <pre><code> literally instead of being
rewritten into PUA sentinels by aozora_pipeline. The module docstring
(code_block_mask.rs:30-41) marks indented code blocks (CommonMark
§4.4, four-space-indented code) as deliberately out of scope:
Indented code blocks … start and end based on paragraph context that
the lexer pre-pass would need a full mini-parser to reproduce
(blank-line boundaries, list-item interleaving, etc.). In every Aozora
Bunko source we've seen, code-shaped runs use fenced syntax; the pinned
test tests::indent_of_four_spaces_disables_the_fence codifies the
current behaviour. If a future corpus exhibits real-world 4-space
indented code blocks with Aozora trigger chars, this is the place to
extend.
This is by design, not a bug — there is no known real-world input
that hits it. Filing it so the deferral is tracked rather than buried in
a docstring.
Trigger to act
A corpus document (or user report) where a 4-space indented code block
contains an Aozora trigger character and the trigger is wrongly rewritten
into a sentinel / annotation in the HTML output.
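To make the trigger condition checkable against a corpus, a rough scan along these lines could flag candidate documents. This is a sketch, not part of the crate: `TRIGGERS` and `candidate_lines` are hypothetical names, the character set is copied from the Background list above and assumed complete, and the check deliberately over-approximates (it also flags indented lines that are really lazy paragraph continuations).

```rust
/// Trigger characters as listed in the Background section (assumed complete).
const TRIGGERS: &[char] = &['|', '《', '》', '[', ']', '※', '〔', '〕', '「', '」'];

/// Hypothetical corpus check: returns the 1-based numbers of lines that
/// start with at least four spaces and contain a trigger character.
/// It does not decide whether the line is truly an indented code block.
fn candidate_lines(src: &str) -> Vec<usize> {
    src.lines()
        .enumerate()
        .filter(|(_, line)| line.starts_with("    ") && line.contains(TRIGGERS))
        .map(|(i, _)| i + 1)
        .collect()
}
```

Any hit still needs a manual look at the rendered HTML to confirm that the character was actually rewritten into a sentinel or annotation.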
What to do (if/when triggered)
Extend the fence-state machine in mask_code_block_triggers to also
recognise CommonMark §4.4 indented code blocks — which requires tracking
paragraph-interruption context (blank-line boundaries, list-item
interleaving) that the current line-by-line pass deliberately avoids. Add
a regression fixture alongside indent_of_four_spaces_disables_the_fence.
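As a starting point for that extension, the paragraph-context tracking might look roughly like the sketch below. It is an assumption-laden illustration, not the existing `mask_code_block_triggers` logic: `indented_code_lines` and `leading_columns` are hypothetical helpers, every non-indented non-blank line is treated as paragraph text, and list items, blockquotes, and fenced blocks are ignored entirely.

```rust
/// Hypothetical helper: for each line, reports whether it belongs to an
/// indented code block under a simplified reading of CommonMark §4.4.
/// Simplifications: no list-item, blockquote, or fence handling, and any
/// non-indented, non-blank line is assumed to open a paragraph.
fn indented_code_lines(src: &str) -> Vec<bool> {
    let lines: Vec<&str> = src.lines().collect();
    let mut flags = vec![false; lines.len()];
    let mut in_paragraph = false;
    let mut in_code = false;
    // Blank lines seen after code; they join the block only if more code follows.
    let mut pending_blanks: Vec<usize> = Vec::new();

    for (i, line) in lines.iter().enumerate() {
        let blank = line.chars().all(|c| c == ' ' || c == '\t');
        if blank {
            in_paragraph = false;
            if in_code {
                pending_blanks.push(i);
            }
            continue;
        }
        if leading_columns(line) >= 4 && !in_paragraph {
            // An indented line that does not continue a paragraph is code.
            for &b in &pending_blanks {
                flags[b] = true;
            }
            pending_blanks.clear();
            flags[i] = true;
            in_code = true;
        } else {
            // Paragraph text, or an indented lazy continuation of one.
            pending_blanks.clear();
            in_code = false;
            in_paragraph = true;
        }
    }
    flags
}

/// Counts leading whitespace columns, expanding tabs to 4-column stops.
fn leading_columns(line: &str) -> usize {
    let mut cols = 0;
    for ch in line.chars() {
        match ch {
            ' ' => cols += 1,
            '\t' => cols += 4 - cols % 4,
            _ => break,
        }
    }
    cols
}
```

The key point the sketch illustrates is that an indented line cannot interrupt a paragraph, so the pass has to remember whether the previous non-blank line was paragraph text. Handling list-item interleaving would need additional state that is not modelled here.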
References
- crates/afm-markdown/src/code_block_mask.rs:30-41
- crates/afm-markdown/src/code_block_mask.rs → tests::indent_of_four_spaces_disables_the_fence
- CLAUDE.md → "HTML post-process edge case"