St abbrs by vr8hub · Pull Request #925 · standardebooks/tools

vr8hub · 2026-02-07T22:02:10Z

As discussed.

Typogrify looks for St. abbreviations followed by whitespace, and if they're a) followed by a capital letter and b) not preceded by an ordinal, then it changes the whitespace to a no-break space.
Semanticate looks for St. abbreviations followed by a no-break space and adds an abbr element with a z3998:name-title tag. The later code that adds a plain abbr tag to all untagged St.'s is left unchanged, so any remaining St.'s that don't have abbrs will get one.
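
The two passes described above can be sketched in Python roughly as follows. This is a minimal illustration using the standard `re` module, not the code from this PR: the function names `typogrify_st` and `semanticate_st`, the exact ordinal test, the `epub:type` attribute spelling, and the crude "not directly preceded by `>`" check for untagged St.'s are all assumptions.

```python
import re

NBSP = "\u00a0"  # no-break space

# Assumption: "preceded by an ordinal" means a digit plus st/nd/rd/th and one
# whitespace character directly before "St." (e.g. "42nd St.").
ST_BEFORE_CAPITAL = re.compile(r"(?<![0-9](?:st|nd|rd|th)\s)\bSt\.\s+(?=[A-Z])")


def typogrify_st(text: str) -> str:
    """Swap the whitespace after a probable "Saint" St. for a no-break space."""
    return ST_BEFORE_CAPITAL.sub("St." + NBSP, text)


def semanticate_st(xhtml: str) -> str:
    """Tag St. + no-break space as a name title, then plain-tag the rest."""
    # Pass 1: St. followed by a no-break space gets the name-title semantic.
    # (?<!>) is a crude stand-in for "not already wrapped in a tag".
    xhtml = re.sub(
        r"(?<!>)\bSt\.(?=" + NBSP + ")",
        '<abbr epub:type="z3998:name-title">St.</abbr>',
        xhtml,
    )
    # Pass 2: any St. that is still untagged gets a plain abbr element.
    return re.sub(r"(?<!>)\bSt\.", "<abbr>St.</abbr>", xhtml)


sample = "St. Paul walked down 42nd St. East."
result = semanticate_st(typogrify_st(sample))
print(result)
```

In this sketch a second run of either function leaves `result` unchanged, because a tagged St. is followed by `</abbr>` rather than whitespace and is preceded by `>`.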

If typogrify tags a false positive and it happens to be caught and corrected (unlikely), then semanticate won't tag it with name-title. If it's not caught and semanticate tags the false positive, and that is subsequently manually corrected to remove the name-title tag (but not the abbr tag) and change the no-break space to a regular space, neither typogrify nor semanticate will mess with it again.

If semanticate is mistakenly run before typogrify, then all the St.'s get a plain tag and no trailing no-break space. The first is no worse than today, and the second is only partially worse. And it's easy to fix; just replace <abbr>St.</abbr> with St. and re-run typogrify/semanticate.
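
The recovery step for the wrong-order case can be sketched as a plain string replacement. The helper name `strip_plain_st_abbr` is hypothetical, and the `epub:type` attribute in the sample is an assumed spelling; the point is only that the plain wrapper is removed so that typogrify sees a bare St. again.

```python
def strip_plain_st_abbr(xhtml: str) -> str:
    """Undo plain abbr wrappers around St. so typogrify can reprocess them."""
    # Only the attribute-less form is unwrapped; an abbr that carries an
    # attribute does not match this literal string and is left alone.
    return xhtml.replace("<abbr>St.</abbr>", "St.")


wrong_order = '<abbr>St.</abbr> Paul and <abbr epub:type="z3998:name-title">St.</abbr>\u00a0Luke'
print(strip_plain_st_abbr(wrong_order))
```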

In a separate commit, I finally tracked down something that has been a problem for quite a while (years?). If there's some all-uppercase text, and that text happens to contain, e.g., compass directions, as in NEARER THE SEWER, then they get abbr elements and tags even though they're inside a longer word. (This happened while I was testing semanticate, so I both encountered the problem and was in the code, so I went ahead and investigated.) It turns out that in three instances of looking for all-caps abbreviations, the ending negative lookahead assertion was (?!\b), i.e. it can't be followed by a word break. That should clearly be (?!\B), i.e. it can't be followed by anything BUT a word break. This matches the negative lookbehind at the beginning of the regex, which is already (?<!\B).
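
The effect of the lookahead change can be reproduced with a small experiment. The pattern below is a hypothetical stand-in (a bare alternation of four compass abbreviations), not one of the three regexes touched by the commit; only the surrounding assertions mirror the description.

```python
import re

# Hypothetical abbreviation list; the real semanticate patterns are more involved.
COMPASS = r"(?:NE|NW|SE|SW)"

# Before: (?!\b) demands that the match does NOT end at a word boundary.
before = re.compile(r"(?<!\B)" + COMPASS + r"(?!\b)")
# After: (?!\B) demands that the match DOES end at a word boundary.
after = re.compile(r"(?<!\B)" + COMPASS + r"(?!\B)")

shouting = "NEARER THE SEWER"
bearing = "SAIL NE THEN SE"

print(before.findall(shouting))  # ['NE', 'SE']
print(after.findall(shouting))   # []
print(after.findall(bearing))    # ['NE', 'SE']
```

Since a negated `\B` at a given position is the same as `\b` at that position, the corrected pattern simply anchors both ends of the abbreviation to word boundaries.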

There were three instances of that, and I changed them all, and added a test to the semanticate test file for it, along with tests for the St. abbrs.

acabal · 2026-02-13T03:06:57Z

Excellent work, thanks!

vr8hub added 4 commits February 7, 2026 15:51

423b906 · typogrify: Attempt to only add no-break space following St. abbrs if it's for Saint

a1053e6 · semanticate: Add z3998:name-title tag to St. followed by a no-break space (added by typogrify)

5a5c386 · semanticate: Correct negative lookahead on a few regexes

c1cae9f · Update semanticate tests for both St and all-cap abbrs

acabal merged commit 76fbdcb into standardebooks:master Feb 13, 2026
1 check passed

acabal merged 4 commits into standardebooks:master from vr8hub:st_abbrs
