St abbrs #925

Merged
acabal merged 4 commits into standardebooks:master from vr8hub:st_abbrs
Feb 13, 2026

vr8hub (Contributor) commented Feb 7, 2026

As discussed.

  1. Typogrify looks for St. abbreviations followed by whitespace, and if they're a) followed by a capital letter and b) not preceded by an ordinal, then it changes the whitespace to a no-break space.
  2. Semanticate looks for St. abbreviations followed by a no-break space and adds an abbr element with a z3998:name-title tag. The code later to add a plain abbr tag to all untagged St.'s is left unchanged, so any remaining St.'s that don't have abbrs will get one.

If typogrify tags a false positive and it happens to be caught and corrected (unlikely), then semanticate won't tag it with name-title. If it's not caught and semanticate tags the false positive, and that is subsequently corrected by hand to remove the name-title tag (but not the `abbr` tag) and to change the no-break space to a regular space, neither typogrify nor semanticate will touch it again.

If semanticate is mistakenly run before typogrify, then all the St.'s get a plain tag and no trailing no-break space. The first is no worse than today, and the second is only partially worse. And it's easy to fix; just replace `<abbr>St.</abbr>` with `St.` and re-run typogrify/semanticate.
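The two passes and the ordering cases described above can be sketched in Python roughly as follows. This is a simplified illustration, not the code in this PR: the function names `typogrify_st` and `semanticate_st`, the numeric-only ordinal check, and the exact `epub:type` attribute spelling are assumptions made for the example.

```python
import re

NBSP = "\u00a0"  # no-break space

def typogrify_st(text: str) -> str:
    """Hypothetical sketch: join "St." to a following capitalized word with a no-break space."""
    # Skip "St." when a numeric ordinal such as "42nd " comes right before it
    # (assumption: only numeric ordinals are checked in this sketch).
    pattern = re.compile(
        r"(?<!\dst )(?<!\dnd )(?<!\drd )(?<!\dth )\bSt\.\s+(?=[A-Z])"
    )
    return pattern.sub("St." + NBSP, text)

def semanticate_st(xhtml: str) -> str:
    """Hypothetical sketch: tag "St." as a name title when a no-break space follows it."""
    # Pass 1: "St." followed by a no-break space gets the name-title semantic.
    xhtml = re.sub(
        r"\bSt\.(?=\u00a0)",
        '<abbr epub:type="z3998:name-title">St.</abbr>',
        xhtml,
    )
    # Pass 2: any "St." not already closed by </abbr> gets a plain abbr element.
    return re.sub(r"\bSt\.(?!</abbr>)", "<abbr>St.</abbr>", xhtml)

# Intended order: typogrify first, then semanticate.
print(semanticate_st(typogrify_st("He left St. Paul for 42nd St. Then he slept.")))

# Wrong order: semanticate first leaves a plain tag and a regular space,
# and a later typogrify run no longer matches the tagged text.
print(typogrify_st(semanticate_st("St. Paul")))
```

In this sketch, a hand-corrected `<abbr>St.</abbr>` followed by a regular space is left alone by both functions, because `St.` is then followed by `</abbr>` rather than by whitespace.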

In a separate commit, I finally tracked down something that has been a problem for quite a while (years?). If there's some all-uppercase text, and that text happens to contain something like a compass direction, e.g. NEARER THE SEWER, then those letters get abbr elements and tags, even though they're in the middle of a word. (This happened while I was testing semanticate, so I both encountered the problem and was in the code, so I went ahead and investigated.) It turns out that in three instances of looking for all-caps abbreviations, the ending negative lookahead assertion was `(?!\b)`, i.e. it can't be followed by a word boundary. That should clearly be `(?!\B)`, i.e. it can't be followed by anything BUT a word boundary. This matches the negative lookbehind at the beginning of the regex, which is already `(?<!\B)`.

There were three instances of that, and I changed them all, and added a test to the semanticate test file for it, along with tests for the St. abbrs.
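To make the difference between the two lookaheads concrete, here is a minimal Python demonstration. The pattern is a simplified stand-in that only knows four compass directions; it is not the regex used in semanticate, and the names `buggy` and `fixed` are made up for the example.

```python
import re

# Simplified stand-in pattern (assumption): four two-letter compass directions.
# (?!\b) requires that a word boundary does NOT follow the match.
buggy = re.compile(r"(?<!\B)(NE|NW|SE|SW)(?!\b)")
# (?!\B) requires that a non-boundary does NOT follow, i.e. the word ends here.
fixed = re.compile(r"(?<!\B)(NE|NW|SE|SW)(?!\B)")

shouting = "NEARER THE SEWER"
print(buggy.findall(shouting))  # ['NE', 'SE']  (matches inside NEARER and SEWER)
print(fixed.findall(shouting))  # []

directions = "HEAD NE TO THE SEWER"
print(fixed.findall(directions))  # ['NE']  (only the standalone direction)
```

With `(?!\B)` the end of the match has to coincide with the end of a word, which mirrors what the leading `(?<!\B)` already requires at the start.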

acabal merged commit 76fbdcb into standardebooks:master on Feb 13, 2026
1 check passed
acabal (Member) commented Feb 13, 2026

Excellent work, thanks!
