Merged
added 4 commits
February 7, 2026 15:51
…pace (added by typogrify)
Member
Excellent work, thanks!
As discussed.
`abbr` tag to all untagged St.'s is left unchanged, so any remaining St.'s that don't have `abbr`s will get one.

If typogrify tags a false positive and it happens to be caught and corrected (unlikely), then semanticate won't tag it with name-title. If it's not caught and semanticate tags the false positive, and that is subsequently manually corrected to remove the name-title tag (but not the `abbr` tag) and change the no-break space to a regular space, neither typogrify nor semanticate will mess with it again.

If semanticate is mistakenly run before typogrify, then all the St.'s get a plain `abbr` tag and no trailing no-break space. The first is no worse than today, and the second is only partially worse. And it's easy to fix; just replace `<abbr>St.</abbr>` with `St.` and re-run typogrify/semanticate.

In a separate commit, I finally tracked down something that has been a problem for quite a while (years?). If there's some all-uppercase text, and that text happens to have, e.g., compass directions in it, as in NEARER THE SEWER, then they get `abbr` elements and tags, even though they're in the middle of a word. (This happened while I was testing semanticate, so I both encountered the problem and was in the code, so I went ahead and investigated.)

It turns out that in three instances of looking for all-caps abbreviations, the ending negative lookahead assertion was `(?!\b)`, i.e. it can't be followed by a word break. That should clearly be `(?!\B)`, i.e. it can't be followed by anything BUT a word break. This matches the negative lookbehind at the beginning of the regex, which is already `(?<!\B)`.

I changed all three instances and added a test to the semanticate test file for it, along with tests for the St. `abbr`s.
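To make the difference between the two lookaheads concrete, here is a minimal sketch in Python's `re` module. The `COMPASS` alternation, the `tag_abbrs` helper, and the bare `<abbr>` wrapper are hypothetical simplifications for illustration; the real regexes and tags in the tool are more involved and are not reproduced here.

```python
import re

# Hypothetical, simplified stand-in for the real all-caps abbreviation regexes.
COMPASS = r"(?:NE|NW|SE|SW)"

# Before the fix: (?!\b) demands that NO word break follows the match,
# so it only fires when the letters are the start of a longer word.
BUGGY = re.compile(r"(?<!\B)" + COMPASS + r"(?!\b)")

# After the fix: (?!\B) demands that a word break DOES follow the match,
# mirroring the (?<!\B) at the start, so only standalone tokens match.
FIXED = re.compile(r"(?<!\B)" + COMPASS + r"(?!\B)")


def tag_abbrs(pattern: re.Pattern, text: str) -> str:
    """Wrap every match in a bare <abbr> element (illustrative only)."""
    return pattern.sub(r"<abbr>\g<0></abbr>", text)


print(tag_abbrs(BUGGY, "NEARER THE SEWER"))
print(tag_abbrs(FIXED, "NEARER THE SEWER"))
print(tag_abbrs(FIXED, "HEAD NE TO THE SEWER"))
```

With the buggy lookahead, the first call tags the `NE` in NEARER and the `SE` in SEWER; with the corrected one, the same text is left alone and only the standalone `NE` in the last call is wrapped.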