Do not linktrail if following text is not [a-z]? by kristian-clausal · Pull Request #414 · tatuylonen/wikitextprocessor

kristian-clausal · 2026-03-05T06:12:07Z

See wiktextract issue #1604: tatuylonen/wiktextract#1604
https://en.wikipedia.org/wiki/Help:Wikitext#Blend_link

This should not be merged as is, because it will create problems in other extractors that might rely on different behavior.

In the best-case scenario, there are only two camps: 1) languages written with spaces, which want linktrailing, and 2) languages written without spaces, which can't use linktrailing.

If this is the case, we might be able to get away with a kludge that checks whether the script of the last character in the link matches the script of the first character after the link.
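A minimal sketch of that script-matching kludge, for illustration only. Python's standard library has no Unicode script property, so this approximates the script by the first word of the character's Unicode name; `rough_script` and `may_linktrail` are hypothetical helpers, not part of wikitextprocessor.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name.

    This is a rough stand-in for a real script lookup (an assumption made
    for this sketch), e.g. "LATIN", "CJK", "HIRAGANA", "CYRILLIC".
    """
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # Characters without a name (e.g. control characters).
        return ""


def may_linktrail(link_text: str, following: str) -> bool:
    """Return True if the text after the link could be blended into it.

    The trail is only allowed when the character right after the link is
    a letter in the same (approximate) script as the link's last character.
    """
    if not link_text or not following:
        return False
    if not following[0].isalpha():
        return False
    last = rough_script(link_text[-1])
    first = rough_script(following[0])
    return last != "" and last == first
```

With this check, `[[dog]]s` would still blend, while `[[word]]` followed directly by kanji would not. Note that it would still blend kanji following a kanji link, so it only addresses the mixed-script case.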

This adds a new attribute to `Wtp` that holds a `re.Pattern` object used to match these kinds of suffixed links. Modify `Wtp.linktrailing_re` to change the behavior according to how the Wikimedia project being parsed handles linktrailing.

English uses `[a-z]+`. Our default implementation uses `\w+`, which should be fine most of the time. Languages without spaces seem to use the English `[a-z]+`, which makes sense: in `[[englishword]]KANJI` the kanji characters should not be consumed into the link, but `\w+` consumes them.
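The difference between the two patterns can be seen with plain `re`, independent of the parser. This is only a sketch: `split_trail` is a hypothetical helper, and how the parser actually applies `Wtp.linktrailing_re` internally is not shown here.

```python
import re

# The two patterns mentioned in the description.
DEFAULT_TRAIL = re.compile(r"\w+")     # matches any Unicode word characters
ENGLISH_TRAIL = re.compile(r"[a-z]+")  # matches lowercase ASCII letters only


def split_trail(pattern: re.Pattern, text_after_link: str) -> tuple[str, str]:
    """Split the text following a link into (trail, remainder).

    The trail is whatever the pattern matches at the very start of the
    text; if nothing matches there, the trail is empty.
    """
    m = pattern.match(text_after_link)
    if m is None:
        return "", text_after_link
    return m.group(0), text_after_link[m.end():]


# Text that directly follows "[[englishword]]" in the source.
print(split_trail(DEFAULT_TRAIL, "漢字 more"))
print(split_trail(ENGLISH_TRAIL, "漢字 more"))
print(split_trail(ENGLISH_TRAIL, "s are"))
```

In Python 3, `\w` on `str` patterns matches Unicode word characters, so the default pattern swallows the kanji, while `[a-z]+` leaves them outside the link and still blends an English suffix such as `s`.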

We have a `NAMESPACEE` field in `parserfns` (`{{{NAMESPACEE}}}`, it's unimplemented) which pisses off the linter for some reason.

kristian-clausal force-pushed the linktrailing branch from ea35175 to ecb885e Compare March 5, 2026 06:22

kristian-clausal added 2 commits March 9, 2026 11:46

Remove - uses: crate-ci/typos/@v1 because of false positives

980bb47

kristian-clausal force-pushed the linktrailing branch from 2b9a20e to 980bb47 Compare March 9, 2026 10:02

kristian-clausal merged commit 9d9a410 into main Mar 9, 2026
12 checks passed

kristian-clausal merged 3 commits into main from linktrailing
