Skip to content

Anchor built-in directory exclude patterns to path-segment boundary#1

Merged
mostafasoufi merged 1 commit into
veronalabs:mainfrom
navidkashani:fix/anchor-directory-exclude-patterns
Apr 17, 2026
Merged

Anchor built-in directory exclude patterns to path-segment boundary#1
mostafasoufi merged 1 commit into
veronalabs:mainfrom
navidkashani:fix/anchor-directory-exclude-patterns

Conversation

@navidkashani

Copy link
Copy Markdown
Member

Problem

Several of the directory patterns in Config::BUILT_IN_EXCLUDE_PATTERNS are unanchored, so they behave as substring matches rather than path-segment matches. The most impactful case is /ext\//i:

var_dump(preg_match('/ext\//i', 'Text/Foo.php'));  // int(1) — wrong

Because "ext/" is a substring of "Text/" (case-insensitive), FileCopier::shouldExclude() returns true for every file under a Text/ directory, and the scoper silently drops it from the output. I hit this while vendoring a package that ships a Text/ source directory — the classes disappeared from vendor-prefixed/ with no warning.

The same class of bug exists for several sibling patterns.

Affected patterns

Pattern (before) Example collision Pattern (after)
/\.github\//i rare (leading .) `/(?:^
/\.gitlab\//i rare (leading .) `/(?:^
/examples?\//i MyExamples/, bad-examples/ `/(?:^
/ext\//i Text/, Context/real hit `/(?:^
/php4\//i graphql4/ `/(?:^
/tests?\//i UnitTests/, apitest/ `/(?:^
/dev-bin\//i my-dev-bin/ `/(?:^

Fix

(?:^|\/) is a non-capturing alternation that anchors the match to path-start or immediately after a /. This is the correct semantic for a "directory segment" match against the relative paths that FileCopier::shouldExclude(string $relativePath) receives.

The \b-anchored siblings (/\bbin\//i, /\bdocs?\//i) are intentionally left alone — \b has subtly different semantics at non-word characters (e.g. it still matches my-bin/), and changing them would be a behavior tweak outside this PR's scope. The loose filename patterns (CHANGELOG, Dockerfile, phpcs.xml, …) are also left alone because the substring matching is intentional there — it catches CHANGELOG.md, Dockerfile.dev, phpcs.xml.dist, etc.

Tests

  • New tests/Unit/Config/BuiltInExcludePatternsTest.php with 28 data-provider rows covering each fixed pattern against canonical positives (e.g. ext/Foo.php, src/ext/Foo.php) and collision negatives (e.g. Text/Foo.php, Context/Foo.php, MyExamples/, graphql4/, UnitTests/, my-dev-bin/).
  • Pattern strings are verified to exist in BUILT_IN_EXCLUDE_PATTERNS via reflection, then run directly through preg_match — no coupling to FileCopier.
  • @dataProvider docblock + static provider method so the test runs cleanly on both PHPUnit 9.6 and 10.x.
  • The two ext/ / examples?/ literal assertions in ConfigTest::testDefaultValues were updated to the new anchored strings.

Full Unit + Integration suites pass locally on PHP 8.5 / PHPUnit 10.5.

Back-compat note

This is a tightening of the pattern. Paths that used to match now do not — but the only paths affected are the substring collisions above (Text/, MyExamples/, UnitTests/, graphql4/, my-dev-bin/, …), which were being dropped unintentionally. There is no change in the generated output code for packages that don't contain such directories; for packages that do, the scoped output simply includes files that should have been included all along. No public API change.

Related

Surfaced downstream while integrating a package whose source lives in a Text/ directory into a WordPress plugin via wp-scoper.

Splitting

If you'd prefer a smaller first PR (e.g. ext/ only, since that's the one with a confirmed real-world hit), happy to rebase down to a single-pattern change — just say the word.

Seven of the directory patterns in Config::BUILT_IN_EXCLUDE_PATTERNS are
unanchored, so they behave as substring matches rather than path-segment
matches. Most visibly, /ext\//i matches Text/Foo.php because "ext/" is a
substring of "Text/" (case-insensitive), causing FileCopier::shouldExclude()
to return true and silently drop every file under a Text/ directory from
the scoped output.

Anchor the affected patterns with (?:^|\/) so they only match at
path-start or immediately after a /. Patterns fixed:

  .github/, .gitlab/, examples?/, ext/, php4/, tests?/, dev-bin/

The \b-anchored siblings (/\bbin\//i, /\bdocs?\//i) are intentionally
left alone — \b has subtly different semantics at non-word characters,
and tweaking them is outside this PR's scope. The loose filename patterns
(CHANGELOG, Dockerfile, phpcs.xml, …) are also left alone because the
substring matching is intentional there.

Add tests/Unit/Config/BuiltInExcludePatternsTest.php covering each fixed
pattern against canonical positives (ext/Foo.php, src/ext/Foo.php) and
the collision negatives (Text/Foo.php, Context/Foo.php, MyExamples/,
graphql4/, UnitTests/, my-dev-bin/, …). Pattern strings are read from
BUILT_IN_EXCLUDE_PATTERNS via reflection and run directly through
preg_match. Uses @dataProvider docblock + static provider so the test
runs cleanly on both PHPUnit 9.6 and 10.x.

Update the two ext/ / examples?/ literal assertions in ConfigTest to
the new anchored form.

This is a tightening of the pattern: only the substring collisions above
are affected, and they were being dropped unintentionally. No public
API change.
@navidkashani navidkashani force-pushed the fix/anchor-directory-exclude-patterns branch from 3a9bbd2 to 8af748e Compare April 17, 2026 09:18
navidkashani added a commit to eramhq/persian-kit that referenced this pull request Apr 17, 2026
Move the plugin's Persian utility surface (CharNormalizer,
DigitConverter, CardNumber, Iban, LegalId, NationalId, NumberFormatter,
NumberToWords, OrdinalNumber, PhoneNumber, Script, Slug, TimeAgo,
ValidationResult) out of src/Modules/Utilities/ and consume the
equivalent implementations from the eram/abzar library instead. The
library is vendored via wp-scoper under PersianKit\Dependencies so
upgrades no longer require synchronising drift between two copies.

Add pk_currency_format, pk_currency_convert, pk_words_to_number,
pk_validate_postal_code, pk_validate_plate_number, pk_validate_bill_id,
pk_half_space_fix, pk_keyboard_fix, and pk_persian_sort wrappers on top
of the new library so existing callers get the new capabilities without
touching the abzar namespace directly. The CharNormalization and
DigitConversion modules now delegate to the library rather than the
deleted local classes, and their tests are trimmed to cover the module
wiring only (library-level behaviour is covered upstream).

Wire eram/abzar into composer.json (require + wp-scoper packages) and
bump the plugin version to 1.0.0-beta.1 — this is an unannounced API
that will churn until 1.0.0. Add scripts/patch-wp-scoper.sh as a
transient post-install hook that anchors veronalabs/wp-scoper's
unanchored /ext\//i exclude pattern so abzar's Text/ directory is not
silently dropped during scoping; the script will be removed once the
upstream fix (veronalabs/wp-scoper#1) ships in a tagged release.
@mostafasoufi mostafasoufi merged commit 7a33a2b into veronalabs:main Apr 17, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants