perf(imaging): route morphology through Leptonica's DWA kernels, exactly #39
Merged
The cleaner's two heaviest morphology ops ran on the generic rasterop bricks: the 43x43 isolated-dust dilation (37.5ms/page, the single hottest op in the baseline) and the 7x7 fill-holes opening (12.2ms). Leptonica ships word-accelerated DWA kernels for both, but they were unbound.

Binding them surfaced a real trap that the planned equality sweep was built to catch: pixDilateBrickDwa silently diverges from the generic brick for sel sizes missing from the generated table (every prime above 15, including the production 43), while pixOpenBrickDwa is exact at every size and dilate is exact up to 15. The shipped routing is therefore:

- dilated(): a single DWA pass up to size 15; larger sizes are composed from safe-size DWA passes. This is exact by Minkowski sum (brick(a) then brick(b) equals brick(a+b-1); clipping per pass changes nothing inside the image rectangle, since L-inf paths between in-bounds points stay in bounds) and version-robust (only the always-present small sels are used).
- opened(): DWA up to size 15 (production is 7x7), generic beyond (an opening does not compose from smaller passes).

The generic variants stay as package-private oracles, and PixTest pins pixel-identity across radii 0..31 (sel 1..63) on border-touching ink plus degenerate pages.

Rider: pixCountPixels now reuses one process-lifetime popcount table instead of rebuilding it per call.

Measured:
- dilate 43x43: 37.5 -> 14.1ms (-62%)
- open 7x7: 12.2 -> 4.0ms (-67%)
- clean() on the pipeline path: 139.5 -> 107.6ms
- despeckle stage at -j8: 9.57 -> 5.47s (-43%; the bandwidth relief compounds across workers)
- conv: 13.61 -> 9.51s (-30%; -34% vs the original baseline)

selectBySize is now ~70% of the remaining clean() time, which meets the measured gate for the follow-up selection restructuring.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
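The exactness argument for composed dilation can be checked without Leptonica at all. The following is a hypothetical brute-force sketch in Java, not the PR's code: the class and method names, and the greedy planner that fills with 15-pixel passes, are illustrative assumptions. It plans an odd brick as a chain of passes no larger than 15, clips every pass to the image rectangle, and confirms that the chain matches one direct dilation for every odd size from 1 to 63 on a page whose ink touches all four corners. It also shows on a 3x3 square why opening does not compose the same way. Erosion here treats out-of-image pixels as OFF, which is an assumed boundary convention and does not affect the interior square.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the PR's code: brute-force check that chaining small
// brick dilations (each clipped to the image) equals one large brick dilation.
class BrickCompositionSketch {

    // Largest sel size treated as safe for one DWA pass (15, the limit the PR reports as exact).
    static final int MAX_SAFE = 15;

    // Splits an odd brick size into odd pass sizes <= MAX_SAFE whose
    // Minkowski sum is the requested brick: s1 + ... + sn - (n - 1) == size.
    static List<Integer> planPasses(int size) {
        List<Integer> passes = new ArrayList<>();
        int remaining = size;
        while (remaining > MAX_SAFE) {
            passes.add(MAX_SAFE);
            remaining -= MAX_SAFE - 1; // brick(15) then brick(k) == brick(k + 14)
        }
        passes.add(remaining);
        return passes;
    }

    // Direct dilation by a centered size x size brick (odd size); output is clipped to the image.
    static boolean[][] dilate(boolean[][] src, int size) {
        int h = src.length, w = src[0].length, r = size / 2;
        boolean[][] dst = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                if (!src[y][x]) continue;
                for (int yy = Math.max(0, y - r); yy <= Math.min(h - 1, y + r); yy++)
                    for (int xx = Math.max(0, x - r); xx <= Math.min(w - 1, x + r); xx++)
                        dst[yy][xx] = true;
            }
        return dst;
    }

    // Composed dilation: one clipped pass per planned size.
    static boolean[][] dilateComposed(boolean[][] src, int size) {
        boolean[][] cur = src;
        for (int pass : planPasses(size)) cur = dilate(cur, pass);
        return cur;
    }

    // Erosion by a centered brick; pixels outside the image count as OFF (assumed convention).
    static boolean[][] erode(boolean[][] src, int size) {
        int h = src.length, w = src[0].length, r = size / 2;
        boolean[][] dst = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                boolean all = true;
                for (int dy = -r; dy <= r && all; dy++)
                    for (int dx = -r; dx <= r && all; dx++) {
                        int yy = y + dy, xx = x + dx;
                        all = yy >= 0 && yy < h && xx >= 0 && xx < w && src[yy][xx];
                    }
                dst[y][x] = all;
            }
        return dst;
    }

    static boolean[][] open(boolean[][] src, int size) {
        return dilate(erode(src, size), size);
    }

    static int count(boolean[][] img) {
        int n = 0;
        for (boolean[] row : img) for (boolean on : row) if (on) n++;
        return n;
    }

    public static void main(String[] args) {
        // Sparse random ink plus all four corners, so every pass meets the border.
        Random rng = new Random(42);
        int h = 70, w = 90;
        boolean[][] page = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) page[y][x] = rng.nextInt(200) == 0;
        page[0][0] = page[0][w - 1] = page[h - 1][0] = page[h - 1][w - 1] = true;

        for (int size = 1; size <= 63; size += 2)
            if (!Arrays.deepEquals(dilate(page, size), dilateComposed(page, size)))
                throw new AssertionError("composed dilation diverges at size " + size);
        System.out.println("sizes 1..63 exact; 43 -> passes " + planPasses(43));

        // Opening does not compose: a 3x3 square survives open(3) twice but not open(5).
        boolean[][] square = new boolean[9][9];
        for (int y = 3; y <= 5; y++)
            for (int x = 3; x <= 5; x++) square[y][x] = true;
        System.out.println("open(open(sq,3),3) ink = " + count(open(open(square, 3), 3))
                + ", open(sq,5) ink = " + count(open(square, 5)));
    }
}
```

Running it prints `sizes 1..63 exact; 43 -> passes [15, 15, 15]` and `open(open(sq,3),3) ink = 9, open(sq,5) ink = 0`. Each full pass grows the effective brick by an even 14, so every planned pass stays odd, and 15 + 15 + 15 - 2 = 43. The opening counterexample holds because opening is idempotent: repeating a small opening never removes what a single larger one would.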
Why
The cleaner's two heaviest morphology ops ran on Leptonica's generic rasterop bricks: the 43×43 isolated-dust dilation (37.5ms/page — the single hottest op in the committed baseline) and the 7×7 fill-holes opening (12.2ms). Leptonica ships word-accelerated DWA kernels for both; they were unbound.
The trap the equality sweep caught
Binding them surfaced exactly what the planned pixel-identity gate existed for:
pixDilateBrickDwa silently diverges from the generic brick for sel sizes missing from the generated table (empirically every prime above 15, including the production 43), while pixOpenBrickDwa is exact at every size and dilate is exact up to 15. The composite pix*CompBrickDwa variants diverge too; they were diagnosed size by size in the container, and the table is in the test rationale.
The shipped routing (exact by construction)
- dilated(): a single DWA pass for sizes ≤ 15; larger sizes are composed from safe-size DWA passes. This is exact by Minkowski sum (brick(a) ⊕ brick(b) = brick(a+b−1); per-pass clipping changes nothing inside the image rectangle because L∞ paths between in-bounds points stay in bounds) and version-robust (only the always-present small sels are used). 43×43 takes three 15×15 passes (15 + 15 + 15 − 2 = 43).
- opened(): DWA for sizes ≤ 15 (production is 7×7), generic beyond (an opening does not compose).
- PixTest pins pixel-identity across radii 0–31 (sel 1–63) on border-touching ink plus degenerate (tiny/all-black/all-white) pages.
- pixCountPixels reuses one process-lifetime popcount table (hygiene, <0.1%).
Measured
selectBySize is now ~70% of the remaining clean() time, so the measured gate for the follow-up selection restructuring (De Morgan flip + single-labeling fusion) is met.
Verification
./gradlew check is green; ArchUnit FFM confinement is untouched (new bindings in Leptonica, routing in Pix).
dilated/opened internals (~10 lines); bindings and sweep stay harmless.
🤖 Generated with Claude Code