
perf(imaging): route morphology through Leptonica's DWA kernels, exactly #39

Merged: P4suta merged 1 commit into main from perf/despeckle-dwa-morphology-2 on Jun 10, 2026

@P4suta P4suta commented Jun 10, 2026


Stacked on #32. The main course of the despeckle round.

Why

The cleaner's two heaviest morphology ops ran on Leptonica's generic rasterop bricks: the 43×43 isolated-dust dilation (37.5ms/page — the single hottest op in the committed baseline) and the 7×7 fill-holes opening (12.2ms). Leptonica ships word-accelerated DWA kernels for both; they were unbound.

The trap the equality sweep caught

Binding them surfaced exactly what the planned pixel-identity gate existed for. pixDilateBrickDwa silently diverges from the generic brick for sel sizes missing from the generated table: empirically every prime above 15, including the production 43. pixOpenBrickDwa, by contrast, is exact at every size, and dilate is exact up to 15. (The composite pix*CompBrickDwa variants diverge too; this was diagnosed per size in-container, and the table is in the test rationale.)

The shipped routing (exact by construction)

  • dilated(): a single DWA pass for sizes ≤ 15; larger sizes are composed from safe-size DWA passes. This is exact by Minkowski sum (brick(a) ⊕ brick(b) = brick(a+b−1); per-pass clipping changes nothing inside the image rectangle because L∞ paths between in-bounds points stay in bounds) and version-robust (only the always-present small sels are used). 43×43 takes three passes (15 + 15 + 15 − 2 = 43).
  • opened(): DWA ≤ 15 (production is 7×7), generic beyond (an opening does not compose).
  • Generic variants stay as package-private oracles; PixTest pins pixel-identity across radii 0–31 (sel 1–63) on border-touching ink + degenerate (tiny/all-black/all-white) pages.
  • Rider: pixCountPixels reuses one process-lifetime popcount table (hygiene, <0.1%).
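
The composition rule in the dilated() bullet can be checked on a toy bitmap. The sketch below is pure Java and does not touch Leptonica or the project's Pix class: `planPasses`, `dilate`, `dilateComposed`, and `samplePage` are hypothetical names, the boolean-array dilation only stands in for the generic brick oracle, and the greedy split is one assumed way of choosing pass sizes, not necessarily the one shipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BrickCompositionSketch {
    // Largest sel size treated as safe for a single pass (15 in the description above).
    static final int MAX_SAFE = 15;

    /** Splits an odd brick size into pass sizes <= MAX_SAFE that compose to it. */
    static List<Integer> planPasses(int size) {
        if (size < 1 || size % 2 == 0) {
            throw new IllegalArgumentException("size must be odd and positive: " + size);
        }
        List<Integer> passes = new ArrayList<>();
        int remaining = size;
        while (remaining > MAX_SAFE) {
            passes.add(MAX_SAFE);
            remaining -= MAX_SAFE - 1; // brick(a) then brick(b) equals brick(a + b - 1)
        }
        passes.add(remaining);
        return passes;
    }

    /** Reference square-brick dilation, clipped to the image rectangle. */
    static boolean[][] dilate(boolean[][] src, int size) {
        int h = src.length, w = src[0].length, r = size / 2;
        boolean[][] dst = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!src[y][x]) continue;
                // Paint the brick around each ink pixel, dropping anything out of bounds.
                for (int yy = Math.max(0, y - r); yy <= Math.min(h - 1, y + r); yy++) {
                    for (int xx = Math.max(0, x - r); xx <= Math.min(w - 1, x + r); xx++) {
                        dst[yy][xx] = true;
                    }
                }
            }
        }
        return dst;
    }

    /** Dilates by running the planned passes one after another. */
    static boolean[][] dilateComposed(boolean[][] src, int size) {
        boolean[][] cur = src;
        for (int pass : planPasses(size)) cur = dilate(cur, pass);
        return cur;
    }

    /** Small page with ink touching two corners plus one interior pixel. */
    static boolean[][] samplePage() {
        boolean[][] page = new boolean[64][80];
        page[0][0] = true;
        page[63][79] = true;
        page[30][5] = true;
        return page;
    }

    public static void main(String[] args) {
        System.out.println("passes for 43: " + planPasses(43));
        boolean same = Arrays.deepEquals(dilate(samplePage(), 43), dilateComposed(samplePage(), 43));
        System.out.println("composed == direct: " + same);
    }
}
```

Running it should print the plan `[15, 15, 15]` and report that the composed and direct results match, border-touching ink included. The same trick does not transfer to opened(): an opening is idempotent, so repeating a small opening only reproduces that small opening.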

Measured

| metric | before | after | change |
|---|---|---|---|
| dilate 43×43 | 37.5ms | 14.1ms | −62% |
| open 7×7 | 12.2ms | 4.0ms | −67% |
| clean() pipeline path | 139.5ms | 107.6ms | −38.5% cumulative vs the #31 baseline |
| despeckle stage (-j8) | 9.57s | 5.47s | −43%; bandwidth relief compounds across workers |
| conv (-j8) | 13.61s | 9.51s | −30%; −34% vs the original #28 baseline |

selectBySize is now ~70% of the remaining clean() — the measured gate for the follow-up selection restructuring (De Morgan flip + single-labeling fusion) is met.

Verification

  • Full ./gradlew check green; ArchUnit FFM confinement untouched (new bindings in Leptonica, routing in Pix).
  • The golden cleaner tests pass unchanged (they exercise both DWA paths).
  • Rollback = revert the dilated/opened internals (~10 lines); bindings and sweep stay harmless.

🤖 Generated with Claude Code

Re-file of #33 (same commit, cherry-picked onto main).

The cleaner's two heaviest morphology ops ran on the generic rasterop
bricks: the 43x43 isolated-dust dilation (37.5ms/page, the single
hottest op in the baseline) and the 7x7 fill-holes opening (12.2ms).
Leptonica ships word-accelerated DWA kernels for both, unbound.

Binding them surfaced a real trap the planned equality sweep was built
to catch: pixDilateBrickDwa silently diverges from the generic brick
for sel sizes missing from the generated table (every prime above 15 —
including the production 43), while pixOpenBrickDwa is exact at every
size and dilate is exact up to 15. The shipped routing is therefore:
- dilated(): a single DWA pass up to size 15, larger sizes composed
  from safe-size DWA passes — exact by Minkowski sum (brick(a) then
  brick(b) equals brick(a+b-1); clipping per pass changes nothing
  inside the image rectangle since L-inf paths between in-bounds
  points stay in bounds), and version-robust (only the always-present
  small sels are used).
- opened(): DWA up to size 15 (production is 7x7), generic beyond (an
  opening does not compose from smaller passes).
The generic variants stay as package-private oracles, and PixTest
pins pixel-identity across radii 0..31 (sel 1..63) on border-touching
ink plus degenerate pages. Rider: pixCountPixels now reuses one
process-lifetime popcount table instead of rebuilding it per call.
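
The popcount rider amounts to building a lookup table once and sharing it across calls. A minimal pure-Java sketch of that idea follows, assuming a 256-entry per-byte table; `PopcountTableSketch`, `Holder`, and `countOnBits` are hypothetical names, and the actual change presumably hands a comparable table to Leptonica's pixCountPixels through the binding rather than counting in Java.

```java
final class PopcountTableSketch {
    // Initialization-on-demand holder: the table is built once, on first use,
    // and then lives for the rest of the process.
    private static final class Holder {
        static final int[] TAB8 = build();

        private static int[] build() {
            int[] tab = new int[256];
            // Bits set in i = bits set in i with its lowest bit dropped, plus that lowest bit.
            for (int i = 1; i < 256; i++) tab[i] = tab[i >> 1] + (i & 1);
            return tab;
        }
    }

    /** Counts set bits in packed 1-bpp data using the shared table. */
    static int countOnBits(byte[] packed) {
        int[] tab = Holder.TAB8;
        int n = 0;
        for (byte b : packed) n += tab[b & 0xFF];
        return n;
    }

    public static void main(String[] args) {
        // 0xFF has 8 set bits, 0x0F has 4, 0x00 has 0, 0x81 has 2.
        System.out.println(countOnBits(new byte[] {(byte) 0xFF, 0x0F, 0x00, (byte) 0x81}));
    }
}
```

The holder idiom keeps initialization lazy and thread-safe without explicit locking, which suits a table that never changes after it is built.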

Measured: dilate 43x43 37.5 -> 14.1ms (-62%), open 7x7 12.2 -> 4.0ms
(-67%); clean() on the pipeline path 139.5 -> 107.6ms; despeckle stage
at -j8 9.57 -> 5.47s (-43%, the bandwidth relief compounds across
workers), conv 13.61 -> 9.51s (-30%; -34% vs the original baseline).
selectBySize is now ~70% of the remaining clean() — the measured gate
for the follow-up selection restructuring.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@P4suta P4suta merged commit 69b001f into main Jun 10, 2026
20 checks passed
@P4suta P4suta deleted the perf/despeckle-dwa-morphology-2 branch June 10, 2026 11:03