test(despeckle): add an op-level micro-benchmark for the page cleaner #37

Merged: P4suta merged 2 commits into main from perf/despeckle-bench-2 on Jun 10, 2026

@P4suta P4suta commented Jun 10, 2026


Stacked on #30. First PR of the despeckle-algorithm optimization round (despeckle accounts for 71.6% of conversion time per pipeline/docs/perf-baseline.md).

Why

Nothing measured where inside clean() the ~50ms/page/core goes, and the optimization candidates (DWA morphology, metrics-pass skipping, selection restructuring) each target different ops. This PR adds the op-level measurement first, so that every subsequent claim is judged against a committed table.

What

CleanerBenchmark (test sources, mirroring PipelineBenchmark) times each Leptonica primitive that the cleaner composes, plus clean() end-to-end, on a deterministic synthetic 600-dpi A5 page (3496×4961 px; glyph columns, dust, isolated blots, and pin-holes, so all three passes have real work). It writes the resulting table to despeckle/docs/cleaner-baseline.md. Task: ./gradlew :despeckle:infrastructure:benchCleaner (-Preps=N).
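The table below reports a median per op, so the timing pattern is presumably "run each op several times, keep the median". The following is a minimal sketch of that pattern, not the actual CleanerBenchmark code: the class name `MedianTimer`, its method names, the warm-up count, and the dummy workload are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper: times an operation repeatedly and reports the median. */
public final class MedianTimer {
    private MedianTimer() {}

    /** Median of the samples; for an even count, the (truncated) mean of the two middle values. */
    public static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Runs op `warmups` times untimed, then `reps` times timed; returns the median in nanoseconds. */
    public static long medianNanos(Runnable op, int warmups, int reps) {
        for (int i = 0; i < warmups; i++) {
            op.run(); // give the JIT a chance to compile the hot path
        }
        long[] samples = new long[reps];
        for (int i = 0; i < reps; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Stand-in workload; the real benchmark would call a Leptonica primitive here.
        long[] sink = new long[1];
        Runnable dummyOp = () -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) {
                acc += i ^ (i >>> 3);
            }
            sink[0] = acc; // keep the result live so the loop is not optimized away
        };
        long nanos = medianNanos(dummyOp, 3, 11);
        System.out.printf("dummyOp median: %.3f ms%n", nanos / 1_000_000.0);
    }
}
```

The median is less sensitive than the mean to a single slow repetition (GC pause, scheduler hiccup), which is one plausible reason to report it for per-op timings.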

The baseline (clean() = 174.9ms single-threaded; Σ row covers 92.5%)

| op | median | calls | share |
|---|---|---|---|
| dilate 43×43 | 37.5ms | 1 | 21.5% |
| selectBySize k=6 (inverted) | 22.2ms | 2 | 25.3% |
| countConnComp (metrics-only) | 11.8ms | 2 | 13.5% |
| selectBySize (page) | 15.2ms | 2 | 17.4% |
| open 7×7 | 12.2ms | 1 | 7.0% |
| write/read G4 + booleans | | | ~7% |

Notable: the inverted-page selectBySize costs 22.2ms versus 15.2ms on the normal page, so the giant-background-component re-render penalty is now a measured fact (it motivates the planned De Morgan flip). The two metrics-only counting passes together account for 13.5% of clean().
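The share column appears to be median × calls divided by the 174.9ms clean() total. The sketch below recomputes it from the table values under that assumption; the class `ShareTable` and its method are hypothetical and not part of the PR. The recomputed figures agree with the committed shares to within about 0.1 percentage point, presumably because the published medians are rounded.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes the "share" column from median, calls, and total. */
public final class ShareTable {
    private ShareTable() {}

    /** Share of the end-to-end time, in percent, assuming share = median * calls / total. */
    public static double sharePercent(double medianMs, int calls, double totalMs) {
        return 100.0 * medianMs * calls / totalMs;
    }

    public static void main(String[] args) {
        double totalMs = 174.9; // clean() end-to-end, single-threaded, from the baseline

        // Rows copied from the baseline table (median in ms, number of calls).
        String[] ops = {
            "dilate 43x43",
            "selectBySize k=6 (inverted)",
            "countConnComp (metrics-only)",
            "selectBySize (page)",
            "open 7x7",
        };
        double[] medians = {37.5, 22.2, 11.8, 15.2, 12.2};
        int[] calls = {1, 2, 2, 2, 1};

        for (int i = 0; i < ops.length; i++) {
            System.out.printf(Locale.ROOT, "%-30s %6.1f ms x%d = %5.1f%%%n",
                    ops[i], medians[i], calls[i],
                    sharePercent(medians[i], calls[i], totalMs));
        }
    }
}
```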

🤖 Generated with Claude Code

Re-file of #31 (same commit, cherry-picked onto main).

P4suta and others added 2 commits June 10, 2026 19:33
despeckle dominates pdfbook conversions (~72% of conv), but nothing
measured WHERE inside clean() the ~50ms/page/core goes. benchCleaner
times each Leptonica primitive the cleaner composes — read, the four
selectBySize shapes (incl. the inverted-page variant whose giant
background component is rendered back), the 43x43 dilate, the 7x7
open, the boolean ops, both counting passes, the G4 write — plus
clean() end-to-end, on a deterministic synthetic 600-dpi A5 page, and
writes the table to despeckle/docs/cleaner-baseline.md.

The committed baseline (174.9ms clean(), sigma row covers 92.5%):
dilate 43x43 = 21.5%, selectBySize on the inverted page x2 = 25.3%
(22.2ms vs 15.2ms on the normal page — the background-component
re-render penalty, measured), metrics-only countConnComp x2 = 13.5%,
page selectBySize x2 = 17.4%, open 7x7 = 7.0%. Every following
optimization is judged against this table.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@P4suta P4suta merged commit 25e7809 into main Jun 10, 2026
20 checks passed
@P4suta P4suta deleted the perf/despeckle-bench-2 branch June 10, 2026 10:51