perf(despeckle): skip the metrics-only component-counting passes #38

Merged
P4suta merged 1 commit into main from perf/despeckle-skip-metrics-2 on Jun 10, 2026

P4suta commented on Jun 10, 2026

Owner

Stacked on #31 (uses its benchCleaner baseline for the A/B).

Why

clean() counted 8-connected components before and after every page: two full connected-component labelings, 13.5% of the page's time per the committed baseline. Yet the counts feed nothing but the HTML report's charts and a summary log line. The pdfbook pipeline never sets a report dir, so its hot path paid for numbers nobody read.

What

  • ProcessResult: component counts become OptionalInt; a counted 4-int convenience constructor keeps every existing construction compiling unchanged; componentsRemoved() stays present-or-zero int (documented); new withoutComponentStats(...) factory, hasComponentStats(), blackPixelsRemoved().
  • ProcessOptions: 6th knob collectComponentStats (default true via the delegating 5-knob constructor — all existing callers unchanged) + withoutComponentStats() wither carried through withDpi.
  • LeptonicaPageCleaner: both counting passes guarded; the black-pixel counts still run unconditionally (they feed the 3% over-removal guard).
  • DespeckleService: counting is on exactly when reportDir != null — the report path (HTML charts, totals, the "N component(s) removed" line) is byte-identical. Without a report, the summary logs in always-measured black-pixel terms; PdfBatchService drops the would-always-be-0 component clause without reports.
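The shape of these changes can be sketched in a few lines of Java. This is an illustrative sketch only, not the PR's code: the type and method names follow the description above, but the field names, the reduced two-knob `ProcessOptions` (the real record has six knobs), and the `Page` interface standing in for a Leptonica image are all assumptions.

```java
import java.util.OptionalInt;

// Hypothetical sketch: names follow the PR text, everything else is assumed.
final class ComponentStatsSketch {

    /** Per-page result; component counts are empty when counting was skipped. */
    record ProcessResult(OptionalInt componentsBefore, OptionalInt componentsAfter,
                         int blackPixelsBefore, int blackPixelsAfter) {

        /** Compat constructor: four plain ints mean "counted", so old call sites compile. */
        ProcessResult(int componentsBefore, int componentsAfter,
                      int blackPixelsBefore, int blackPixelsAfter) {
            this(OptionalInt.of(componentsBefore), OptionalInt.of(componentsAfter),
                 blackPixelsBefore, blackPixelsAfter);
        }

        static ProcessResult withoutComponentStats(int blackPixelsBefore, int blackPixelsAfter) {
            return new ProcessResult(OptionalInt.empty(), OptionalInt.empty(),
                                     blackPixelsBefore, blackPixelsAfter);
        }

        boolean hasComponentStats() {
            return componentsBefore.isPresent() && componentsAfter.isPresent();
        }

        /** Present-or-zero: 0 when the counts were never collected. */
        int componentsRemoved() {
            return hasComponentStats()
                    ? componentsBefore.getAsInt() - componentsAfter.getAsInt()
                    : 0;
        }

        int blackPixelsRemoved() {
            return blackPixelsBefore - blackPixelsAfter;
        }
    }

    /** Only dpi and the new flag are modelled here; the other knobs are omitted. */
    record ProcessOptions(int dpi, boolean collectComponentStats) {

        /** Delegating constructor: counting stays on by default. */
        ProcessOptions(int dpi) {
            this(dpi, true);
        }

        ProcessOptions withoutComponentStats() {
            return new ProcessOptions(dpi, false);
        }

        /** The flag is carried through rather than reset to its default. */
        ProcessOptions withDpi(int newDpi) {
            return new ProcessOptions(newDpi, collectComponentStats);
        }
    }

    /** Stand-in for a page image; this is not the Leptonica API. */
    interface Page {
        int countComponents();   // expensive: a full connected-component labeling
        int countBlackPixels();  // always run: feeds the over-removal guard
        void removeSpeckles();
    }

    /** Both counting passes are guarded; the black-pixel counts are not. */
    static ProcessResult clean(Page page, ProcessOptions options) {
        boolean counting = options.collectComponentStats();
        OptionalInt before = counting ? OptionalInt.of(page.countComponents()) : OptionalInt.empty();
        int blackBefore = page.countBlackPixels();
        page.removeSpeckles();
        int blackAfter = page.countBlackPixels();
        OptionalInt after = counting ? OptionalInt.of(page.countComponents()) : OptionalInt.empty();
        return new ProcessResult(before, after, blackBefore, blackAfter);
    }
}
```

In this sketch, `new ProcessResult(10, 7, 100, 90)` still compiles and reports three components removed, while `ProcessResult.withoutComponentStats(100, 90)` reports zero components removed and ten black pixels removed. Keeping `componentsRemoved()` as a plain `int` is what lets existing callers stay untouched; callers that need to tell "zero removed" from "not counted" would check `hasComponentStats()` first.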

Measured

| metric | before | after | note |
|---|---|---|---|
| clean() single-threaded | 161.8ms | 139.5ms (−13.8%) | matches the baseline's countConnComp share |
| despeckle stage (-j8, 200p) | 10.19s | 9.57s (−6.1%) | partially absorbed by memory-bandwidth saturation |
| conv (-j8) | 14.23s | 13.61s (−4.4%) | |

The −4.4% conv alone sits just under the 5% gate; it's a true work reduction (not a tradeoff) and compounds with the DWA morphology PR that follows.
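The percentages can be re-derived from the raw timings. The helper below is a hypothetical convenience for checking the arithmetic and is not part of the benchmark harness.

```java
import java.util.Locale;

// Hypothetical helper for re-checking the reported percentages.
final class SpeedupCheck {

    /** Relative change in percent; negative means the "after" run was faster. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Timings copied from the measurements reported in this PR.
        System.out.printf(Locale.ROOT, "clean():   %.1f%%%n", percentChange(161.8, 139.5)); // -13.8%
        System.out.printf(Locale.ROOT, "despeckle: %.1f%%%n", percentChange(10.19, 9.57));  // -6.1%
        System.out.printf(Locale.ROOT, "conv:      %.1f%%%n", percentChange(14.23, 13.61)); // -4.4%
    }
}
```

Both -j8 rows differ by the same 0.62s in absolute terms, which is consistent with the conv gain coming entirely from the despeckle stage.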

Verification

  • Full ./gradlew check green; zero edits to existing test files' assertions (the compat constructors carry them).
  • New tests: domain (factory/wither/flag carry), infrastructure (counting-off clean still cleans pixel-identically and the over-removal guard works), application (capturing fake pins reportDir↔flag in both directions).
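The application-level test idea, a capturing fake that pins the reportDir↔flag relationship, might look roughly like the sketch below. Every type here (`Options`, `PageCleaner`, `Service`, `CapturingCleaner`) is a simplified stand-in invented for illustration; the real DespeckleService and its test use the project's own types and test framework.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a capturing-fake test; all types are stand-ins.
final class ReportDirFlagSketch {

    record Options(boolean collectComponentStats) {}

    interface PageCleaner {
        void clean(Options options);
    }

    /** Capturing fake: does no cleaning, only records the options it was handed. */
    static final class CapturingCleaner implements PageCleaner {
        final List<Options> seen = new ArrayList<>();

        @Override
        public void clean(Options options) {
            seen.add(options);
        }
    }

    /** Stand-in service: counting is on exactly when a report dir is set. */
    static final class Service {
        private final PageCleaner cleaner;

        Service(PageCleaner cleaner) {
            this.cleaner = cleaner;
        }

        void run(Path reportDir) {
            cleaner.clean(new Options(reportDir != null));
        }
    }

    /** Runs the service once and returns the flag the cleaner actually received. */
    static boolean flagFor(Path reportDir) {
        CapturingCleaner fake = new CapturingCleaner();
        new Service(fake).run(reportDir);
        return fake.seen.get(0).collectComponentStats();
    }
}
```

A test would then assert both directions: `flagFor(Path.of("report"))` is true and `flagFor(null)` is false. Checking only one direction would let a hard-coded flag slip through.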

🤖 Generated with Claude Code

Re-file of #32 (same commit, cherry-picked onto main).

clean() counted 8-connected components before and after every page —
two full connected-component labelings, 13.5% of the page's time per
the committed baseline — yet the counts feed nothing but the HTML
report's charts and the summary log line. The pdfbook pipeline never
sets a report dir, so its hot path paid for numbers nobody read.

ProcessResult's component counts become OptionalInt (a counted 4-int
convenience constructor keeps every existing construction compiling
unchanged; componentsRemoved() stays present-or-zero int, documented).
ProcessOptions grows collectComponentStats (default true via the
delegating 5-knob constructor) plus a withoutComponentStats() wither.
DespeckleService turns counting off exactly when no reportDir is set —
the report path is byte-identical, including its charts and totals.
The black-pixel counts still run unconditionally: they feed the 3%
over-removal guard. Summary logs switch to always-measured black-pixel
terms when counting was skipped.

Measured: clean() 161.8ms -> 139.5ms single-threaded (-13.8%, matching
the baseline's countConnComp share); pipeline at -j8 despeckle stage
10.19s -> 9.57s (-6.1%, the saving partially absorbed by memory-
bandwidth saturation), conv -4.4%. The benchmark gains a "clean()
without component stats" row pinning the pipeline-path number.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
P4suta merged commit 148f6ac into main on Jun 10, 2026
20 checks passed
P4suta deleted the perf/despeckle-skip-metrics-2 branch on June 10, 2026 10:57