
perf: reduce image copies in OCR pipeline #112

Merged: GreatV merged 4 commits into main from perf on Apr 27, 2026
Conversation

Owner

@GreatV GreatV commented Apr 27, 2026

Use shared Arc-backed image inputs across OCR batching and adapter boundaries, only converting back to owned images when model APIs require ownership. Optimize normalization by avoiding redundant RGB8 conversions, reusing single-image paths, and gating Rayon parallelism by batch output size.
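As a rough illustration of the sharing pattern described above, the sketch below shows how a batch of `Arc`-backed images could be turned back into owned images only at the point where ownership is needed. The `RgbImage` struct here is a stand-in for `image::RgbImage` so the snippet compiles without external crates, and the body of `into_owned_images()` is an assumption about how such a method might work, not the code from this PR.

```rust
use std::sync::Arc;

/// Stand-in for `image::RgbImage` (assumption, keeps the sketch dependency-free).
#[derive(Clone, Debug, PartialEq)]
pub struct RgbImage {
    pub width: u32,
    pub height: u32,
    /// Interleaved RGB bytes, length = width * height * 3.
    pub data: Vec<u8>,
}

/// Hypothetical shape of a task input that holds shared images.
pub struct ImageTaskInput {
    pub images: Vec<Arc<RgbImage>>,
}

impl ImageTaskInput {
    /// Convert shared images into owned ones. Pixel data is cloned only when
    /// another `Arc` handle to the same image is still alive; otherwise the
    /// image is moved out of the `Arc` without copying.
    pub fn into_owned_images(self) -> Vec<RgbImage> {
        self.images
            .into_iter()
            .map(|img| Arc::try_unwrap(img).unwrap_or_else(|shared| (*shared).clone()))
            .collect()
    }
}
```

With this shape, stages that only read pixels can pass `Arc` clones around cheaply, and the copy cost is paid at most once, at the boundary that demands an owned image.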

Also validate OCR batch-size options, centralize table cell bbox parsing, guard empty stitching metadata, wire VL download-binaries/CUDA features, and document the VL download-binaries feature.
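For readers unfamiliar with this kind of guardrail, a minimal sketch of batch-size validation follows. The error type, the function name, and the rule that zero is the only rejected value are all assumptions made for illustration; the actual checks in the builder may differ.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct InvalidBatchSize {
    /// Name of the offending option, used for error reporting.
    pub option: &'static str,
    pub value: usize,
}

/// Reject a zero batch size up front instead of failing later in the pipeline.
/// (Assumption: zero is the only invalid value.)
pub fn validate_batch_size(option: &'static str, value: usize) -> Result<usize, InvalidBatchSize> {
    if value == 0 {
        Err(InvalidBatchSize { option, value })
    } else {
        Ok(value)
    }
}
```

Validating in the builder means a misconfiguration surfaces when the pipeline is constructed rather than on the first batch.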

Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes the OCR pipeline to reduce unnecessary image copies by sharing Arc-backed images across batching and adapter boundaries, only materializing owned RgbImages when required by model APIs. It also improves normalization throughput and robustness in a few OCR/table-related utilities while wiring/documenting feature flags for the VL crate.

Changes:

  • Switch ImageTaskInput and OCR batching/cropping flows to use Arc<RgbImage> with an into_owned_images() escape hatch for model APIs.
  • Optimize image normalization by avoiding redundant to_rgb8() conversions and gating Rayon parallelism based on total output size.
  • Add batch-size validation guardrails, centralize table cell bbox parsing, guard empty stitching metadata, and wire/document download-binaries/CUDA features for oar-ocr-vl.
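The idea of gating parallelism on total output size can be sketched as a small decision function. The threshold constant and function name below are illustrative assumptions, not values taken from the PR; a caller would use the result to choose between a Rayon parallel iterator and a plain serial loop.

```rust
/// Assumed threshold, in output elements, below which thread dispatch overhead
/// is taken to outweigh the benefit of parallelism. Illustrative value only.
pub const PARALLEL_MIN_ELEMENTS: usize = 1 << 20;

/// Decide whether a batch is large enough to be worth processing in parallel.
/// `dims` holds the (width, height) of each output image in the batch.
pub fn should_parallelize(dims: &[(u32, u32)], channels: usize) -> bool {
    let total = dims
        .iter()
        .map(|&(w, h)| (w as usize).saturating_mul(h as usize).saturating_mul(channels))
        .fold(0usize, usize::saturating_add);
    total >= PARALLEL_MIN_ELEMENTS
}
```

Gating on element count rather than batch length keeps a batch of many tiny crops on the serial path, where the per-task overhead would otherwise dominate.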

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/oarocr/table_analyzer.rs | Centralizes table cell bbox parsing and updates tests accordingly. |
| src/oarocr/stitching.rs | Avoids panics by guarding empty stitching metadata before computing segment stats. |
| src/oarocr/ocr.rs | Uses Arc<RgbImage> through detection/cropping/recognition and validates batch sizes in the builder. |
| oar-ocr-vl/README.md | Documents the download-binaries feature for fetching ORT binaries during build. |
| oar-ocr-vl/Cargo.toml | Adds download-binaries passthrough and wires cuda to oar-ocr-core/cuda; adopts workspace rust-version. |
| oar-ocr-derive/Cargo.toml | Adopts workspace rust-version. |
| oar-ocr-core/src/processors/normalization.rs | Avoids redundant RGB8 conversion, reuses shared logic, and gates Rayon by output size with tests. |
| oar-ocr-core/src/domain/tasks/validation.rs | Generalizes image validation to accept Borrow<RgbImage> (supports Arc<RgbImage>). |
| oar-ocr-core/src/domain/adapters/* | Converts shared image inputs to owned images at adapter/model boundaries via into_owned_images(). |
| oar-ocr-core/src/core/traits/task.rs | Changes ImageTaskInput.images to Vec<Arc<RgbImage>> and adds into_owned_images() with tests. |
| oar-ocr-core/Cargo.toml | Adopts workspace rust-version. |
| Cargo.toml | Introduces workspace-level rust-version. |
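The `Borrow<RgbImage>` generalization listed for validation.rs can be illustrated with a short sketch: one generic function accepts slices of owned images and slices of `Arc`-wrapped images alike. The stand-in `RgbImage`, the function name, and the zero-dimension check are assumptions for illustration only.

```rust
use std::borrow::Borrow;
use std::sync::Arc;

/// Stand-in for `image::RgbImage` (assumption, keeps the sketch dependency-free).
pub struct RgbImage {
    pub width: u32,
    pub height: u32,
}

/// Validate a batch of images, whether owned (`RgbImage`) or shared
/// (`Arc<RgbImage>`). Both satisfy `Borrow<RgbImage>` in the standard library.
pub fn validate_images<I: Borrow<RgbImage>>(images: &[I]) -> Result<(), String> {
    for (idx, item) in images.iter().enumerate() {
        let img = Borrow::<RgbImage>::borrow(item);
        if img.width == 0 || img.height == 0 {
            return Err(format!("image {idx} has a zero dimension"));
        }
    }
    Ok(())
}
```

Because the bound is on `Borrow`, callers do not need to unwrap or clone shared images just to run validation.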


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request optimizes image handling by using Arc<RgbImage> to minimize cloning across the pipeline and introduces batch size validation in the OAROCRBuilder. It also refactors the normalization processor to include parallelization thresholds and more efficient pixel iteration. The review feedback identifies several opportunities to prevent integer overflows during buffer size calculations and suggests further performance optimizations in the normalization hot loops, such as pre-calculating channel mappings and improving cache locality for CHW layouts.
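The overflow concern raised here applies to any buffer length computed as a product of dimensions. A minimal sketch of the checked alternative, using a hypothetical helper name, might look like this:

```rust
/// Compute the element count of a batch * channels * height * width tensor,
/// returning `None` instead of wrapping or panicking on overflow.
pub fn tensor_len(batch: usize, channels: usize, height: usize, width: usize) -> Option<usize> {
    batch
        .checked_mul(channels)?
        .checked_mul(height)?
        .checked_mul(width)
}
```

A caller can then map `None` to a proper error before allocating, rather than allocating a buffer whose length silently wrapped in a release build.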

Comment thread oar-ocr-core/src/processors/normalization.rs Outdated
Comment thread oar-ocr-core/src/processors/normalization.rs Outdated
Comment thread oar-ocr-core/src/processors/normalization.rs Outdated
Comment thread oar-ocr-core/src/processors/normalization.rs
Comment thread oar-ocr-core/src/processors/normalization.rs
Comment thread oar-ocr-core/src/processors/normalization.rs
Comment thread oar-ocr-core/src/processors/normalization.rs
Owner Author

GreatV commented Apr 27, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request optimizes the OCR pipeline by transitioning to Arc<RgbImage> to minimize image cloning and introduces a rust-version requirement across the workspace. It refactors normalization logic with better parallelization heuristics and adds batch size validation to the OAROCRBuilder. Feedback suggests further performance improvements in the CHW normalization paths by consolidating multiple channel passes into a single iteration over pixels to reduce memory bandwidth usage.
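To make the single-pass suggestion concrete, the sketch below converts interleaved RGB bytes into a planar CHW `f32` buffer while reading each pixel exactly once. The function name and the `scale`, `mean`, and `std` parameters are illustrative assumptions and do not reflect the signature used in normalization.rs.

```rust
/// Normalize interleaved RGB bytes (HWC layout) into a planar CHW buffer in a
/// single pass over the pixels, instead of one pass per channel.
pub fn normalize_hwc_to_chw(
    rgb: &[u8],
    width: usize,
    height: usize,
    scale: f32,
    mean: [f32; 3],
    std: [f32; 3],
) -> Vec<f32> {
    let plane = width * height;
    assert_eq!(rgb.len(), plane * 3, "expected interleaved RGB input");

    let mut out = vec![0.0f32; plane * 3];
    {
        // Split the output into three disjoint channel planes so every pixel
        // is read once and written to all three planes.
        let (r_plane, rest) = out.split_at_mut(plane);
        let (g_plane, b_plane) = rest.split_at_mut(plane);
        for (i, px) in rgb.chunks_exact(3).enumerate() {
            r_plane[i] = (px[0] as f32 * scale - mean[0]) / std[0];
            g_plane[i] = (px[1] as f32 * scale - mean[1]) / std[1];
            b_plane[i] = (px[2] as f32 * scale - mean[2]) / std[2];
        }
    }
    out
}
```

The source bytes are streamed sequentially once, which is the memory-bandwidth saving the review refers to; the three output planes are each written sequentially as well.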

Comment thread oar-ocr-core/src/processors/normalization.rs Outdated
Comment thread oar-ocr-core/src/processors/normalization.rs Outdated
Comment thread oar-ocr-core/src/processors/normalization.rs Outdated
Owner Author

GreatV commented Apr 27, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request optimizes memory usage and performance by transitioning to Arc for shared image data, reducing redundant clones across the OCR pipeline. Key changes include refactored normalization logic with parallelization thresholds, batch size validation in the OAROCRBuilder, and safety checks in the result stitcher to prevent panics. Review feedback highlighted a typo in the rust-version specification (set to a future version 1.95) and suggested a loop optimization in the normalization processor to improve cache locality for CHW layouts.
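The stitcher safety check mentioned above follows a common pattern: return early on empty input before computing a statistic that indexes into the data. The sketch below uses a hypothetical median-height helper; the actual statistics and field names in stitching.rs are not shown in this conversation.

```rust
/// Median of the segment heights, or `None` when there are no segments.
/// Without the emptiness guard, the index below would panic on an empty slice.
pub fn median_height(heights: &[f32]) -> Option<f32> {
    if heights.is_empty() {
        return None;
    }
    let mut sorted = heights.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    Some(sorted[sorted.len() / 2])
}
```

Returning `Option` pushes the empty case to the caller, which can skip stitching or fall back to a default instead of crashing.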

Comment thread Cargo.toml
Comment thread oar-ocr-core/src/processors/normalization.rs
@GreatV GreatV merged commit b3445e3 into main Apr 27, 2026
5 checks passed
@GreatV GreatV deleted the perf branch April 27, 2026 06:17


2 participants