Use shared Arc-backed image inputs across OCR batching and adapter boundaries, only converting back to owned images when model APIs require ownership. Optimize normalization by avoiding redundant RGB8 conversions, reusing single-image paths, and gating Rayon parallelism by batch output size. Also validate OCR batch-size options, centralize table cell bbox parsing, guard empty stitching metadata, wire VL download-binaries/CUDA features, and document the VL download-binaries feature.
Pull request overview
This PR optimizes the OCR pipeline to reduce unnecessary image copies by sharing Arc-backed images across batching and adapter boundaries, only materializing owned RgbImages when required by model APIs. It also improves normalization throughput and robustness in a few OCR/table-related utilities while wiring/documenting feature flags for the VL crate.
Changes:
- Switch `ImageTaskInput` and OCR batching/cropping flows to use `Arc<RgbImage>`, with an `into_owned_images()` escape hatch for model APIs.
- Optimize image normalization by avoiding redundant `to_rgb8()` conversions and gating Rayon parallelism based on total output size.
- Add batch-size validation guardrails, centralize table cell bbox parsing, guard empty stitching metadata, and wire/document `download-binaries`/CUDA features for `oar-ocr-vl`.
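The first bullet can be illustrated with a minimal sketch. It assumes a simplified `ImageTaskInput` and a stand-in `RgbImage` struct (the real type comes from the `image` crate), so the names and fields below are illustrative rather than the crate's actual definitions. The idea is that an `Arc` is unwrapped for free when it is the last reference and cloned only when the image is still shared.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `image::RgbImage`, so the sketch needs no external crates.
#[derive(Clone, Debug, PartialEq)]
pub struct RgbImage {
    pub width: u32,
    pub height: u32,
    pub data: Vec<u8>,
}

/// Simplified stand-in for the PR's `ImageTaskInput` (assumed shape).
pub struct ImageTaskInput {
    pub images: Vec<Arc<RgbImage>>,
}

impl ImageTaskInput {
    /// Convert shared images back into owned ones.
    /// `Arc::try_unwrap` succeeds without copying when this is the only
    /// reference; otherwise the pixel data is cloned once.
    pub fn into_owned_images(self) -> Vec<RgbImage> {
        self.images
            .into_iter()
            .map(|img| Arc::try_unwrap(img).unwrap_or_else(|shared| (*shared).clone()))
            .collect()
    }
}
```

With this shape, batching and cropping code can pass `Arc` handles around cheaply, and the copy cost is paid only at the boundary where a model API insists on ownership.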
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/oarocr/table_analyzer.rs` | Centralizes table cell bbox parsing and updates tests accordingly. |
| `src/oarocr/stitching.rs` | Avoids panics by guarding empty stitching metadata before computing segment stats. |
| `src/oarocr/ocr.rs` | Uses `Arc<RgbImage>` through detection/cropping/recognition and validates batch sizes in the builder. |
| `oar-ocr-vl/README.md` | Documents the `download-binaries` feature for fetching ORT binaries during build. |
| `oar-ocr-vl/Cargo.toml` | Adds `download-binaries` passthrough and wires `cuda` to `oar-ocr-core/cuda`; adopts workspace `rust-version`. |
| `oar-ocr-derive/Cargo.toml` | Adopts workspace `rust-version`. |
| `oar-ocr-core/src/processors/normalization.rs` | Avoids redundant RGB8 conversion, reuses shared logic, and gates Rayon by output size, with tests. |
| `oar-ocr-core/src/domain/tasks/validation.rs` | Generalizes image validation to accept `Borrow<RgbImage>` (supports `Arc<RgbImage>`). |
| `oar-ocr-core/src/domain/adapters/*` | Converts shared image inputs to owned images at adapter/model boundaries via `into_owned_images()`. |
| `oar-ocr-core/src/core/traits/task.rs` | Changes `ImageTaskInput.images` to `Vec<Arc<RgbImage>>` and adds `into_owned_images()`, with tests. |
| `oar-ocr-core/Cargo.toml` | Adopts workspace `rust-version`. |
| `Cargo.toml` | Introduces workspace-level `rust-version`. |
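The `validation.rs` change listed above relies on a standard-library property: both `T` and `Arc<T>` implement `Borrow<T>`, so one generic function can validate owned and shared images alike. The sketch below assumes a stand-in `RgbImage` struct and a hypothetical `validate_images` function with made-up checks; the PR's actual validation rules are not shown in this summary.

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for `image::RgbImage`.
#[derive(Clone, Debug)]
pub struct RgbImage {
    pub width: u32,
    pub height: u32,
    pub data: Vec<u8>,
}

/// Validate a batch of images held either by value or behind `Arc`.
/// The specific checks (non-empty batch, non-zero dimensions) are
/// illustrative assumptions, not the crate's documented rules.
pub fn validate_images<I: Borrow<RgbImage>>(images: &[I]) -> Result<(), String> {
    if images.is_empty() {
        return Err("no images provided".to_string());
    }
    for (idx, item) in images.iter().enumerate() {
        // Fully qualified call so the `Borrow<RgbImage>` bound is the one used.
        let img: &RgbImage = <I as Borrow<RgbImage>>::borrow(item);
        if img.width == 0 || img.height == 0 {
            return Err(format!("image {} has a zero dimension", idx));
        }
    }
    Ok(())
}
```

Callers can then pass `&[RgbImage]` or `&[Arc<RgbImage>]` without converting between the two.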
Code Review
This pull request optimizes image handling by using Arc<RgbImage> to minimize cloning across the pipeline and introduces batch size validation in the OAROCRBuilder. It also refactors the normalization processor to include parallelization thresholds and more efficient pixel iteration. The review feedback identifies several opportunities to prevent integer overflows during buffer size calculations and suggests further performance optimizations in the normalization hot loops, such as pre-calculating channel mappings and improving cache locality for CHW layouts.
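The overflow concern and the parallelism gate mentioned in this review can be sketched together. The function names and the threshold value below are assumptions for illustration; the PR's actual threshold is not stated here. The buffer size is computed with `checked_mul` so an oversized batch yields `None` instead of wrapping, and the same total decides whether parallel dispatch is worth its overhead.

```rust
/// Hypothetical cutoff: below this many output elements, stay sequential.
pub const PARALLEL_MIN_ELEMENTS: usize = 1 << 20;

/// Total number of output elements for a batch, or `None` if the
/// multiplication would overflow `usize`.
pub fn batch_output_len(
    batch: usize,
    channels: usize,
    height: usize,
    width: usize,
) -> Option<usize> {
    batch
        .checked_mul(channels)?
        .checked_mul(height)?
        .checked_mul(width)
}

/// Gate parallel execution on the total amount of work.
pub fn should_parallelize(total_elements: usize) -> bool {
    total_elements >= PARALLEL_MIN_ELEMENTS
}
```

A caller would typically bail out with an error when `batch_output_len` returns `None`, and otherwise choose between a sequential loop and a Rayon parallel iterator based on `should_parallelize`.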
/gemini review
Code Review
This pull request optimizes the OCR pipeline by transitioning to Arc<RgbImage> to minimize image cloning and introduces a rust-version requirement across the workspace. It refactors normalization logic with better parallelization heuristics and adds batch size validation to the OAROCRBuilder. Feedback suggests further performance improvements in the CHW normalization paths by consolidating multiple channel passes into a single iteration over pixels to reduce memory bandwidth usage.
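The single-pass suggestion can be sketched as follows. This is an illustrative function, not the crate's normalization code: `normalize_chw` and its signature are assumptions, and it works on a raw interleaved RGB byte slice rather than an `RgbImage`. Each pixel is read once and written to all three channel planes, instead of sweeping the source image three times.

```rust
/// Normalize interleaved RGB bytes into a planar CHW `f32` buffer in one pass.
/// Computes `(x / 255 - mean) / std` per channel. Returns `None` if the
/// size calculation overflows or the input length does not match.
pub fn normalize_chw(
    rgb: &[u8],
    width: usize,
    height: usize,
    mean: [f32; 3],
    std: [f32; 3],
) -> Option<Vec<f32>> {
    let plane = width.checked_mul(height)?;
    let total = plane.checked_mul(3)?;
    if rgb.len() != total {
        return None;
    }

    // Precompute per-channel scale and bias: x * scale + bias.
    let scale = [
        1.0 / (255.0 * std[0]),
        1.0 / (255.0 * std[1]),
        1.0 / (255.0 * std[2]),
    ];
    let bias = [-mean[0] / std[0], -mean[1] / std[1], -mean[2] / std[2]];

    let mut out = vec![0.0f32; total];
    // Split the output into three disjoint channel planes.
    let (r_plane, rest) = out.split_at_mut(plane);
    let (g_plane, b_plane) = rest.split_at_mut(plane);

    // One sweep over the source pixels fills all three planes.
    for (i, px) in rgb.chunks_exact(3).enumerate() {
        r_plane[i] = px[0] as f32 * scale[0] + bias[0];
        g_plane[i] = px[1] as f32 * scale[1] + bias[1];
        b_plane[i] = px[2] as f32 * scale[2] + bias[2];
    }
    Some(out)
}
```

The source is read sequentially and each plane is written sequentially, which is the memory-bandwidth benefit the review points to.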
/gemini review
Code Review
This pull request optimizes memory usage and performance by transitioning to Arc for shared image data, reducing redundant clones across the OCR pipeline. Key changes include refactored normalization logic with parallelization thresholds, batch size validation in the OAROCRBuilder, and safety checks in the result stitcher to prevent panics. Review feedback highlighted a typo in the rust-version specification (set to a future version 1.95) and suggested a loop optimization in the normalization processor to improve cache locality for CHW layouts.
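The stitcher safety check amounts to refusing to compute statistics over an empty collection. A minimal sketch, assuming a hypothetical `SegmentStats` type and a list of segment heights (the real stitching metadata is not shown in this summary):

```rust
/// Hypothetical summary of stitching segments.
#[derive(Debug, PartialEq)]
pub struct SegmentStats {
    pub count: usize,
    pub mean_height: f32,
    pub max_height: f32,
}

/// Compute stats, returning `None` for empty input instead of dividing
/// by zero or indexing into an empty slice.
pub fn segment_stats(heights: &[f32]) -> Option<SegmentStats> {
    if heights.is_empty() {
        return None;
    }
    let sum: f32 = heights.iter().sum();
    let max = heights.iter().copied().fold(f32::MIN, f32::max);
    Some(SegmentStats {
        count: heights.len(),
        mean_height: sum / heights.len() as f32,
        max_height: max,
    })
}
```

Returning `Option` pushes the empty case to the caller, which can skip stitching rather than panic.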