Support hierarchical speculative decoding and align inference outputs by GreatV · Pull Request #116 · GreatV/oar-ocr

GreatV · 2026-05-14T15:09:19Z

This pull request introduces Hierarchical Speculative Decoding (HSD) as a CUDA-accelerated inference path for Vision-Language Models (VLMs) in the oar-ocr-vl crate, updates documentation and usage guides, and improves CI coverage for new features. It also bumps the workspace version to 0.7.0 and updates dependencies accordingly. The most important changes are:

Major Feature: Hierarchical Speculative Decoding (HSD)

Added a comprehensive docs/hsd.md explaining the HSD algorithm, configuration, and usage, including references to the original paper and implementation-specific details.
Updated README.md to introduce HSD, explain its acceleration benefits, and link to the new documentation.
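Every HSD step follows a draft-then-verify pattern. The sketch below shows the generic greedy acceptance rule that speculative decoding relies on. The target model scores all draft tokens in one forward pass. The longest draft prefix that agrees with the target's own argmax is accepted, and the target then contributes one extra "bonus" token. This is a minimal illustration that assumes greedy decoding and precomputed target argmax ids. `accept_greedy` and `Verdict` are hypothetical names, not the oar-ocr-vl API, and the hierarchical drafting described in docs/hsd.md is not modeled.

```rust
/// Result of verifying one batch of draft tokens (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Verdict {
    /// Draft tokens the target agreed with, in order.
    pub accepted: Vec<u32>,
    /// Target token at the first disagreement (or after the last draft token).
    pub bonus: u32,
}

/// Greedy speculative verification (hypothetical helper, illustration only).
///
/// `draft` holds k proposed tokens. `target_argmax` holds k + 1 ids: entry i is the
/// target's greedy choice given the prefix plus `draft[..i]`, all obtained from a
/// single forward pass over the prefix followed by the draft.
pub fn accept_greedy(draft: &[u32], target_argmax: &[u32]) -> Verdict {
    assert_eq!(
        target_argmax.len(),
        draft.len() + 1,
        "need one target id per draft slot plus one"
    );
    // Count how many leading draft tokens match the target's own greedy choices.
    let n = draft
        .iter()
        .zip(target_argmax)
        .take_while(|(d, t)| d == t)
        .count();
    Verdict {
        accepted: draft[..n].to_vec(),
        // The target's prediction at the first mismatch is always valid output,
        // so each verify step makes at least one token of progress.
        bonus: target_argmax[n],
    }
}
```

Acceptance only ever keeps a prefix of the draft, so the KV entries written for rejected positions have to be discarded afterwards. The PR adds a trimmable KV cache for that rollback.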

Vision-Language Model (VLM) and Documentation Updates

Updated the VLM section in README.md and docs/usage.md to document support for HunyuanOCR, GLM-OCR, and HSD, and improved code samples to use batch-oriented APIs.
Changed the VLM usage examples to reflect the batch API and clarified model download instructions.

Continuous Integration and Testing

Added a new hsd job to the GitHub Actions workflow that installs the CUDA toolkit and runs clippy and tests for the HSD feature, ensuring CI coverage for CUDA-dependent code paths.
Introduced a feature-matrix job to test all feature combinations for oar-ocr and oar-ocr-core in CI.
Added a rustdoc warnings check to fail CI on documentation warnings.

Version and Dependency Updates

Bumped workspace and crate versions to 0.7.0 and updated dependencies in Cargo.toml to match, including oar-ocr-core and oar-ocr-derive.
Added fontdb as a workspace-root dev-dependency in Cargo.toml so the examples load system fonts consistently across platforms.

Miscellaneous

Minor improvements to documentation formatting and accuracy, including badge links and removal of outdated references.

These changes collectively introduce a major new acceleration feature, improve documentation and usability, and ensure robust CI coverage for new and existing features.

Copilot

Pull request overview

This PR adds a new CUDA-accelerated Hierarchical Speculative Decoding (HSD) inference path for vision-language document parsing in oar-ocr-vl, aligns/bridges draft vs target output formats (notably tables), and improves observability + CI coverage. It also updates structure/formula handling in the core OCR pipeline and removes deprecated VLM backends (UniRec, LightOnOCR) from the VLM crate.

Changes:

Introduce the hsd feature in oar-ocr-vl with new core modules (draft types, matching, prefix-tree batching, trimmable KV cache, backend utilities) and integrate KV trimming/gather support into multiple VLM backbones.
Improve structure/formula/OCR batching and diagnostics (formula ORT config override, formula batch sizing, timing logs), plus table HTML↔OTSL conversion utilities + tests.
Expand CI with rustdoc warnings-as-errors, feature-matrix checks, and a CUDA-toolkit-backed hsd job; update docs/READMEs and workspace versioning.
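The trim/gather capability can be pictured with a toy cache. In the hypothetical `ToyKvCache` below, each cached position stores one key vector and one value vector. `trim(len)` drops everything after `len`, which rolls back a rejected linear draft. `gather(keep)` keeps only the selected positions, which tree verification needs when the accepted path is a non-contiguous subset of the flattened candidates. This is a sketch over plain `Vec`s under those assumptions. It is not the `TrimmableKvCache` in `oar-ocr-vl/src/hsd/kv_trim.rs`, which operates on device tensors.

```rust
/// Toy KV cache: one key and one value vector per cached position.
/// Hypothetical CPU-only stand-in; a real cache holds device tensors.
#[derive(Debug, Default, Clone)]
pub struct ToyKvCache {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

impl ToyKvCache {
    pub fn len(&self) -> usize {
        self.keys.len()
    }

    pub fn is_empty(&self) -> bool {
        self.keys.is_empty()
    }

    /// Key stored at cached position `i`.
    pub fn key(&self, i: usize) -> &[f32] {
        &self.keys[i]
    }

    /// Append the key/value computed for a newly processed position.
    pub fn push(&mut self, k: Vec<f32>, v: Vec<f32>) {
        self.keys.push(k);
        self.values.push(v);
    }

    /// Roll back to the first `len` positions, e.g. after rejecting a linear draft.
    /// Does nothing if the cache is already that short.
    pub fn trim(&mut self, len: usize) {
        self.keys.truncate(len);
        self.values.truncate(len);
    }

    /// Keep only the positions in `keep` (strictly increasing), compacted to the front.
    /// Covers tree verification, where the accepted path skips sibling branches.
    pub fn gather(&mut self, keep: &[usize]) {
        assert!(keep.windows(2).all(|w| w[0] < w[1]), "keep indices must be strictly increasing");
        assert!(keep.iter().all(|&i| i < self.len()), "keep index out of range");
        self.keys = keep.iter().map(|&i| self.keys[i].clone()).collect();
        self.values = keep.iter().map(|&i| self.values[i].clone()).collect();
    }
}
```

Keeping trim and gather as separate operations keeps the common case cheap. Linear drafts only need a truncation. The more expensive copy in `gather` is paid only when a tree branch is accepted.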

Reviewed changes

Copilot reviewed 74 out of 75 changed files in this pull request and generated 3 comments.

Summary per file

File	Description
src/oarocr/structure.rs	Adds formula-only ORT session override, formula batching, and per-stage timing logs in structure pipeline.
README.md	Updates crates badge link and adds HSD overview + docs link.
oar-ocr-vl/src/utils/table.rs	Adds HTML→OTSL conversion for PaddleOCR-VL table drafts plus unit tests.
oar-ocr-vl/src/utils/image.rs	Adds safety documentation around `UnsafeSlice` Send/Sync and parallel write contract.
oar-ocr-vl/src/utils.rs	Makes Metal device parsing/macOS gating explicit and improves error messaging and test cfgs.
oar-ocr-vl/src/unirec/model.rs	Removes UniRec implementation (deleted).
oar-ocr-vl/src/unirec/mod.rs	Removes UniRec module exports (deleted).
oar-ocr-vl/src/unirec/decoder.rs	Removes UniRec decoder implementation (deleted).
oar-ocr-vl/src/unirec/config.rs	Removes UniRec config implementation (deleted).
oar-ocr-vl/src/paddleocr_vl/vision.rs	Adds a chunked attention path for large sequence lengths to avoid OOM, while keeping small-sequence outputs byte-stable.
oar-ocr-vl/src/paddleocr_vl/ernie.rs	Switches to `TrimmableKvCache` and adds HSD-only cache length/keep-index hooks.
oar-ocr-vl/src/mineru/text.rs	Switches to `TrimmableKvCache` and adds HSD-only cache length/keep-index hooks.
oar-ocr-vl/src/lightonocr/text.rs	Removes LightOnOCR text model (deleted).
oar-ocr-vl/src/lightonocr/processing.rs	Removes LightOnOCR image preprocessing (deleted).
oar-ocr-vl/src/lightonocr/mod.rs	Removes LightOnOCR module exports (deleted).
oar-ocr-vl/src/lightonocr/config.rs	Removes LightOnOCR config definitions (deleted).
oar-ocr-vl/src/lib.rs	Adds `hsd` module/feature wiring, root `kv_trim` module, and adjusts crate docs/exports accordingly.
oar-ocr-vl/src/hunyuanocr/processing.rs	Aligns resample behavior with upstream (ignores config resample) for output parity.
oar-ocr-vl/src/hunyuanocr/mod.rs	Re-exports HSD prompt helpers behind `hsd` feature.
oar-ocr-vl/src/hunyuanocr/llm.rs	Switches to `TrimmableKvCache`, aligns XDRoPE numerics (F32 mix then cast back), adds optional one-shot activation dumps, and adds HSD-only KV gather hooks.
oar-ocr-vl/src/hsd/types.rs	Introduces shared HSD/DSV configs, drafts, region kinds, and detailed timing/acceptance stats with tests.
oar-ocr-vl/src/hsd/prefix_tree.rs	Adds prefix-tree builder for candidate suffix batching with tests.
oar-ocr-vl/src/hsd/mod.rs	Adds HSD module entrypoint and re-exports core types + KV cache wrapper.
oar-ocr-vl/src/hsd/matching.rs	Adds sliding-window candidate collection + capping heuristics with extensive tests.
oar-ocr-vl/src/hsd/kv_trim.rs	Adds a trim/gather-capable KV cache wrapper with unit tests.
oar-ocr-vl/src/hsd/backend_util.rs	Adds shared helper utilities for pos-id and KV keep-index construction with tests.
oar-ocr-vl/src/glmocr/text.rs	Switches to `TrimmableKvCache` and adds HSD-only cache length/keep-index hooks.
oar-ocr-vl/src/doc_parser.rs	Removes UniRec/LightOnOCR backends from DocParser integration and updates docs/comments accordingly.
oar-ocr-vl/README.md	Updates supported models list, clarifies DocParser scope, adds HSD docs + updated batch API examples, and removes UniRec docs.
oar-ocr-vl/examples/utils/structure_match.rs	Adds source-aware structure-result matching utility + tests for OmniDocBench drafting.
oar-ocr-vl/examples/utils/mod.rs	Adds HSD example helpers (config helpers, auto-tune, stats printing) and exports structure_match.
oar-ocr-vl/examples/unirec.rs	Removes UniRec example (deleted).
oar-ocr-vl/examples/mineru.rs	Updates MinerU example docs to use `models/...` paths.
oar-ocr-vl/examples/hunyuanocr.rs	Updates HunyuanOCR example docs to use `models/...` paths.
oar-ocr-vl/examples/glmocr.rs	Updates GLM-OCR example docs to use `models/...` paths.
oar-ocr-vl/examples/doc_parser.rs	Restricts DocParser example to layout-first backends and simplifies layout handling.
oar-ocr-vl/Cargo.toml	Adds `hsd` feature (implies cuda) and adds a dev-dependency on `oar-ocr`.
oar-ocr-vl/build.rs	Adds build-time guard to prevent enabling `metal` feature on non-macOS targets.
oar-ocr-core/src/utils/dict.rs	Adjusts docs to use relative paths and improves missing-file test portability.
oar-ocr-core/src/processors/layout_sorting.rs	Adds unit tests for enhanced layout sorting and child-block association behavior.
oar-ocr-core/src/processors/formula_preprocess.rs	Improves static regex construction and fixes LaTeX thin-space handling in normalization loop.
oar-ocr-core/src/processors/decode.rs	Switches static regex init to `expect` for clearer panic context.
oar-ocr-core/src/predictors/table_structure_recognition.rs	Refines model-family detection/mapping and documents wired vs wireless sizing defaults.
oar-ocr-core/src/predictors/formula_recognition.rs	Adds default batch_size to formula predictor config.
oar-ocr-core/src/models/recognition/unimernet.rs	Adds vocab sentinel filtering and tests to avoid decoding invalid/padded token ids.
oar-ocr-core/src/models/recognition/pp_formulanet.rs	Adds robust output selection for token-id tensor, vocab sentinel filtering, and tests.
oar-ocr-core/src/domain/tasks/layout_detection.rs	Clarifies score threshold docs and updates source/reference notes.
oar-ocr-core/src/domain/tasks/formula_recognition.rs	Adds `batch_size` to task config with validation + defaults.
oar-ocr-core/src/domain/adapters/preprocessing.rs	Changes doc code fences to non-runnable snippets to avoid doctest/rustdoc issues.
oar-ocr-core/src/domain/adapters/formula_recognition_adapter.rs	Adds timing/debug diagnostics, vocab-size-aware filtering, and batch_size plumbing into builder + adapter.
oar-ocr-core/src/core/macros.rs	Changes macro docs code fences to `text` to avoid doctest/rustdoc issues.
oar-ocr-core/src/core/inference/tensor_output.rs	Adds `dtype_name()` helper for logging/diagnostics.
oar-ocr-core/src/core/inference/mod.rs	Adds `OrtInfer::output_shapes()` helper for adapters needing output introspection.
oar-ocr-core/src/core/errors/mod.rs	Adjusts module docs to avoid intra-doc-link warnings.
oar-ocr-core/Cargo.toml	Removes unused dependencies (`toml`, `html-escape`).
examples/utils/visualization.rs	Switches system font loading to `fontdb` for better cross-platform behavior.
examples/table_structure_recognition.rs	Updates dict-path examples to be relative paths.
examples/structure.rs	Adds CLI arg to control formula recognition batch size and passes it into config.
docs/hsd.md	Adds detailed HSD algorithm/config documentation and usage guidance.
Cargo.toml	Bumps workspace version to 0.7.0 and adds `fontdb` dev-dependency at workspace root.
.github/workflows/test.yml	Adds rustdoc warning checks, feature-matrix job, and CUDA-backed `hsd` clippy/tests job.
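The table describes `oar-ocr-vl/src/hsd/matching.rs` as sliding-window candidate collection with capping heuristics. One common way to do this is prompt-lookup style n-gram matching, and that is an assumption here rather than a reading of the PR's code. The idea is to take the last `n` generated tokens, slide a window of that size over the draft token stream, and, for every hit, propose the tokens that follow it. Candidates are capped at `max_len` tokens each and `max_candidates` in total. `collect_candidates` and its parameters are hypothetical.

```rust
/// Collect draft continuations whose preceding `n` tokens match the generated tail.
/// Hypothetical helper: the actual matching heuristics in oar-ocr-vl may differ.
pub fn collect_candidates(
    generated: &[u32],
    draft: &[u32],
    n: usize,
    max_len: usize,
    max_candidates: usize,
) -> Vec<Vec<u32>> {
    if n == 0 || max_candidates == 0 || generated.len() < n || draft.len() <= n {
        return Vec::new();
    }
    let tail = &generated[generated.len() - n..];
    let mut out: Vec<Vec<u32>> = Vec::new();
    // Slide an n-token window over the draft; `start` is the window's first index.
    for start in 0..=draft.len() - n {
        if &draft[start..start + n] != tail {
            continue;
        }
        let from = start + n;
        let to = (from + max_len).min(draft.len());
        if from == to {
            continue; // match at the very end of the draft: nothing to propose
        }
        let cand = draft[from..to].to_vec();
        // Skip duplicates so the verifier does not score the same suffix twice.
        if !out.contains(&cand) {
            out.push(cand);
            if out.len() == max_candidates {
                break;
            }
        }
    }
    out
}
```

For example, with `draft = [1, 2, 3, 4, 1, 2, 5, 6]`, a generated tail of `[1, 2]`, `n = 2` and `max_len = 2`, both occurrences of `[1, 2]` match and the candidates are `[3, 4]` and `[5, 6]`. Caps like `max_candidates` bound how many suffixes the target has to verify per step.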

Comments suppressed due to low confidence (1)

oar-ocr-vl/Cargo.toml:67

oar-ocr is added as a path-only dev-dependency. This will typically break cargo publish/crates.io builds because path dependencies are not allowed in published manifests unless they also specify a version (Cargo can then rewrite the path dep to the registry version). Consider adding version = "0.7.0" here or moving oar-ocr into [workspace.dependencies] and using oar-ocr.workspace = true.

[dev-dependencies]
clap = { version = "4.5.42", features = ["derive"] }
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
criterion = { version = "0.8", features = ["html_reports"] }
hayro = "0.6"
oar-ocr = { path = "..", default-features = false }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

gemini-code-assist

Code Review

This pull request introduces Hierarchical Speculative Decoding (HSD) to the oar-ocr-vl crate, providing a training-free CUDA acceleration path for document parsing VLMs. Key changes include the addition of a TrimmableKvCache to support HSD's tree-verification requirements, new HSD-specific backends for supported models (HunyuanOCR, GLM-OCR, MinerU, PaddleOCR-VL), and utilities for converting between HTML and OTSL table formats. My feedback highlights potential performance bottlenecks in the repetition penalty implementation and KV-cache management, as well as concerns regarding the robustness of regex-based HTML parsing for table conversion.
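Gemini's concern about regex-based HTML parsing suggests keeping the span arithmetic separate from HTML parsing. The sketch below starts from cells that have already been parsed (row-major, each with `colspan`/`rowspan`) and emits OTSL tokens. It assumes docling-style tag spellings: `<fcel>` for a filled cell, `<ecel>` for an empty cell, `<lcel>` for a cell merged with its left neighbor, `<ucel>` for one merged with the cell above, `<xcel>` for one merged both ways, and `<nl>` for a row end. These spellings, and the `Cell` and `html_cells_to_otsl` names, are assumptions. They are not taken from `oar-ocr-vl/src/utils/table.rs`.

```rust
/// One parsed `<td>`/`<th>` cell (hypothetical input type; HTML parsing not shown).
pub struct Cell {
    pub text: String,
    pub colspan: usize,
    pub rowspan: usize,
}

type Grid = Vec<Vec<Option<&'static str>>>;

/// Grow the grid so that slot (r, c) exists.
fn ensure(grid: &mut Grid, r: usize, c: usize) {
    if grid.len() <= r {
        grid.resize(r + 1, Vec::new());
    }
    if grid[r].len() <= c {
        grid[r].resize(c + 1, None);
    }
}

/// Convert row-major HTML cells into an OTSL token string (assumed tag spellings).
pub fn html_cells_to_otsl(rows: &[Vec<Cell>]) -> String {
    let mut grid: Grid = Vec::new();
    for (r, row) in rows.iter().enumerate() {
        let mut c = 0;
        for cell in row {
            // Skip slots already covered by a rowspan from an earlier row.
            while grid.get(r).and_then(|g| g.get(c)).copied().flatten().is_some() {
                c += 1;
            }
            let (cs, rs) = (cell.colspan.max(1), cell.rowspan.max(1));
            for i in 0..rs {
                for j in 0..cs {
                    let tok = match (i, j) {
                        (0, 0) if cell.text.trim().is_empty() => "<ecel>",
                        (0, 0) => "<fcel>",
                        (0, _) => "<lcel>",
                        (_, 0) => "<ucel>",
                        _ => "<xcel>",
                    };
                    ensure(&mut grid, r + i, c + j);
                    grid[r + i][c + j] = Some(tok);
                }
            }
            c += cs;
        }
    }
    let width = grid.iter().map(Vec::len).max().unwrap_or(0);
    let mut out = String::new();
    for row in &grid {
        for c in 0..width {
            // Pad ragged rows with empty cells so every row has the same width.
            out.push_str(row.get(c).copied().flatten().unwrap_or("<ecel>"));
        }
        out.push_str("<nl>");
    }
    out
}
```

Because the grid is filled slot by slot, a `rowspan` from an earlier row automatically shifts later cells to the right. Regex-based approaches tend to get this bookkeeping wrong.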

GreatV · 2026-05-14T23:24:57Z

/gemini review

Copilot

Pull request overview

Copilot reviewed 74 out of 75 changed files in this pull request and generated 1 comment.

gemini-code-assist

Code Review

This pull request implements Hierarchical Speculative Decoding (HSD) for VLM acceleration, adds support for GLM-OCR and MinerU2.5, and includes significant numerical fixes for HunyuanOCR to achieve parity with upstream implementations. Additionally, it introduces batching for formula recognition, improves font loading via fontdb, and removes the UniRec and LightOnOCR backends. Feedback highlights opportunities to improve performance by caching environment variable lookups in vision encoder forward passes and reducing code duplication in the HunyuanOCR prefill logic.
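Gemini's suggestion to cache environment-variable lookups in vision forward passes can be met with a process-wide `std::sync::OnceLock`, which reads the variable once and reuses the value afterwards. This is a minimal sketch. `OAR_DEBUG_VISION` and `debug_vision_enabled` are hypothetical names. In this PR, the follow-up commit bf24261 took a different route and removed the HunyuanOCR debug env-var dump path entirely.

```rust
use std::sync::OnceLock;

/// Read a boolean debug switch from the environment once and cache it for the process.
/// `OAR_DEBUG_VISION` is a hypothetical variable name used only for this sketch.
pub fn debug_vision_enabled() -> bool {
    static FLAG: OnceLock<bool> = OnceLock::new();
    *FLAG.get_or_init(|| {
        std::env::var("OAR_DEBUG_VISION")
            .map(|v| matches!(v.as_str(), "1" | "true" | "TRUE"))
            .unwrap_or(false)
    })
}
```

With this pattern, the forward pass pays only for an atomic load per call. The trade-off is that changing the variable after the first call has no effect.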

GreatV · 2026-05-15T01:47:21Z

/gemini review

gemini-code-assist

Code Review

This pull request upgrades the project to version 0.7.0, introducing support for HunyuanOCR, GLM-OCR, and MinerU2.5 models while removing the UniRec implementation. A major addition is Hierarchical Speculative Decoding (HSD) for CUDA-accelerated inference, supported by architectural updates like a trimmable KV cache. The update also includes critical fixes for vision encoder drift and optimizations for large-image attention. Feedback from the review suggests several technical improvements: optimizing the HSD verification loop by performing argmax on-device to reduce host-transfer overhead, ensuring consistency between Stage 1 and Stage 2 draft surfaces, hoisting redundant RoPE operations out of transformer layers, and improving the robustness of the HTML table attribute parser.
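Hoisting rotary-embedding preparation out of the transformer layers means building the cos/sin tables once per forward pass and passing them to every layer, instead of recomputing them inside the layer loop. The sketch below uses plain 1-D RoPE in the rotate-half convention. The multi-axis XDRoPE that HunyuanOCR uses is not modeled. `RopeTables`, `build_rope` and `apply_rope` are hypothetical names.

```rust
/// Precomputed rotary tables: one row of cos/sin values per position.
pub struct RopeTables {
    pub cos: Vec<Vec<f32>>,
    pub sin: Vec<Vec<f32>>,
}

/// Build plain 1-D RoPE tables once per forward pass. `head_dim` must be even.
pub fn build_rope(positions: &[usize], head_dim: usize, base: f32) -> RopeTables {
    assert!(head_dim % 2 == 0, "head_dim must be even");
    let half = head_dim / 2;
    let (mut cos, mut sin): (Vec<Vec<f32>>, Vec<Vec<f32>>) = (Vec::new(), Vec::new());
    for &p in positions {
        let angles: Vec<f32> = (0..half)
            .map(|i| {
                // inv_freq_i = base^(-2i / head_dim)
                let inv_freq = 1.0 / base.powf((2 * i) as f32 / head_dim as f32);
                p as f32 * inv_freq
            })
            .collect();
        cos.push(angles.iter().map(|a| a.cos()).collect());
        sin.push(angles.iter().map(|a| a.sin()).collect());
    }
    RopeTables { cos, sin }
}

/// Rotate one head vector in place (rotate-half pairing of i with i + half),
/// reading the cached tables for row `pos_idx` instead of recomputing sin/cos.
pub fn apply_rope(x: &mut [f32], pos_idx: usize, t: &RopeTables) {
    let half = x.len() / 2;
    assert_eq!(t.cos[pos_idx].len(), half, "table built for a different head_dim");
    for i in 0..half {
        let (a, b) = (x[i], x[i + half]);
        let (c, s) = (t.cos[pos_idx][i], t.sin[pos_idx][i]);
        x[i] = a * c - b * s;
        x[i + half] = a * s + b * c;
    }
}
```

In a decoder, `build_rope` would run once, before the layer loop, and each layer would call `apply_rope` with `&tables` on its freshly projected queries and keys. The table cost then drops from once per layer to once per forward call.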

Copilot

Pull request overview

Copilot reviewed 74 out of 75 changed files in this pull request and generated no new comments.

GreatV · 2026-05-15T02:55:44Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces Hierarchical Speculative Decoding (HSD) to improve document parsing performance by leveraging a lightweight drafter pipeline and a target VLM verification path. Key changes include the addition of a TrimmableKvCache to support HSD's tree-verification requirements, a new hsd module for draft-target matching and prefix-tree construction, and integration across multiple VLM backends (HunyuanOCR, GLM-OCR, MinerU, PaddleOCR-VL). I have identified several areas for improvement, including correcting a Rust version typo in Cargo.toml, optimizing GPU-to-CPU data transfers in the HSD verification driver, and refactoring unstable let_chains syntax to ensure broader compiler compatibility.
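Prefix-tree construction lets candidate suffixes that share a prefix be stored once and verified together in a single batched forward pass. Each flattened node attends only to itself and its ancestors. The sketch below builds such a tree and its attention mask. `FlatTree` and `build_prefix_tree` are hypothetical names, not the API of `oar-ocr-vl/src/hsd/prefix_tree.rs`. The linear child lookup is chosen for clarity, not speed.

```rust
/// Flattened prefix tree: node i holds `tokens[i]` and points at `parents[i]`
/// (None = child of the already-cached prefix). Shared prefixes are stored once.
#[derive(Debug, Default)]
pub struct FlatTree {
    pub tokens: Vec<u32>,
    pub parents: Vec<Option<usize>>,
}

/// Merge candidate suffixes into one prefix tree (hypothetical helper).
/// Nodes are numbered in insertion order, so a parent always precedes its children.
pub fn build_prefix_tree(candidates: &[Vec<u32>]) -> FlatTree {
    let mut tree = FlatTree::default();
    for cand in candidates {
        let mut parent: Option<usize> = None;
        for &tok in cand {
            // Reuse an existing child with the same token, otherwise append a new node.
            let existing = (0..tree.tokens.len())
                .find(|&i| tree.parents[i] == parent && tree.tokens[i] == tok);
            let node = existing.unwrap_or_else(|| {
                tree.tokens.push(tok);
                tree.parents.push(parent);
                tree.tokens.len() - 1
            });
            parent = Some(node);
        }
    }
    tree
}

impl FlatTree {
    /// mask[i][j] is true when node i may attend to node j (j is i or an ancestor of i).
    pub fn attention_mask(&self) -> Vec<Vec<bool>> {
        let n = self.tokens.len();
        let mut mask = vec![vec![false; n]; n];
        for (i, row) in mask.iter_mut().enumerate() {
            let mut cur = Some(i);
            while let Some(j) = cur {
                row[j] = true;
                cur = self.parents[j];
            }
        }
        mask
    }
}
```

For the candidates `[1, 2, 3]`, `[1, 2, 4]` and `[5]`, the tree has five nodes. If `[1, 2, 4]` is accepted, the surviving nodes are 0, 1 and 3. That set is not contiguous, which is why the backends need keep-index KV gathering and not just truncation.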

Copilot

Pull request overview

Copilot reviewed 74 out of 75 changed files in this pull request and generated 2 comments.

GreatV · 2026-05-15T03:34:42Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces Hierarchical Speculative Decoding (HSD), a training-free CUDA acceleration scheme for Vision-Language Model (VLM) backends including HunyuanOCR, GLM-OCR, and MinerU2.5. The implementation includes a new hsd module for prefix-tree batching and speculative verification, alongside a TrimmableKvCache to support rolling back unaccepted tokens. Significant improvements were made to HunyuanOCR to fix numerical drift through the use of gelu_erf and F32 attention in late layers, and PaddleOCR-VL now utilizes chunked vision attention to handle large images without exceeding VRAM. Review feedback highlights opportunities to improve performance by reducing redundant data transfers between device and host during logit processing, optimizing tensor selection in pp_formulanet.rs into a single pass, and ensuring robustness against zero-sized reshapes in the HSD driver.
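Chunked vision attention relies on the fact that softmax is taken per query row. Splitting the queries into chunks therefore bounds the materialized score matrix at `chunk x keys` without changing what each row computes. The scalar sketch below shows this for a single head. `attend_block` and `chunked_attention` are hypothetical names, not the PaddleOCR-VL code. In this sketch, the per-row arithmetic is identical in both paths, so the outputs match exactly. GPU kernels make no such guarantee in general.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Attention for one block of queries against all keys/values.
/// Like a tensor kernel, it materialises the whole [q.len()][k.len()] score matrix.
fn attend_block(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q.first().map_or(1, Vec::len) as f32).sqrt();
    let dv = v.first().map_or(0, Vec::len);
    let scores: Vec<Vec<f32>> = q
        .iter()
        .map(|qi| k.iter().map(|kj| dot(qi, kj) * scale).collect())
        .collect();
    scores
        .iter()
        .map(|row| {
            // Numerically stable softmax over this query's scores.
            let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = row.iter().map(|s| (s - max).exp()).collect();
            let denom: f32 = exps.iter().sum();
            let mut out = vec![0.0f32; dv];
            for (w, vj) in exps.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += (w / denom) * x;
                }
            }
            out
        })
        .collect()
}

/// Process queries `chunk` rows at a time so the score matrix never exceeds
/// `chunk x k.len()`; each row's softmax is independent, so results are unchanged.
pub fn chunked_attention(
    q: &[Vec<f32>],
    k: &[Vec<f32>],
    v: &[Vec<f32>],
    chunk: usize,
) -> Vec<Vec<f32>> {
    assert!(chunk > 0, "chunk must be positive");
    q.chunks(chunk).flat_map(|qc| attend_block(qc, k, v)).collect()
}
```

Setting `chunk` to the full query length reproduces the unchunked path. That makes it simple to keep small inputs on the original route and chunk only when the sequence is large.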

Copilot

Pull request overview

Copilot reviewed 74 out of 75 changed files in this pull request and generated no new comments.

GreatV added 4 commits May 7, 2026 16:12

support hierarchical speculative decoding (HSD)

c8b8f18

fix: align OCR/VL inference with official reference outputs

d3ebcb5

feat(vl): add CUDA HSD decoding support

ab3a61b

test: harden HSD and regression coverage

d709f08

GreatV requested a review from Copilot May 14, 2026 15:09

Copilot started reviewing on behalf of GreatV May 14, 2026 15:09

Copilot AI reviewed May 14, 2026

View reviewed changes

Comment thread oar-ocr-vl/src/utils/table.rs

Comment thread oar-ocr-vl/src/utils/table.rs Outdated

Comment thread src/oarocr/structure.rs Outdated

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

Comment thread oar-ocr-vl/src/hunyuanocr/model.rs

Comment thread oar-ocr-vl/src/hsd/kv_trim.rs

Comment thread oar-ocr-vl/src/utils/table.rs

Comment thread oar-ocr-vl/src/hunyuanocr/model.rs

refactor: update HSD workflow and enhance UniMERNet CUDA configuration

8d1b687

GreatV requested a review from Copilot May 14, 2026 23:25

Copilot started reviewing on behalf of GreatV May 14, 2026 23:25

Copilot AI reviewed May 14, 2026

View reviewed changes

Comment thread oar-ocr-vl/src/paddleocr_vl/vision.rs

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

Comment thread oar-ocr-vl/src/hunyuanocr/vision.rs Outdated

Comment thread oar-ocr-vl/src/hunyuanocr/vision.rs Outdated

Comment thread oar-ocr-vl/src/hunyuanocr/model.rs

refactor(vl): remove HunyuanOCR debug env-var dump path

bf24261

GreatV requested a review from Copilot May 15, 2026 01:47

Copilot started reviewing on behalf of GreatV May 15, 2026 01:48

gemini-code-assist Bot reviewed May 15, 2026

View reviewed changes

Comment thread oar-ocr-vl/src/hsd/verify.rs Outdated

Comment thread oar-ocr-vl/src/hsd/drafting.rs

Comment thread oar-ocr-vl/src/hunyuanocr/llm.rs

Comment thread oar-ocr-vl/src/utils/table.rs Outdated

Copilot AI reviewed May 15, 2026

View reviewed changes

fix(vl): hoist XDRoPE prep out of layer loop + harden extract_span

33d8373

GreatV requested a review from Copilot May 15, 2026 02:55

Copilot started reviewing on behalf of GreatV May 15, 2026 02:56

gemini-code-assist Bot reviewed May 15, 2026

View reviewed changes

Comment thread Cargo.toml

Comment thread oar-ocr-vl/src/hsd/verify.rs

Comment thread oar-ocr-vl/src/hsd/drafting.rs

Comment thread oar-ocr-vl/src/hsd/kv_trim.rs

Comment thread oar-ocr-vl/src/hsd/verify.rs Outdated

Copilot AI reviewed May 15, 2026

View reviewed changes

Comment thread oar-ocr-core/src/models/recognition/pp_formulanet.rs Outdated

Comment thread oar-ocr-vl/src/hsd/verify.rs

fix(vl): tighten HSD verify shape check and trim micro-allocs

4b71702

GreatV requested a review from Copilot May 15, 2026 03:34

Copilot started reviewing on behalf of GreatV May 15, 2026 03:35

gemini-code-assist Bot reviewed May 15, 2026

View reviewed changes

Comment thread oar-ocr-vl/src/hunyuanocr/model.rs

Comment thread oar-ocr-core/src/models/recognition/pp_formulanet.rs

Comment thread oar-ocr-vl/src/hsd/verify.rs

Copilot AI reviewed May 15, 2026

View reviewed changes

GreatV merged commit 985e77e into main May 15, 2026
11 checks passed

GreatV deleted the hsd branch May 15, 2026 04:36

Conversation

GreatV commented May 14, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

GreatV commented May 14, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

GreatV commented May 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

GreatV commented May 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

GreatV commented May 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees