
Support hierarchical speculative decoding and align inference outputs #116

Merged: GreatV merged 8 commits into main from hsd on May 15, 2026

GreatV (Owner) commented May 14, 2026

This pull request introduces Hierarchical Speculative Decoding (HSD) as a CUDA-accelerated inference path for Vision-Language Models (VLMs) in the oar-ocr-vl crate, updates documentation and usage guides, and improves CI coverage for new features. It also bumps the workspace version to 0.7.0 and updates dependencies accordingly. The most important changes are:

Major Feature: Hierarchical Speculative Decoding (HSD)

  • Added a comprehensive docs/hsd.md explaining the HSD algorithm, configuration, and usage, including references to the original paper and implementation-specific details.
  • Updated README.md to introduce HSD, explain its acceleration benefits, and link to the new documentation.
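HSD builds on the draft-then-verify idea from speculative decoding. As a rough illustration only (docs/hsd.md is the authoritative description, and `verify_greedy` is a hypothetical name, not part of oar-ocr-vl), this is the generic greedy acceptance rule: the target model scores every drafted position in one batched pass and keeps the longest prefix on which its own greedy choice agrees with the draft, plus one token of its own.

```rust
/// Hypothetical helper showing greedy draft verification (not the crate's API).
///
/// `target_argmax[i]` is the target model's greedy token after it has consumed the
/// current prefix plus `draft[..i]`; all of these come from one batched forward pass.
/// Returns the accepted draft prefix followed by one token chosen by the target: either
/// its correction at the first mismatch or a "bonus" token when the whole draft matches.
pub fn verify_greedy(draft: &[u32], target_argmax: &[u32]) -> Vec<u32> {
    assert_eq!(
        target_argmax.len(),
        draft.len() + 1,
        "the target must score every draft position plus one extra position"
    );
    let mut accepted = Vec::with_capacity(draft.len() + 1);
    for (&d, &t) in draft.iter().zip(target_argmax) {
        if d != t {
            // The target's own token replaces the first wrong draft token.
            accepted.push(t);
            return accepted;
        }
        accepted.push(d);
    }
    // Whole draft accepted: the extra scored position yields one bonus token.
    accepted.push(target_argmax[draft.len()]);
    accepted
}
```

Under greedy decoding this rule reproduces exactly what the target would have emitted token by token; the saving comes from accepting several tokens per target forward pass when the draft is good.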

Vision-Language Model (VLM) and Documentation Updates

  • Updated the VLM section in README.md and docs/usage.md to document support for HunyuanOCR, GLM-OCR, and HSD, and updated code samples to use the batch-oriented APIs.
  • Changed the VLM usage examples to reflect the batch API and clarified model download instructions.

Continuous Integration and Testing

  • Added a new hsd job to the GitHub Actions workflow that installs the CUDA toolkit and runs clippy and tests for the HSD feature, ensuring CI coverage for CUDA-dependent code paths.
  • Introduced a feature-matrix job to test all feature combinations for oar-ocr and oar-ocr-core in CI.
  • Added a rustdoc warnings check to fail CI on documentation warnings.

Version and Dependency Updates

  • Bumped workspace and crate versions to 0.7.0 and updated the matching dependency versions in Cargo.toml, including oar-ocr-core and oar-ocr-derive.
  • Added fontdb as a workspace dev-dependency in Cargo.toml so the examples load system fonts more reliably across platforms.

Miscellaneous

  • Minor improvements to documentation formatting and accuracy, including badge links and removal of outdated references.

These changes collectively introduce a major new acceleration feature, improve documentation and usability, and ensure robust CI coverage for new and existing features.

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a new CUDA-accelerated Hierarchical Speculative Decoding (HSD) inference path for vision-language document parsing in oar-ocr-vl, bridges the output formats of the draft and target models (notably for tables), and improves observability and CI coverage. It also updates structure/formula handling in the core OCR pipeline and removes the deprecated VLM backends (UniRec, LightOnOCR) from the VLM crate.
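For context on the table bridging: OTSL encodes a table as a dense grid of cell tokens, commonly written `fcel` (filled), `ecel` (empty), `lcel` (continues the cell to the left), `ucel` (continues the cell above), `xcel` (inside a two-dimensional span) and `nl` (end of row). Below is a minimal sketch of expanding HTML cells with `rowspan`/`colspan` into that grid, assuming the cells have already been parsed out of the HTML. `HtmlCell`, `Otsl`, and `html_cells_to_otsl` are hypothetical names; the crate's conversion in oar-ocr-vl/src/utils/table.rs may differ and handle more edge cases.

```rust
/// OTSL cell tokens as commonly written in OTSL-based table outputs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Otsl {
    Fcel, // filled cell
    Ecel, // empty cell
    Lcel, // continues the cell to its left (colspan)
    Ucel, // continues the cell above (rowspan)
    Xcel, // inside a span that extends both left and up
    Nl,   // end of row
}

/// Hypothetical parsed HTML cell: spans from rowspan/colspan, `empty` from its text.
#[derive(Clone, Copy, Debug)]
pub struct HtmlCell {
    pub colspan: usize,
    pub rowspan: usize,
    pub empty: bool,
}

/// Expand HTML-style rows into a dense OTSL token sequence, row by row.
pub fn html_cells_to_otsl(rows: &[Vec<HtmlCell>]) -> Vec<Otsl> {
    let n_rows = rows.len();
    let mut grid: Vec<Vec<Option<Otsl>>> = vec![Vec::new(); n_rows];
    for (r, row) in rows.iter().enumerate() {
        let mut c = 0;
        for cell in row {
            // Skip slots already covered by a rowspan from an earlier row.
            while matches!(grid[r].get(c), Some(Some(_))) {
                c += 1;
            }
            let (rs, cs) = (cell.rowspan.max(1), cell.colspan.max(1));
            // Clamp rowspans that run past the last row.
            for dr in 0..rs.min(n_rows - r) {
                for dc in 0..cs {
                    let (rr, cc) = (r + dr, c + dc);
                    if grid[rr].len() <= cc {
                        grid[rr].resize(cc + 1, None);
                    }
                    grid[rr][cc] = Some(match (dr, dc) {
                        (0, 0) if cell.empty => Otsl::Ecel,
                        (0, 0) => Otsl::Fcel,
                        (0, _) => Otsl::Lcel,
                        (_, 0) => Otsl::Ucel,
                        _ => Otsl::Xcel,
                    });
                }
            }
            c += cs;
        }
    }
    // Pad ragged rows with empty cells so every row has the same width.
    let width = grid.iter().map(Vec::len).max().unwrap_or(0);
    let mut out = Vec::with_capacity(n_rows * (width + 1));
    for row in &grid {
        for c in 0..width {
            out.push(row.get(c).copied().flatten().unwrap_or(Otsl::Ecel));
        }
        out.push(Otsl::Nl);
    }
    out
}
```

Padding short rows with `ecel` is one possible policy for malformed HTML; a real converter might instead reject such tables.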

Changes:

  • Introduce the hsd feature in oar-ocr-vl with new core modules (draft types, matching, prefix-tree batching, trimmable KV cache, backend utilities) and integrate KV trimming/gather support into multiple VLM backbones.
  • Improve structure/formula/OCR batching and diagnostics (formula ORT config override, formula batch sizing, timing logs), plus table HTML↔OTSL conversion utilities and tests.
  • Expand CI with rustdoc warnings-as-errors, feature-matrix checks, and a CUDA-toolkit-backed hsd job; update docs/READMEs and workspace versioning.
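Speculative verification needs a cache that can be rolled back (rejected draft tokens) and compacted (keeping only the accepted path after tree verification). Below is a sketch of those two operations, assuming each cached position maps to one key row and one value row. `ToyKvCache` and its methods are hypothetical; the crate's `TrimmableKvCache` wraps device tensors, and its API may differ.

```rust
/// Hypothetical stand-in for one layer's KV cache: one key row and one value row per
/// cached position. Real backends hold device tensors; plain Vecs keep the sketch runnable.
#[derive(Debug, Default)]
pub struct ToyKvCache {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

impl ToyKvCache {
    /// Number of cached positions.
    pub fn len(&self) -> usize {
        self.keys.len()
    }

    pub fn is_empty(&self) -> bool {
        self.keys.is_empty()
    }

    pub fn key(&self, pos: usize) -> &[f32] {
        &self.keys[pos]
    }

    /// Append K/V rows for newly processed positions (e.g. a batch of draft tokens).
    pub fn append(&mut self, k: Vec<Vec<f32>>, v: Vec<Vec<f32>>) {
        assert_eq!(k.len(), v.len(), "keys and values must cover the same positions");
        self.keys.extend(k);
        self.values.extend(v);
    }

    /// Roll back to the first `len` positions, dropping rejected speculative tokens.
    pub fn trim(&mut self, len: usize) {
        self.keys.truncate(len);
        self.values.truncate(len);
    }

    /// Keep only the positions in `keep`, in that order. After tree verification the
    /// accepted path is usually a non-contiguous subset of the batched positions.
    pub fn gather(&mut self, keep: &[usize]) {
        self.keys = keep.iter().map(|&i| self.keys[i].clone()).collect();
        self.values = keep.iter().map(|&i| self.values[i].clone()).collect();
    }
}
```

Trimming is cheap (truncation), whereas gathering copies rows; a device-side implementation would typically do the gather with an index-select on the sequence dimension.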

Reviewed changes

Copilot reviewed 74 out of 75 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/oarocr/structure.rs | Adds formula-only ORT session override, formula batching, and per-stage timing logs in structure pipeline. |
| README.md | Updates crates badge link and adds HSD overview + docs link. |
| oar-ocr-vl/src/utils/table.rs | Adds HTML→OTSL conversion for PaddleOCR-VL table drafts plus unit tests. |
| oar-ocr-vl/src/utils/image.rs | Adds safety documentation around UnsafeSlice Send/Sync and parallel write contract. |
| oar-ocr-vl/src/utils.rs | Makes Metal device parsing/macOS gating explicit and improves error messaging/tests cfgs. |
| oar-ocr-vl/src/unirec/model.rs | Removes UniRec implementation (deleted). |
| oar-ocr-vl/src/unirec/mod.rs | Removes UniRec module exports (deleted). |
| oar-ocr-vl/src/unirec/decoder.rs | Removes UniRec decoder implementation (deleted). |
| oar-ocr-vl/src/unirec/config.rs | Removes UniRec config implementation (deleted). |
| oar-ocr-vl/src/paddleocr_vl/vision.rs | Adds chunked attention path for large seq lengths to avoid OOM while keeping small seq byte-stability. |
| oar-ocr-vl/src/paddleocr_vl/ernie.rs | Switches to TrimmableKvCache and adds HSD-only cache length/keep-index hooks. |
| oar-ocr-vl/src/mineru/text.rs | Switches to TrimmableKvCache and adds HSD-only cache length/keep-index hooks. |
| oar-ocr-vl/src/lightonocr/text.rs | Removes LightOnOCR text model (deleted). |
| oar-ocr-vl/src/lightonocr/processing.rs | Removes LightOnOCR image preprocessing (deleted). |
| oar-ocr-vl/src/lightonocr/mod.rs | Removes LightOnOCR module exports (deleted). |
| oar-ocr-vl/src/lightonocr/config.rs | Removes LightOnOCR config definitions (deleted). |
| oar-ocr-vl/src/lib.rs | Adds hsd module/feature wiring, root kv_trim module, and adjusts crate docs/exports accordingly. |
| oar-ocr-vl/src/hunyuanocr/processing.rs | Aligns resample behavior with upstream (ignores config resample) for output parity. |
| oar-ocr-vl/src/hunyuanocr/mod.rs | Re-exports HSD prompt helpers behind hsd feature. |
| oar-ocr-vl/src/hunyuanocr/llm.rs | Switches to TrimmableKvCache, aligns XDRoPE numerics (F32 mix then cast back), adds optional one-shot activation dumps, and adds HSD-only KV gather hooks. |
| oar-ocr-vl/src/hsd/types.rs | Introduces shared HSD/DSV configs, drafts, region kinds, and detailed timing/acceptance stats with tests. |
| oar-ocr-vl/src/hsd/prefix_tree.rs | Adds prefix-tree builder for candidate suffix batching with tests. |
| oar-ocr-vl/src/hsd/mod.rs | Adds HSD module entrypoint and re-exports core types + KV cache wrapper. |
| oar-ocr-vl/src/hsd/matching.rs | Adds sliding-window candidate collection + capping heuristics with extensive tests. |
| oar-ocr-vl/src/hsd/kv_trim.rs | Adds a trim/gather-capable KV cache wrapper with unit tests. |
| oar-ocr-vl/src/hsd/backend_util.rs | Adds shared helper utilities for pos-id and KV keep-index construction with tests. |
| oar-ocr-vl/src/glmocr/text.rs | Switches to TrimmableKvCache and adds HSD-only cache length/keep-index hooks. |
| oar-ocr-vl/src/doc_parser.rs | Removes UniRec/LightOnOCR backends from DocParser integration and updates docs/comments accordingly. |
| oar-ocr-vl/README.md | Updates supported models list, clarifies DocParser scope, adds HSD docs + updated batch API examples, and removes UniRec docs. |
| oar-ocr-vl/examples/utils/structure_match.rs | Adds source-aware structure-result matching utility + tests for OmniDocBench drafting. |
| oar-ocr-vl/examples/utils/mod.rs | Adds HSD example helpers (config helpers, auto-tune, stats printing) and exports structure_match. |
| oar-ocr-vl/examples/unirec.rs | Removes UniRec example (deleted). |
| oar-ocr-vl/examples/mineru.rs | Updates MinerU example docs to use models/... paths. |
| oar-ocr-vl/examples/hunyuanocr.rs | Updates HunyuanOCR example docs to use models/... paths. |
| oar-ocr-vl/examples/glmocr.rs | Updates GLM-OCR example docs to use models/... paths. |
| oar-ocr-vl/examples/doc_parser.rs | Restricts DocParser example to layout-first backends and simplifies layout handling. |
| oar-ocr-vl/Cargo.toml | Adds hsd feature (implies cuda) and adds a dev-dependency on oar-ocr. |
| oar-ocr-vl/build.rs | Adds build-time guard to prevent enabling metal feature on non-macOS targets. |
| oar-ocr-core/src/utils/dict.rs | Adjusts docs to use relative paths and improves missing-file test portability. |
| oar-ocr-core/src/processors/layout_sorting.rs | Adds unit tests for enhanced layout sorting and child-block association behavior. |
| oar-ocr-core/src/processors/formula_preprocess.rs | Improves static regex construction and fixes LaTeX thin-space handling in normalization loop. |
| oar-ocr-core/src/processors/decode.rs | Switches static regex init to expect for clearer panic context. |
| oar-ocr-core/src/predictors/table_structure_recognition.rs | Refines model-family detection/mapping and documents wired vs wireless sizing defaults. |
| oar-ocr-core/src/predictors/formula_recognition.rs | Adds default batch_size to formula predictor config. |
| oar-ocr-core/src/models/recognition/unimernet.rs | Adds vocab sentinel filtering and tests to avoid decoding invalid/padded token ids. |
| oar-ocr-core/src/models/recognition/pp_formulanet.rs | Adds robust output selection for token-id tensor, vocab sentinel filtering, and tests. |
| oar-ocr-core/src/domain/tasks/layout_detection.rs | Clarifies score threshold docs and updates source/reference notes. |
| oar-ocr-core/src/domain/tasks/formula_recognition.rs | Adds batch_size to task config with validation + defaults. |
| oar-ocr-core/src/domain/adapters/preprocessing.rs | Changes doc code fences to non-runnable snippets to avoid doctest/rustdoc issues. |
| oar-ocr-core/src/domain/adapters/formula_recognition_adapter.rs | Adds timing/debug diagnostics, vocab-size-aware filtering, and batch_size plumbing into builder + adapter. |
| oar-ocr-core/src/core/macros.rs | Changes macro docs code fences to text to avoid doctest/rustdoc issues. |
| oar-ocr-core/src/core/inference/tensor_output.rs | Adds dtype_name() helper for logging/diagnostics. |
| oar-ocr-core/src/core/inference/mod.rs | Adds OrtInfer::output_shapes() helper for adapters needing output introspection. |
| oar-ocr-core/src/core/errors/mod.rs | Adjusts module docs to avoid intra-doc-link warnings. |
| oar-ocr-core/Cargo.toml | Removes unused dependencies (toml, html-escape). |
| examples/utils/visualization.rs | Switches system font loading to fontdb for better cross-platform behavior. |
| examples/table_structure_recognition.rs | Updates dict-path examples to be relative paths. |
| examples/structure.rs | Adds CLI arg to control formula recognition batch size and passes it into config. |
| docs/hsd.md | Adds detailed HSD algorithm/config documentation and usage guidance. |
| Cargo.toml | Bumps workspace version to 0.7.0 and adds fontdb dev-dependency at workspace root. |
| .github/workflows/test.yml | Adds rustdoc warning checks, feature-matrix job, and CUDA-backed hsd clippy/tests job. |
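The prefix-tree builder listed above for hsd/prefix_tree.rs batches candidate suffixes so that shared prefixes are scored once. Below is a minimal sketch of that idea, assuming candidates are plain token-id sequences. `PrefixTree`, `build`, and `path` are hypothetical names; the crate's builder may also track position ids or attention masks.

```rust
/// Hypothetical flattened prefix tree over candidate token sequences. Node `i` stores
/// `tokens[i]` and `parents[i]` (None for nodes hanging off the implicit root), so a
/// prefix shared by several candidates is stored, and scored, only once.
#[derive(Debug, Default)]
pub struct PrefixTree {
    pub tokens: Vec<u32>,
    pub parents: Vec<Option<usize>>,
    children: Vec<Vec<usize>>,
    roots: Vec<usize>,
}

impl PrefixTree {
    pub fn build(candidates: &[Vec<u32>]) -> Self {
        let mut t = PrefixTree::default();
        for cand in candidates {
            let mut parent: Option<usize> = None;
            for &tok in cand {
                // Reuse an existing child with the same token if there is one.
                let siblings = match parent {
                    Some(p) => &t.children[p],
                    None => &t.roots,
                };
                let existing = siblings.iter().copied().find(|&i| t.tokens[i] == tok);
                let node = match existing {
                    Some(i) => i,
                    None => {
                        let i = t.tokens.len();
                        t.tokens.push(tok);
                        t.parents.push(parent);
                        t.children.push(Vec::new());
                        match parent {
                            Some(p) => t.children[p].push(i),
                            None => t.roots.push(i),
                        }
                        i
                    }
                };
                parent = Some(node);
            }
        }
        t
    }

    /// Number of distinct nodes, i.e. positions the target must score.
    pub fn len(&self) -> usize {
        self.tokens.len()
    }

    /// Tokens on the path from the root down to `node`, inclusive.
    pub fn path(&self, mut node: usize) -> Vec<u32> {
        let mut out = vec![self.tokens[node]];
        while let Some(p) = self.parents[node] {
            out.push(self.tokens[p]);
            node = p;
        }
        out.reverse();
        out
    }
}
```

In tree-attention verification schemes each node attends only to its ancestors, so the batch grows with the number of distinct nodes rather than with the total length of all candidates.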
Comments suppressed due to low confidence (1)

oar-ocr-vl/Cargo.toml:67

  • oar-ocr is added as a path-only dev-dependency. This will typically break cargo publish/crates.io builds because path dependencies are not allowed in published manifests unless they also specify a version (Cargo can then rewrite the path dep to the registry version). Consider adding version = "0.7.0" here or moving oar-ocr into [workspace.dependencies] and using oar-ocr.workspace = true.
```toml
[dev-dependencies]
clap = { version = "4.5.42", features = ["derive"] }
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
criterion = { version = "0.8", features = ["html_reports"] }
hayro = "0.6"
oar-ocr = { path = "..", default-features = false }
```



Comment thread oar-ocr-vl/src/utils/table.rs
Comment thread oar-ocr-vl/src/utils/table.rs Outdated
Comment thread src/oarocr/structure.rs Outdated
gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces Hierarchical Speculative Decoding (HSD) to the oar-ocr-vl crate, providing a training-free CUDA acceleration path for document parsing VLMs. Key changes include the addition of a TrimmableKvCache to support HSD's tree-verification requirements, new HSD-specific backends for supported models (HunyuanOCR, GLM-OCR, MinerU, PaddleOCR-VL), and utilities for converting between HTML and OTSL table formats. My feedback highlights potential performance bottlenecks in the repetition penalty implementation and KV-cache management, as well as concerns regarding the robustness of regex-based HTML parsing for table conversion.

Comment thread oar-ocr-vl/src/hunyuanocr/model.rs
Comment thread oar-ocr-vl/src/hsd/kv_trim.rs
Comment thread oar-ocr-vl/src/utils/table.rs
Comment thread oar-ocr-vl/src/hunyuanocr/model.rs
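This review flags the repetition-penalty implementation as a possible per-step bottleneck. A common formulation (the CTRL-style rule also used by Hugging Face transformers) divides positive logits and multiplies negative logits by the penalty for every token already generated. Below is a host-side sketch that touches each distinct token once, so the cost scales with the history length rather than the vocabulary size. `apply_repetition_penalty` is a hypothetical name and this is not the crate's implementation.

```rust
use std::collections::HashSet;

/// Hypothetical helper: CTRL-style repetition penalty applied in place.
/// Each distinct previously generated token is penalized exactly once, so repeated
/// occurrences in `history` do not compound and the vocabulary is never scanned.
pub fn apply_repetition_penalty(logits: &mut [f32], history: &[u32], penalty: f32) {
    if penalty == 1.0 {
        return; // a penalty of 1.0 is a no-op
    }
    let mut seen = HashSet::with_capacity(history.len());
    for &tok in history {
        let idx = tok as usize;
        // Ignore out-of-range ids and tokens already penalized this step.
        if idx >= logits.len() || !seen.insert(idx) {
            continue;
        }
        let l = logits[idx];
        logits[idx] = if l > 0.0 { l / penalty } else { l * penalty };
    }
}
```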
GreatV (Owner, Author) commented May 14, 2026

/gemini review

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 74 out of 75 changed files in this pull request and generated 1 comment.

Comment thread oar-ocr-vl/src/paddleocr_vl/vision.rs
gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request implements Hierarchical Speculative Decoding (HSD) for VLM acceleration, adds support for GLM-OCR and MinerU2.5, and includes significant numerical fixes for HunyuanOCR to achieve parity with upstream implementations. Additionally, it introduces batching for formula recognition, improves font loading via fontdb, and removes the UniRec and LightOnOCR backends. Feedback highlights opportunities to improve performance by caching environment variable lookups in vision encoder forward passes and reducing code duplication in the HunyuanOCR prefill logic.

Comment thread oar-ocr-vl/src/hunyuanocr/vision.rs Outdated
Comment thread oar-ocr-vl/src/hunyuanocr/vision.rs Outdated
Comment thread oar-ocr-vl/src/hunyuanocr/model.rs
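Caching environment-variable lookups, as suggested for the vision encoder forward passes, can be done with `std::sync::OnceLock` (stable since Rust 1.70) so the variable is read once per process. Below is a sketch with a placeholder variable name; `OAR_EXAMPLE_FLAG`, `parse_flag`, and `example_flag_enabled` are hypothetical and are not names oar-ocr-vl uses.

```rust
use std::sync::OnceLock;

/// Hypothetical flag parser: "1", "true", "yes", "on" (any case, surrounding
/// whitespace ignored) mean enabled; anything else, or an unset variable, means disabled.
pub fn parse_flag(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1" | "true" | "yes" | "on")
    )
}

/// Read the flag once and reuse the cached answer on every later call, instead of
/// calling std::env::var inside each forward pass.
/// `OAR_EXAMPLE_FLAG` is a placeholder, not a variable the crate reads.
pub fn example_flag_enabled() -> bool {
    static FLAG: OnceLock<bool> = OnceLock::new();
    *FLAG.get_or_init(|| parse_flag(std::env::var("OAR_EXAMPLE_FLAG").ok().as_deref()))
}
```

The trade-off is that changing the variable after the first read has no effect for the rest of the process, which is usually acceptable for debug and tuning switches.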
GreatV (Owner, Author) commented May 15, 2026

/gemini review

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request upgrades the project to version 0.7.0, introducing support for HunyuanOCR, GLM-OCR, and MinerU2.5 models while removing the UniRec implementation. A major addition is Hierarchical Speculative Decoding (HSD) for CUDA-accelerated inference, supported by architectural updates like a trimmable KV cache. The update also includes critical fixes for vision encoder drift and optimizations for large-image attention. Feedback from the review suggests several technical improvements: optimizing the HSD verification loop by performing argmax on-device to reduce host-transfer overhead, ensuring consistency between Stage 1 and Stage 2 draft surfaces, hoisting redundant RoPE operations out of transformer layers, and improving the robustness of the HTML table attribute parser.

Comment thread oar-ocr-vl/src/hsd/verify.rs Outdated
Comment thread oar-ocr-vl/src/hsd/drafting.rs
Comment thread oar-ocr-vl/src/hunyuanocr/llm.rs
Comment thread oar-ocr-vl/src/utils/table.rs Outdated
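One way to address the robustness concern about the HTML table attribute parser is a small hand-written scanner instead of a regex. Below is a sketch, assuming the caller has already isolated the text of a td/th start tag. `span_attr` is a hypothetical name, not the crate's function, and the fallback to 1 for missing or malformed values is an assumed policy.

```rust
/// Hypothetical helper: read an integer span attribute (e.g. "colspan", "rowspan") from
/// the inside of an HTML start tag such as `td colspan="2" class='x'`. Accepts double-quoted,
/// single-quoted, and unquoted values, ASCII case-insensitive names, and whitespace
/// around `=`. Missing, zero, or malformed values fall back to 1.
pub fn span_attr(tag: &str, name: &str) -> usize {
    // ASCII lowercasing keeps byte offsets unchanged, so slicing stays valid.
    let lower = tag.to_ascii_lowercase();
    let name = name.to_ascii_lowercase();
    let bytes = lower.as_bytes();
    let mut search_from = 0;
    while let Some(off) = lower[search_from..].find(&name) {
        let start = search_from + off;
        let end = start + name.len();
        search_from = end;
        // Require whitespace (or start of tag) before the name so `data-colspan`
        // or a quoted value containing the word does not match.
        let boundary_ok = start == 0 || bytes[start - 1].is_ascii_whitespace();
        let rest = lower[end..].trim_start();
        if !boundary_ok || !rest.starts_with('=') {
            continue;
        }
        let value = rest[1..].trim_start();
        let digits: String = match value.chars().next() {
            Some(q @ ('"' | '\'')) => value[1..].chars().take_while(|&c| c != q).collect(),
            _ => value
                .chars()
                .take_while(|c| !c.is_ascii_whitespace() && *c != '>')
                .collect(),
        };
        return digits.trim().parse::<usize>().ok().filter(|&n| n >= 1).unwrap_or(1);
    }
    1
}
```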
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 74 out of 75 changed files in this pull request and generated no new comments.

GreatV (Owner, Author) commented May 15, 2026

/gemini review

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces Hierarchical Speculative Decoding (HSD) to improve document parsing performance by leveraging a lightweight drafter pipeline and a target VLM verification path. Key changes include the addition of a TrimmableKvCache to support HSD's tree-verification requirements, a new hsd module for draft-target matching and prefix-tree construction, and integration across multiple VLM backends (HunyuanOCR, GLM-OCR, MinerU, PaddleOCR-VL). I have identified several areas for improvement, including correcting a Rust version typo in Cargo.toml, optimizing GPU-to-CPU data transfers in the HSD verification driver, and refactoring unstable let_chains syntax to ensure broader compiler compatibility.

Comment thread Cargo.toml
Comment thread oar-ocr-vl/src/hsd/verify.rs
Comment thread oar-ocr-vl/src/hsd/drafting.rs
Comment thread oar-ocr-vl/src/hsd/kv_trim.rs
Comment thread oar-ocr-vl/src/hsd/verify.rs Outdated
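The draft-target matching mentioned in this review (listed in the per-file summary as sliding-window candidate collection with capping heuristics in hsd/matching.rs) can be pictured as a small n-gram lookup, similar in spirit to prompt-lookup drafting. This is a sketch only: `collect_candidates`, its parameters, and the dedup rule are assumptions, not the crate's heuristics.

```rust
/// Hypothetical sliding-window draft matching: find every place in `draft` where the
/// last `window` generated tokens occur and propose the tokens that follow each match
/// as a candidate continuation. Candidates are capped in count and in length.
pub fn collect_candidates(
    generated: &[u32],
    draft: &[u32],
    window: usize,
    max_candidates: usize,
    max_len: usize,
) -> Vec<Vec<u32>> {
    let mut out: Vec<Vec<u32>> = Vec::new();
    // Nothing to match, or no room for a continuation after a match.
    if window == 0 || generated.len() < window || draft.len() <= window {
        return out;
    }
    let key = &generated[generated.len() - window..];
    for start in 0..=draft.len() - window {
        if out.len() == max_candidates {
            break;
        }
        if draft[start..start + window] != *key {
            continue;
        }
        let cont_start = start + window;
        let cont_end = (cont_start + max_len).min(draft.len());
        let cand = draft[cont_start..cont_end].to_vec();
        // Skip empty continuations and exact duplicates.
        if !cand.is_empty() && !out.contains(&cand) {
            out.push(cand);
        }
    }
    out
}
```

The surviving candidates are exactly the kind of input a prefix tree would merge before a single batched verification pass.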
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 74 out of 75 changed files in this pull request and generated 2 comments.

Comment thread oar-ocr-core/src/models/recognition/pp_formulanet.rs Outdated
Comment thread oar-ocr-vl/src/hsd/verify.rs
GreatV (Owner, Author) commented May 15, 2026

/gemini review

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces Hierarchical Speculative Decoding (HSD), a training-free CUDA acceleration scheme for Vision-Language Model (VLM) backends including HunyuanOCR, GLM-OCR, and MinerU2.5. The implementation includes a new hsd module for prefix-tree batching and speculative verification, alongside a TrimmableKvCache to support rolling back unaccepted tokens. Significant improvements were made to HunyuanOCR to fix numerical drift through the use of gelu_erf and F32 attention in late layers, and PaddleOCR-VL now utilizes chunked vision attention to handle large images without exceeding VRAM. Review feedback highlights opportunities to improve performance by reducing redundant data transfers between device and host during logit processing, optimizing tensor selection in pp_formulanet.rs into a single pass, and ensuring robustness against zero-sized reshapes in the HSD driver.

Comment thread oar-ocr-vl/src/hunyuanocr/model.rs
Comment thread oar-ocr-core/src/models/recognition/pp_formulanet.rs
Comment thread oar-ocr-vl/src/hsd/verify.rs
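Several of these reviews point at device-to-host traffic during logit processing. The usual remedy is to reduce each logits row to its argmax before copying, so only one index per row leaves the GPU instead of a whole vocabulary of floats. Below is a host-side sketch of that reduction over a flat row-major [rows, vocab] buffer. `argmax_rows` is a hypothetical helper; on a CUDA backend the same reduction would run on device.

```rust
/// Hypothetical helper: greedy token id for each row of a row-major [rows, vocab] buffer.
/// Ties resolve to the lowest index; a NaN entry is never chosen unless the row has
/// no non-NaN value.
pub fn argmax_rows(logits: &[f32], vocab: usize) -> Vec<u32> {
    assert!(vocab > 0 && logits.len() % vocab == 0, "buffer must hold whole rows");
    logits
        .chunks_exact(vocab)
        .map(|row| {
            let mut best = 0;
            for (i, &v) in row.iter().enumerate().skip(1) {
                // Strict `>` keeps the lowest index on ties; a NaN incumbent is replaced.
                if v > row[best] || row[best].is_nan() {
                    best = i;
                }
            }
            best as u32
        })
        .collect()
}
```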
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 74 out of 75 changed files in this pull request and generated no new comments.

GreatV merged commit 985e77e into main on May 15, 2026 (11 checks passed).
GreatV deleted the hsd branch on May 15, 2026 at 04:36.

Labels: None yet
Projects: None yet
Participants: 2