feat: auto-download ONNX models from ModelScope #118
Code Review
This pull request introduces an auto-download feature that allows the library to automatically fetch and cache OCR model files from ModelScope. It includes a new download module in the core library, a static registry of supported models with SHA-256 verification, and integration into the high-level builders. Review feedback suggests optimizing performance by avoiding redundant hashing of large files, improving network efficiency through ureq::Agent reuse, and preventing race conditions during concurrent downloads by using unique temporary filenames.
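To illustrate the race-condition suggestion, the sketch below gives each download attempt its own temp file beside the destination. The name combines the process id, a per-process counter, and a timestamp. The file is then renamed into place, so two concurrent downloads of the same model never write to the same partial file. It is std-only and hypothetical: `unique_tmp_path` and `write_atomically` are not the PR's functions, and a real downloader would also verify the SHA-256 before the rename.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Per-process counter: distinguishes concurrent attempts inside one process.
static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: a temp path beside `dest`, unique per process and per call.
fn unique_tmp_path(dest: &Path) -> PathBuf {
    let n = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let name = dest
        .file_name()
        .and_then(|s| s.to_str())
        .unwrap_or("download");
    // Same directory as `dest`, so the final rename never crosses filesystems.
    dest.with_file_name(format!(".{name}.{}.{n}.{nanos}.part", process::id()))
}

/// Writes the bytes and flushes them to disk before the rename.
fn write_and_sync(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(bytes)?;
    f.sync_all()
}

/// Hypothetical helper: write to a private temp file, then rename it over `dest`.
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = unique_tmp_path(dest);
    let result = write_and_sync(&tmp, bytes).and_then(|()| fs::rename(&tmp, dest));
    if result.is_err() {
        // Do not leave stray partial files in the cache.
        let _ = fs::remove_file(&tmp);
    }
    result
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("oar_atomic_demo");
    fs::create_dir_all(&dir)?;
    let dest = dir.join("model.onnx");
    write_atomically(&dest, b"placeholder model bytes")?;
    println!("wrote {}", dest.display());
    Ok(())
}
```

Keeping the temp file in the destination directory matters: a rename within one filesystem replaces the target in a single step, whereas a temp file under a different mount would force a copy.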
Pull request overview
Adds an auto-download feature that lets high-level OCR/structure builders accept registered bare model filenames and automatically fetch (and cache) the corresponding files from ModelScope with SHA-256 verification via oar-ocr-core.
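The bare-filename resolution can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's `resolve_model_path`. The registry is a hypothetical list of bare names, and the cache directory is passed in. The precedence is an assumed rule set: an existing path wins, and otherwise a registered single-component name maps into the cache. docs/models.md in the PR documents the actual rules. Downloading and hashing are left out.

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in for the registry: the bare names that can be fetched.
const REGISTERED: &[&str] = &["det.onnx", "rec.onnx", "dict.txt"];

#[derive(Debug, PartialEq)]
enum Resolved {
    /// Used exactly as the caller wrote it (existing file or unregistered name).
    AsGiven(PathBuf),
    /// Registered bare name mapped into the cache; may still need downloading.
    Cached(PathBuf),
}

/// True for a single relative component such as `det.onnx` (not `./det.onnx`).
fn is_bare_name(p: &Path) -> bool {
    p.file_name().is_some() && p.parent().map_or(false, |d| d.as_os_str().is_empty())
}

/// Hypothetical resolver (not the PR's `resolve_model_path`).
fn resolve_model_arg(p: &Path, cache_dir: &Path) -> Resolved {
    // An explicit file that already exists is never redirected.
    if p.exists() {
        return Resolved::AsGiven(p.to_path_buf());
    }
    let registered = p.to_str().map_or(false, |s| REGISTERED.contains(&s));
    if is_bare_name(p) && registered {
        Resolved::Cached(cache_dir.join(p))
    } else {
        // Left for the builder to report as a missing file.
        Resolved::AsGiven(p.to_path_buf())
    }
}

/// Optional adapters resolve only when a path was actually supplied.
fn resolve_optional(p: Option<&Path>, cache_dir: &Path) -> Option<Resolved> {
    p.map(|p| resolve_model_arg(p, cache_dir))
}

fn main() {
    let cache = Path::new("oar_cache");
    println!("{:?}", resolve_model_arg(Path::new("det.onnx"), cache));
    println!("{:?}", resolve_model_arg(Path::new("models/det.onnx"), cache));
    println!("{:?}", resolve_optional(None, cache));
}
```

Requiring a single path component keeps the behavior predictable: anything that looks like a path (`models/det.onnx`, `./det.onnx`) is never silently redirected to a downloaded file.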
Changes:
- Introduces `oar_ocr_core::core::download` (feature-gated) with a static registry, cache resolution rules, download + hash verification, and unit tests.
- Wires model-path resolution into OCR and structure builders so bare filenames are resolved through the auto-download cache.
- Updates docs/README and adds an `auto_download` example demonstrating the feature.
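A static registry and its validation tests might look something like this self-contained sketch. The `ModelEntry` type, its fields, and the entries are hypothetical, and the hashes are zero placeholders rather than real ModelScope digests. The checks are the kind of invariants a registry test could enforce: unique names, no path separators, and 64 lowercase hex digits. They are assumed here rather than taken from the PR.

```rust
use std::collections::HashSet;

/// Hypothetical registry record: a bare file name and its expected SHA-256.
struct ModelEntry {
    file_name: &'static str,
    sha256: &'static str,
}

// Placeholder digest (64 zeros); a real registry stores each file's actual hash.
const PLACEHOLDER_SHA256: &str = concat!(
    "00000000", "00000000", "00000000", "00000000",
    "00000000", "00000000", "00000000", "00000000"
);

// Hypothetical entries, for illustration only.
static REGISTRY: &[ModelEntry] = &[
    ModelEntry { file_name: "det.onnx", sha256: PLACEHOLDER_SHA256 },
    ModelEntry { file_name: "rec.onnx", sha256: PLACEHOLDER_SHA256 },
];

/// Finds the entry for a bare file name, if it is registered.
fn lookup(name: &str) -> Option<&'static ModelEntry> {
    REGISTRY.iter().find(|e| e.file_name == name)
}

fn is_lower_hex_sha256(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Checks the invariants a registry test might assert.
fn validate(entries: &[ModelEntry]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for e in entries {
        if e.file_name.is_empty() || e.file_name.contains(|c: char| c == '/' || c == '\\') {
            return Err(format!("not a bare file name: {:?}", e.file_name));
        }
        if !seen.insert(e.file_name) {
            return Err(format!("duplicate entry: {}", e.file_name));
        }
        if !is_lower_hex_sha256(e.sha256) {
            return Err(format!("malformed sha256 for {}", e.file_name));
        }
    }
    Ok(())
}

fn main() {
    validate(REGISTRY).expect("registry should be well-formed");
    match lookup("rec.onnx") {
        Some(e) => println!("{} expects sha256 {}", e.file_name, e.sha256),
        None => println!("not registered"),
    }
}
```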
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/oarocr/structure.rs | Resolves structure pipeline model/dict/tokenizer paths via auto-download before building. |
| src/oarocr/ocr.rs | Resolves required OCR model/dict paths via auto-download before building adapters. |
| src/oarocr/builder_utils.rs | Adds resolve_model_path and applies it to optional adapter construction. |
| src/lib.rs | Re-exports a download module when auto-download is enabled. |
| README.md | Documents the new auto-download feature and behavior. |
| oar-ocr-core/src/core/mod.rs | Feature-gates and exposes the new download module. |
| oar-ocr-core/src/core/download/registry.rs | Adds the ModelScope file registry plus registry validation tests. |
| oar-ocr-core/src/core/download/mod.rs | Implements cache resolution + download/verification logic and tests. |
| oar-ocr-core/Cargo.toml | Adds the auto-download feature and optional deps (ureq, sha2, dirs). |
| Cargo.toml | Plumbs the top-level auto-download feature through to oar-ocr-core. |
| examples/auto_download.rs | Adds a runnable example showing bare-name resolution and caching. |
| docs/models.md | Documents auto-download usage and path resolution rules. |
/gemini review
Code Review
This pull request introduces an auto-download feature that allows the library to automatically fetch OCR model files from ModelScope into a local cache, verified by SHA-256 hashes. It updates the high-level builders to resolve model paths transparently and adds comprehensive documentation and examples. Review feedback points out a breaking change in the ureq 3.0 API usage, suggests improving error reporting for HTTP status codes, and highlights potential platform-specific issues with atomic file renaming on Windows.
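Two of these points can be sketched without an HTTP client. The status check turns any non-2xx code into an error that names both the URL and the code. The finalize step tolerates a failed rename when the destination already exists, which is one way to cope with Windows refusing to replace a file that another process holds open. Everything here is hypothetical: `DownloadError`, `check_status`, `finalize`, and the example URL. The sketch does not model how ureq reports the status. A real fallback should also re-verify the existing file's SHA-256 before trusting it.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error type for the download path.
#[derive(Debug)]
enum DownloadError {
    /// The server answered, but not with a 2xx status.
    HttpStatus { url: String, status: u16 },
    Io(io::Error),
}

impl fmt::Display for DownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DownloadError::HttpStatus { url, status } => {
                // A short hint makes common failures actionable in logs.
                let hint = match *status {
                    404 => " (not found; is the registry entry current?)",
                    401 | 403 => " (access denied)",
                    429 => " (rate limited; retry later)",
                    500..=599 => " (server error; retry later)",
                    _ => "",
                };
                write!(f, "HTTP {status} while downloading {url}{hint}")
            }
            DownloadError::Io(e) => write!(f, "I/O error: {e}"),
        }
    }
}

impl std::error::Error for DownloadError {}

/// Rejects any non-2xx status with an error naming the URL and the code.
fn check_status(url: &str, status: u16) -> Result<(), DownloadError> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        Err(DownloadError::HttpStatus { url: url.to_string(), status })
    }
}

/// Moves a verified temp file into place. If the rename fails but `dest`
/// exists (e.g. another process installed it and holds it open), keep `dest`.
fn finalize(tmp: &Path, dest: &Path) -> Result<(), DownloadError> {
    match fs::rename(tmp, dest) {
        Ok(()) => Ok(()),
        Err(_) if dest.exists() => {
            let _ = fs::remove_file(tmp);
            Ok(())
        }
        Err(e) => {
            let _ = fs::remove_file(tmp);
            Err(DownloadError::Io(e))
        }
    }
}

fn main() {
    if let Err(e) = check_status("https://example.invalid/det.onnx", 404) {
        eprintln!("{e}");
    }
    let dir = std::env::temp_dir();
    let (tmp, dest) = (dir.join("oar_demo.part"), dir.join("oar_demo.onnx"));
    if fs::write(&tmp, b"bytes").is_ok() {
        println!("{:?}", finalize(&tmp, &dest).map_err(|e| e.to_string()));
    }
}
```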
/gemini review
Code Review
This pull request introduces an auto-download feature that allows the library to automatically fetch OCR model files from ModelScope into a local cache directory ($OAR_HOME). It includes a new download module in oar-ocr-core for handling registry lookups, SHA-256 verification, and atomic file replacement, along with updates to the high-level builders to resolve model paths transparently. Feedback was provided regarding the thread-safety of modifying environment variables in tests and the importance of explicit error handling for non-2xx HTTP status codes during model downloads.
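On the environment-variable point: `cargo test` runs tests on parallel threads by default. Since the Rust 2024 edition, `std::env::set_var` and `remove_var` are `unsafe`, because changing the environment while another thread reads it is a data race. One way around this, sketched below under assumptions, is to keep the `$OAR_HOME` decision in a pure function that tests call with explicit inputs. The function names and the `~/.oar` fallback are hypothetical. The sketch reads `HOME` only to stay std-only, whereas the PR depends on `dirs`.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Pure decision logic: no global state, so tests can cover every branch in parallel.
fn cache_dir_from(oar_home: Option<OsString>, home: Option<PathBuf>) -> Option<PathBuf> {
    match oar_home {
        // A non-empty $OAR_HOME wins.
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        // Hypothetical fallback: a hidden directory under the user's home.
        _ => home.map(|h| h.join(".oar")),
    }
}

/// Thin wrapper that reads the real environment; only this touches global state.
fn cache_dir() -> Option<PathBuf> {
    cache_dir_from(env::var_os("OAR_HOME"), env::var_os("HOME").map(PathBuf::from))
}

fn main() {
    match cache_dir() {
        Some(dir) => println!("cache: {}", dir.display()),
        None => println!("no cache directory could be determined"),
    }
}
```

Tests then exercise `cache_dir_from` directly (for example `cache_dir_from(Some("/data/oar".into()), None)`), and no test ever needs to mutate the process environment.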
No description provided.