feat: re-enable CoreML acceleration on macOS with static input shapes#573
Conversation
On Apple Silicon, ONNX Runtime's CoreMLExecutionProvider can offload inference to the Neural Engine, but only when input shapes are static. With dynamic shapes (the default), CoreML handles just ~3 out of 823 graph nodes — effectively doing nothing. This commit fixes the input shape to [1, 3, 1024, 1024], allowing CoreML to take over 658/681 nodes (~96%), which brings layout parsing from ~45s down to ~12s on a 100-page PDF (tested on M1–M4). Changes to OnnxModel in doclayout.py: 1. Add _FIXED_IMGSZ class variable (1024) for the static input size 2. Detect CoreMLExecutionProvider on macOS and select it with automatic fallback to CPUExecutionProvider 3. Add _make_static_model() to rewrite ONNX graph input dims and run shape inference so CoreML's compiler sees fully resolved shapes 4. Use full-size padding instead of stride-aligned, keeping the input tensor truly fixed at 1024×1024 every time Non-macOS platforms are completely unaffected — the CoreML code path only activates when CoreMLExecutionProvider is actually available. Closes funstory-ai#170
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4027253026
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pad_w = new_w - resized_w | ||
| pad_h = new_h - resized_h |
There was a problem hiding this comment.
Restrict full-size padding to CoreML execution
resize_and_pad_image now always pads to new_shape (pad_w = new_w - resized_w, pad_h = new_h - resized_h), which forces every inference to run at full 1024×1024 even when CoreML is not used. On Linux/Windows (and macOS without CoreML), the previous stride-aligned logic produced smaller tensors for many page aspect ratios, so this change introduces a cross-platform throughput regression unrelated to the CoreML path. Gate this padding behavior behind the CoreML/static-shape mode to avoid slowing the default CPU flow.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Re-enables CoreML acceleration on macOS Apple Silicon by converting the ONNX model to static input shapes (1024×1024) so CoreML can compile and execute most of the graph, while keeping other platforms on CPU-only execution.
Changes:
- Add macOS/CoreML provider detection and configure ORT providers accordingly (CoreML + CPU fallback).
- Introduce a helper to rewrite ONNX input shapes to fixed
[1, 3, 1024, 1024]and run ONNX shape inference. - Change image padding to always pad to the full target size (fixed 1024×1024).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over | ||
| # 96%+ of the graph nodes (658/681) instead of just 3/823. |
There was a problem hiding this comment.
The comment hard-codes specific CoreML partition/node counts (e.g., "658/681" and "3/823"). These values are environment/model/version-dependent and will become stale, which can mislead future debugging. Suggest rephrasing to a qualitative explanation (e.g., “static shapes greatly increase CoreML coverage”) or move benchmark numbers to PR/docs instead of inline code comments.
| # Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over | |
| # 96%+ of the graph nodes (658/681) instead of just 3/823. | |
| # Fix input shapes to [1, 3, 1024, 1024] so CoreML can execute a much larger | |
| # portion of the graph compared to using dynamic input shapes. |
| if use_coreml: | ||
| # Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over | ||
| # 96%+ of the graph nodes (658/681) instead of just 3/823. | ||
| model = self._make_static_model(model, self._FIXED_IMGSZ) | ||
| providers = [ | ||
| "CoreMLExecutionProvider", | ||
| "CPUExecutionProvider", | ||
| ] | ||
| logger.info( | ||
| "Using CoreMLExecutionProvider with static input " | ||
| f"[1, 3, {self._FIXED_IMGSZ}, {self._FIXED_IMGSZ}]" | ||
| ) | ||
| else: | ||
| for provider in available_providers: | ||
| if re.match(r"cpu", provider, re.IGNORECASE): | ||
| logger.info(f"Available Provider: {provider}") | ||
| providers.append(provider) | ||
| self.model = onnxruntime.InferenceSession( | ||
| model.SerializeToString(), | ||
| providers=providers, | ||
| ) |
There was a problem hiding this comment.
The code assumes that listing "CPUExecutionProvider" after "CoreMLExecutionProvider" provides a full fallback if CoreML fails. In onnxruntime, CoreML compilation/session creation can still raise and prevent any fallback. Consider wrapping InferenceSession creation in try/except: attempt CoreML+CPU first, and on failure log a warning and retry with CPU-only providers (and without static-shape rewriting if needed).
| try: | ||
| model = shape_inference.infer_shapes(model) | ||
| except Exception: | ||
| pass # best-effort; CoreML still benefits from fixed input |
There was a problem hiding this comment.
_make_static_model() swallows all exceptions from shape inference without any logging. If shape inference fails due to an invalid/unsupported model, it will silently proceed with a partially-updated graph and make CoreML failures harder to diagnose. At minimum, log the exception at debug/warning level (or catch narrower exceptions) so users can see when static-shape inference was skipped.
| pass # best-effort; CoreML still benefits from fixed input | |
| logger.warning( | |
| "Static shape inference for CoreML optimization failed; " | |
| "proceeding with fixed input shapes only.", | |
| exc_info=True, | |
| ) |
| # Calculate padding size and align to stride multiple | ||
| pad_w = (new_w - resized_w) % self.stride | ||
| pad_h = (new_h - resized_h) % self.stride | ||
| # Pad to full target size (enables static-shape CoreML acceleration) |
There was a problem hiding this comment.
resize_and_pad_image() now pads to the full requested size instead of stride-aligned minimal padding. This changes behavior for all providers/platforms and no longer matches the current docstring/parameter description about “ensuring dimensions are multiples of stride”. If the full-size padding is only required for CoreML static-shape mode, consider gating it behind a flag; otherwise, please update the docstring/comments to reflect that new_shape is now enforced exactly (and note the potential CPU perf/memory impact of always padding up to 1024×1024).
| # Pad to full target size (enables static-shape CoreML acceleration) | |
| # Pad to the full requested target size. | |
| # NOTE: | |
| # - This enforces new_shape exactly (no stride-aligned minimal padding). | |
| # - Full-size padding is primarily used to enable static-shape CoreML | |
| # acceleration, but it can increase CPU and memory usage for large | |
| # target sizes (e.g., always padding up to 1024×1024). |
There was a problem hiding this comment.
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="babeldoc/docvision/doclayout.py">
<violation number="1" location="babeldoc/docvision/doclayout.py:135">
P2: The full-size padding (`new_w - resized_w`) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula `(new_w - resized_w) % self.stride` produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the `use_coreml` flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Add one-off context when rerunning by tagging
@cubic-dev-aiwith guidance or docs links (includingllms.txt) - Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| pad_w = (new_w - resized_w) % self.stride | ||
| pad_h = (new_h - resized_h) % self.stride | ||
| # Pad to full target size (enables static-shape CoreML acceleration) | ||
| pad_w = new_w - resized_w |
There was a problem hiding this comment.
P2: The full-size padding (new_w - resized_w) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula (new_w - resized_w) % self.stride produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the use_coreml flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At babeldoc/docvision/doclayout.py, line 135:
<comment>The full-size padding (`new_w - resized_w`) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula `(new_w - resized_w) % self.stride` produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the `use_coreml` flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.</comment>
<file context>
@@ -97,9 +131,9 @@ def resize_and_pad_image(self, image, new_shape):
- pad_w = (new_w - resized_w) % self.stride
- pad_h = (new_h - resized_h) % self.stride
+ # Pad to full target size (enables static-shape CoreML acceleration)
+ pad_w = new_w - resized_w
+ pad_h = new_h - resized_h
top, bottom = pad_h // 2, pad_h - pad_h // 2
</file context>
|
Thank you for your contribution. According to https://funstory-ai.github.io/BabelDOC/CONTRIBUTOR_REWARD/ , you can apply for a monthly membership redemption code for Immersive Translate. |
Summary
Re-enables CoreML acceleration on macOS (Apple Silicon) by fixing the ONNX model input shapes to static
[1, 3, 1024, 1024].With dynamic shapes, CoreML can only handle ~3 out of 823 graph nodes — effectively doing nothing. With static shapes, it takes over 658/681 nodes (~96%), which is where the 3–8× speedup comes from.
Changes to
OnnxModelindoclayout.py:_FIXED_IMGSZ = 1024class variable — single source of truth for the static input sizeCoreMLExecutionProviderwith fallback toCPUExecutionProvider. Non-macOS platforms are completely unaffected._make_static_model()helper — rewrites the ONNX graph input dims to fixed values and runsonnx.shape_inferenceso CoreML's compiler has fully resolved shapes to work with(new_w - resized_w) % strideto fullnew_w - resized_w, keeping the input tensor truly fixed at 1024×1024 every timeBenchmarks (Apple Silicon M1–M4)
I've been running this patch in PDFMathTranslate for a while now — it's been stable across M1, M2, M3, and M4 machines.
Closes #170
Summary by cubic
Re-enables CoreML acceleration on macOS by setting the ONNX model input to a static [1, 3, 1024, 1024]. On Apple Silicon this lets
CoreMLExecutionProvideroffload ~96% of the graph and cuts 100‑page PDF parsing from ~45s to ~12s.CoreMLExecutionProviderwithCPUExecutionProviderfallback; other platforms unchanged._FIXED_IMGSZand_make_static_model(); runsonnxshape inference to resolve shapes.Written for commit 4027253. Summary will update on new commits.