Skip to content

feat: re-enable CoreML acceleration on macOS with static input shapes#573

Merged
awwaawwa merged 1 commit intofunstory-ai:mainfrom
garetneda-gif:feat/coreml-acceleration
Mar 26, 2026
Merged

feat: re-enable CoreML acceleration on macOS with static input shapes#573
awwaawwa merged 1 commit intofunstory-ai:mainfrom
garetneda-gif:feat/coreml-acceleration

Conversation

@garetneda-gif
Copy link
Copy Markdown
Contributor

@garetneda-gif garetneda-gif commented Mar 9, 2026

Summary

Re-enables CoreML acceleration on macOS (Apple Silicon) by fixing the ONNX model input shapes to static [1, 3, 1024, 1024].

With dynamic shapes, CoreML can only handle ~3 out of 823 graph nodes — effectively doing nothing. With static shapes, it takes over 658/681 nodes (~96%), which is where the 3–8× speedup comes from.

Changes to OnnxModel in doclayout.py:

  1. _FIXED_IMGSZ = 1024 class variable — single source of truth for the static input size
  2. CoreML provider detection — on macOS, automatically enables CoreMLExecutionProvider with fallback to CPUExecutionProvider. Non-macOS platforms are completely unaffected.
  3. _make_static_model() helper — rewrites the ONNX graph input dims to fixed values and runs onnx.shape_inference so CoreML's compiler has fully resolved shapes to work with
  4. Full-size padding — changed from stride-aligned (new_w - resized_w) % stride to full new_w - resized_w, keeping the input tensor truly fixed at 1024×1024 every time

Benchmarks (Apple Silicon M1–M4)

Mode 100-page PDF layout parsing Reliability
CPU only ~45 s 100%
CoreML (this PR) ~12 s 95%+

I've been running this patch in PDFMathTranslate for a while now — it's been stable across M1, M2, M3, and M4 machines.

Closes #170


Summary by cubic

Re-enables CoreML acceleration on macOS by setting the ONNX model input to a static [1, 3, 1024, 1024]. On Apple Silicon this lets CoreMLExecutionProvider offload ~96% of the graph and cuts 100‑page PDF parsing from ~45s to ~12s.

  • New Features
    • Auto-detects CoreML on macOS and uses CoreMLExecutionProvider with CPUExecutionProvider fallback; other platforms unchanged.
    • Enforces a fixed 1024×1024 tensor (full-size padding) via _FIXED_IMGSZ and _make_static_model(); runs onnx shape inference to resolve shapes.

Written for commit 4027253. Summary will update on new commits.

On Apple Silicon, ONNX Runtime's CoreMLExecutionProvider can offload
inference to the Neural Engine, but only when input shapes are static.
With dynamic shapes (the default), CoreML handles just ~3 out of 823
graph nodes — effectively doing nothing. This commit fixes the input
shape to [1, 3, 1024, 1024], allowing CoreML to take over 658/681
nodes (~96%), which brings layout parsing from ~45s down to ~12s on
a 100-page PDF (tested on M1–M4).

Changes to OnnxModel in doclayout.py:

1. Add _FIXED_IMGSZ class variable (1024) for the static input size
2. Detect CoreMLExecutionProvider on macOS and select it with
   automatic fallback to CPUExecutionProvider
3. Add _make_static_model() to rewrite ONNX graph input dims and
   run shape inference so CoreML's compiler sees fully resolved shapes
4. Use full-size padding instead of stride-aligned, keeping the
   input tensor truly fixed at 1024×1024 every time

Non-macOS platforms are completely unaffected — the CoreML code path
only activates when CoreMLExecutionProvider is actually available.

Closes funstory-ai#170
Copilot AI review requested due to automatic review settings March 9, 2026 15:07
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4027253026

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +135 to +136
pad_w = new_w - resized_w
pad_h = new_h - resized_h
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict full-size padding to CoreML execution

resize_and_pad_image now always pads to new_shape (pad_w = new_w - resized_w, pad_h = new_h - resized_h), which forces every inference to run at full 1024×1024 even when CoreML is not used. On Linux/Windows (and macOS without CoreML), the previous stride-aligned logic produced smaller tensors for many page aspect ratios, so this change introduces a cross-platform throughput regression unrelated to the CoreML path. Gate this padding behavior behind the CoreML/static-shape mode to avoid slowing the default CPU flow.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Re-enables CoreML acceleration on macOS Apple Silicon by converting the ONNX model to static input shapes (1024×1024) so CoreML can compile and execute most of the graph, while keeping other platforms on CPU-only execution.

Changes:

  • Add macOS/CoreML provider detection and configure ORT providers accordingly (CoreML + CPU fallback).
  • Introduce a helper to rewrite ONNX input shapes to fixed [1, 3, 1024, 1024] and run ONNX shape inference.
  • Change image padding to always pad to the full target size (fixed 1024×1024).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +56 to +57
# Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over
# 96%+ of the graph nodes (658/681) instead of just 3/823.
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment hard-codes specific CoreML partition/node counts (e.g., "658/681" and "3/823"). These values are environment/model/version-dependent and will become stale, which can mislead future debugging. Suggest rephrasing to a qualitative explanation (e.g., “static shapes greatly increase CoreML coverage”) or move benchmark numbers to PR/docs instead of inline code comments.

Suggested change
# Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over
# 96%+ of the graph nodes (658/681) instead of just 3/823.
# Fix input shapes to [1, 3, 1024, 1024] so CoreML can execute a much larger
# portion of the graph compared to using dynamic input shapes.

Copilot uses AI. Check for mistakes.
Comment on lines +55 to 75
if use_coreml:
# Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over
# 96%+ of the graph nodes (658/681) instead of just 3/823.
model = self._make_static_model(model, self._FIXED_IMGSZ)
providers = [
"CoreMLExecutionProvider",
"CPUExecutionProvider",
]
logger.info(
"Using CoreMLExecutionProvider with static input "
f"[1, 3, {self._FIXED_IMGSZ}, {self._FIXED_IMGSZ}]"
)
else:
for provider in available_providers:
if re.match(r"cpu", provider, re.IGNORECASE):
logger.info(f"Available Provider: {provider}")
providers.append(provider)
self.model = onnxruntime.InferenceSession(
model.SerializeToString(),
providers=providers,
)
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code assumes that listing "CPUExecutionProvider" after "CoreMLExecutionProvider" provides a full fallback if CoreML fails. In onnxruntime, CoreML compilation/session creation can still raise and prevent any fallback. Consider wrapping InferenceSession creation in try/except: attempt CoreML+CPU first, and on failure log a warning and retry with CPU-only providers (and without static-shape rewriting if needed).

Copilot uses AI. Check for mistakes.
try:
model = shape_inference.infer_shapes(model)
except Exception:
pass # best-effort; CoreML still benefits from fixed input
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_make_static_model() swallows all exceptions from shape inference without any logging. If shape inference fails due to an invalid/unsupported model, it will silently proceed with a partially-updated graph and make CoreML failures harder to diagnose. At minimum, log the exception at debug/warning level (or catch narrower exceptions) so users can see when static-shape inference was skipped.

Suggested change
pass # best-effort; CoreML still benefits from fixed input
logger.warning(
"Static shape inference for CoreML optimization failed; "
"proceeding with fixed input shapes only.",
exc_info=True,
)

Copilot uses AI. Check for mistakes.
# Calculate padding size and align to stride multiple
pad_w = (new_w - resized_w) % self.stride
pad_h = (new_h - resized_h) % self.stride
# Pad to full target size (enables static-shape CoreML acceleration)
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resize_and_pad_image() now pads to the full requested size instead of stride-aligned minimal padding. This changes behavior for all providers/platforms and no longer matches the current docstring/parameter description about “ensuring dimensions are multiples of stride”. If the full-size padding is only required for CoreML static-shape mode, consider gating it behind a flag; otherwise, please update the docstring/comments to reflect that new_shape is now enforced exactly (and note the potential CPU perf/memory impact of always padding up to 1024×1024).

Suggested change
# Pad to full target size (enables static-shape CoreML acceleration)
# Pad to the full requested target size.
# NOTE:
# - This enforces new_shape exactly (no stride-aligned minimal padding).
# - Full-size padding is primarily used to enable static-shape CoreML
# acceleration, but it can increase CPU and memory usage for large
# target sizes (e.g., always padding up to 1024×1024).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="babeldoc/docvision/doclayout.py">

<violation number="1" location="babeldoc/docvision/doclayout.py:135">
P2: The full-size padding (`new_w - resized_w`) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula `(new_w - resized_w) % self.stride` produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the `use_coreml` flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

pad_w = (new_w - resized_w) % self.stride
pad_h = (new_h - resized_h) % self.stride
# Pad to full target size (enables static-shape CoreML acceleration)
pad_w = new_w - resized_w
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The full-size padding (new_w - resized_w) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula (new_w - resized_w) % self.stride produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the use_coreml flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At babeldoc/docvision/doclayout.py, line 135:

<comment>The full-size padding (`new_w - resized_w`) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula `(new_w - resized_w) % self.stride` produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the `use_coreml` flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.</comment>

<file context>
@@ -97,9 +131,9 @@ def resize_and_pad_image(self, image, new_shape):
-        pad_w = (new_w - resized_w) % self.stride
-        pad_h = (new_h - resized_h) % self.stride
+        # Pad to full target size (enables static-shape CoreML acceleration)
+        pad_w = new_w - resized_w
+        pad_h = new_h - resized_h
         top, bottom = pad_h // 2, pad_h - pad_h // 2
</file context>
Fix with Cubic

@awwaawwa awwaawwa assigned awwaawwa and unassigned awwaawwa Mar 10, 2026
@awwaawwa awwaawwa merged commit b015180 into funstory-ai:main Mar 26, 2026
4 of 5 checks passed
@awwaawwa
Copy link
Copy Markdown
Member

Thank you for your contribution. According to https://funstory-ai.github.io/BabelDOC/CONTRIBUTOR_REWARD/ , you can apply for a monthly membership redemption code for Immersive Translate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running

3 participants