feat: re-enable CoreML acceleration on macOS with static input shapes by garetneda-gif · Pull Request #573 · funstory-ai/BabelDOC

garetneda-gif · 2026-03-09T15:07:31Z

Summary

Re-enables CoreML acceleration on macOS (Apple Silicon) by fixing the ONNX model input shapes to static [1, 3, 1024, 1024].

With dynamic shapes, CoreML can only handle ~3 out of 823 graph nodes — effectively doing nothing. With static shapes, it takes over 658/681 nodes (~96%), which is where the 3–8× speedup comes from.

Changes to `OnnxModel` in `doclayout.py`:

_FIXED_IMGSZ = 1024 class variable — single source of truth for the static input size
CoreML provider detection — on macOS, automatically enables CoreMLExecutionProvider with fallback to CPUExecutionProvider. Non-macOS platforms are completely unaffected.
_make_static_model() helper — rewrites the ONNX graph input dims to fixed values and runs onnx.shape_inference so CoreML's compiler has fully resolved shapes to work with
Full-size padding — changed from stride-aligned (new_w - resized_w) % stride to full new_w - resized_w, keeping the input tensor truly fixed at 1024×1024 every time

Benchmarks (Apple Silicon M1–M4)

Mode	100-page PDF layout parsing	Reliability
CPU only	~45 s	100%
CoreML (this PR)	~12 s	95%+

I've been running this patch in PDFMathTranslate for a while now — it's been stable across M1, M2, M3, and M4 machines.

Closes #170

Summary by cubic

Re-enables CoreML acceleration on macOS by setting the ONNX model input to a static [1, 3, 1024, 1024]. On Apple Silicon this lets CoreMLExecutionProvider offload ~96% of the graph and cuts 100‑page PDF parsing from ~45s to ~12s.

New Features
- Auto-detects CoreML on macOS and uses CoreMLExecutionProvider with CPUExecutionProvider fallback; other platforms unchanged.
- Enforces a fixed 1024×1024 tensor (full-size padding) via _FIXED_IMGSZ and _make_static_model(); runs onnx shape inference to resolve shapes.

^{Written for commit 4027253. Summary will update on new commits.}

On Apple Silicon, ONNX Runtime's CoreMLExecutionProvider can offload inference to the Neural Engine, but only when input shapes are static. With dynamic shapes (the default), CoreML handles just ~3 out of 823 graph nodes — effectively doing nothing. This commit fixes the input shape to [1, 3, 1024, 1024], allowing CoreML to take over 658/681 nodes (~96%), which brings layout parsing from ~45s down to ~12s on a 100-page PDF (tested on M1–M4). Changes to OnnxModel in doclayout.py: 1. Add _FIXED_IMGSZ class variable (1024) for the static input size 2. Detect CoreMLExecutionProvider on macOS and select it with automatic fallback to CPUExecutionProvider 3. Add _make_static_model() to rewrite ONNX graph input dims and run shape inference so CoreML's compiler sees fully resolved shapes 4. Use full-size padding instead of stride-aligned, keeping the input tensor truly fixed at 1024×1024 every time Non-macOS platforms are completely unaffected — the CoreML code path only activates when CoreMLExecutionProvider is actually available. Closes funstory-ai#170

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4027253026

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-09T15:10:39Z

+        pad_w = new_w - resized_w
+        pad_h = new_h - resized_h


Restrict full-size padding to CoreML execution

resize_and_pad_image now always pads to new_shape (pad_w = new_w - resized_w, pad_h = new_h - resized_h), which forces every inference to run at full 1024×1024 even when CoreML is not used. On Linux/Windows (and macOS without CoreML), the previous stride-aligned logic produced smaller tensors for many page aspect ratios, so this change introduces a cross-platform throughput regression unrelated to the CoreML path. Gate this padding behavior behind the CoreML/static-shape mode to avoid slowing the default CPU flow.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

Re-enables CoreML acceleration on macOS Apple Silicon by converting the ONNX model to static input shapes (1024×1024) so CoreML can compile and execute most of the graph, while keeping other platforms on CPU-only execution.

Changes:

Add macOS/CoreML provider detection and configure ORT providers accordingly (CoreML + CPU fallback).
Introduce a helper to rewrite ONNX input shapes to fixed [1, 3, 1024, 1024] and run ONNX shape inference.
Change image padding to always pad to the full target size (fixed 1024×1024).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-09T15:11:56Z

+            # Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over
+            # 96%+ of the graph nodes (658/681) instead of just 3/823.


The comment hard-codes specific CoreML partition/node counts (e.g., "658/681" and "3/823"). These values are environment/model/version-dependent and will become stale, which can mislead future debugging. Suggest rephrasing to a qualitative explanation (e.g., “static shapes greatly increase CoreML coverage”) or move benchmark numbers to PR/docs instead of inline code comments.

Suggested change

# Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over

# 96%+ of the graph nodes (658/681) instead of just 3/823.

# Fix input shapes to [1, 3, 1024, 1024] so CoreML can execute a much larger

# portion of the graph compared to using dynamic input shapes.

Copilot · 2026-03-09T15:11:56Z

+        if use_coreml:
+            # Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over
+            # 96%+ of the graph nodes (658/681) instead of just 3/823.
+            model = self._make_static_model(model, self._FIXED_IMGSZ)
+            providers = [
+                "CoreMLExecutionProvider",
+                "CPUExecutionProvider",
+            ]
+            logger.info(
+                "Using CoreMLExecutionProvider with static input "
+                f"[1, 3, {self._FIXED_IMGSZ}, {self._FIXED_IMGSZ}]"
+            )
+        else:
+            for provider in available_providers:
+                if re.match(r"cpu", provider, re.IGNORECASE):
+                    logger.info(f"Available Provider: {provider}")
+                    providers.append(provider)
        self.model = onnxruntime.InferenceSession(
            model.SerializeToString(),
            providers=providers,
        )


The code assumes that listing "CPUExecutionProvider" after "CoreMLExecutionProvider" provides a full fallback if CoreML fails. In onnxruntime, CoreML compilation/session creation can still raise and prevent any fallback. Consider wrapping InferenceSession creation in try/except: attempt CoreML+CPU first, and on failure log a warning and retry with CPU-only providers (and without static-shape rewriting if needed).

Copilot · 2026-03-09T15:11:57Z

+        try:
+            model = shape_inference.infer_shapes(model)
+        except Exception:
+            pass  # best-effort; CoreML still benefits from fixed input


_make_static_model() swallows all exceptions from shape inference without any logging. If shape inference fails due to an invalid/unsupported model, it will silently proceed with a partially-updated graph and make CoreML failures harder to diagnose. At minimum, log the exception at debug/warning level (or catch narrower exceptions) so users can see when static-shape inference was skipped.

Suggested change

pass # best-effort; CoreML still benefits from fixed input

logger.warning(

"Static shape inference for CoreML optimization failed; "

"proceeding with fixed input shapes only.",

exc_info=True,

)

Copilot · 2026-03-09T15:11:57Z

-        # Calculate padding size and align to stride multiple
-        pad_w = (new_w - resized_w) % self.stride
-        pad_h = (new_h - resized_h) % self.stride
+        # Pad to full target size (enables static-shape CoreML acceleration)


resize_and_pad_image() now pads to the full requested size instead of stride-aligned minimal padding. This changes behavior for all providers/platforms and no longer matches the current docstring/parameter description about “ensuring dimensions are multiples of stride”. If the full-size padding is only required for CoreML static-shape mode, consider gating it behind a flag; otherwise, please update the docstring/comments to reflect that new_shape is now enforced exactly (and note the potential CPU perf/memory impact of always padding up to 1024×1024).

Suggested change

# Pad to full target size (enables static-shape CoreML acceleration)

# Pad to the full requested target size.

# NOTE:

# - This enforces new_shape exactly (no stride-aligned minimal padding).

# - Full-size padding is primarily used to enable static-shape CoreML

# acceleration, but it can increase CPU and memory usage for large

# target sizes (e.g., always padding up to 1024×1024).

cubic-dev-ai

1 issue found across 1 file

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="babeldoc/docvision/doclayout.py">

<violation number="1" location="babeldoc/docvision/doclayout.py:135">
P2: The full-size padding (`new_w - resized_w`) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula `(new_w - resized_w) % self.stride` produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the `use_coreml` flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.</violation>
</file>

Since this is your first cubic review, here's how it works:

cubic automatically reviews your code and comments on bugs and improvements
Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
Ask questions if you need clarification on any suggestion

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-03-09T15:15:34Z

-        pad_w = (new_w - resized_w) % self.stride
-        pad_h = (new_h - resized_h) % self.stride
+        # Pad to full target size (enables static-shape CoreML acceleration)
+        pad_w = new_w - resized_w


P2: The full-size padding (new_w - resized_w) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula (new_w - resized_w) % self.stride produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the use_coreml flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At babeldoc/docvision/doclayout.py, line 135: <comment>The full-size padding (`new_w - resized_w`) applies unconditionally on all platforms, not just when CoreML is in use. On Linux/Windows (and macOS without CoreML), the previous stride-aligned formula `(new_w - resized_w) % self.stride` produced substantially smaller tensors for most aspect ratios—e.g., padding of 8 px instead of 424 px. This introduces a throughput regression on the default CPU path. Store the `use_coreml` flag as an instance attribute and gate this padding behavior so non-CoreML inference keeps the smaller, stride-aligned tensors.</comment> <file context> @@ -97,9 +131,9 @@ def resize_and_pad_image(self, image, new_shape): - pad_w = (new_w - resized_w) % self.stride - pad_h = (new_h - resized_h) % self.stride + # Pad to full target size (enables static-shape CoreML acceleration) + pad_w = new_w - resized_w + pad_h = new_h - resized_h top, bottom = pad_h // 2, pad_h - pad_h // 2 </file context>

awwaawwa · 2026-03-26T09:29:29Z

Thank you for your contribution. According to https://funstory-ai.github.io/BabelDOC/CONTRIBUTOR_REWARD/ , you can apply for a monthly membership redemption code for Immersive Translate.

Copilot AI review requested due to automatic review settings March 9, 2026 15:07

Copilot started reviewing on behalf of garetneda-gif March 9, 2026 15:08 View session

chatgpt-codex-connector Bot reviewed Mar 9, 2026

View reviewed changes

Copilot AI reviewed Mar 9, 2026

View reviewed changes

cubic-dev-ai Bot reviewed Mar 9, 2026

View reviewed changes

awwaawwa assigned awwaawwa and unassigned awwaawwa Mar 10, 2026

awwaawwa added the Planned label Mar 10, 2026

awwaawwa merged commit b015180 into funstory-ai:main Mar 26, 2026
4 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: re-enable CoreML acceleration on macOS with static input shapes#573

feat: re-enable CoreML acceleration on macOS with static input shapes#573
awwaawwa merged 1 commit intofunstory-ai:mainfrom
garetneda-gif:feat/coreml-acceleration

garetneda-gif commented Mar 9, 2026 •

edited by cubic-dev-ai Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Mar 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 9, 2026

Uh oh!

Copilot AI Mar 9, 2026

Uh oh!

Copilot AI Mar 9, 2026

Uh oh!

Copilot AI Mar 9, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

cubic-dev-ai Bot Mar 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

awwaawwa commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		# Fix input shapes to [1, 3, 1024, 1024] so CoreML can take over
		# 96%+ of the graph nodes (658/681) instead of just 3/823.

-            pass  # best-effort; CoreML still benefits from fixed input
+            logger.warning(
+                "Static shape inference for CoreML optimization failed; "
+                "proceeding with fixed input shapes only.",
+                exc_info=True,
+            )

-        # Pad to full target size (enables static-shape CoreML acceleration)
+        # Pad to the full requested target size.
+        # NOTE:
+        # - This enforces new_shape exactly (no stride-aligned minimal padding).
+        # - Full-size padding is primarily used to enable static-shape CoreML
+        #   acceleration, but it can increase CPU and memory usage for large
+        #   target sizes (e.g., always padding up to 1024×1024).

Conversation

garetneda-gif commented Mar 9, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes to OnnxModel in doclayout.py:

Benchmarks (Apple Silicon M1–M4)

Summary by cubic

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

awwaawwa commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

garetneda-gif commented Mar 9, 2026 •

edited by cubic-dev-ai Bot

Loading

Changes to `OnnxModel` in `doclayout.py`:

cubic-dev-ai Bot Mar 9, 2026 •

edited

Loading