Skip to content

fix: patch AllocCheck to allow jl_get_pgcstack_static on Apple Silicon#1092

Merged
yebai merged 2 commits into
mainfrom
fix/alloccheck-get-pgcstack-static
Mar 24, 2026
Merged

fix: patch AllocCheck to allow jl_get_pgcstack_static on Apple Silicon#1092
yebai merged 2 commits into
mainfrom
fix/alloccheck-get-pgcstack-static

Conversation

@yebai
Copy link
Copy Markdown
Member

@yebai yebai commented Mar 21, 2026

Closes #982

  • AllocCheck.fn_may_allocate allowlists "get_pgcstack" but not "get_pgcstack_static", the arm64-specific variant used on Apple Silicon
  • This caused a spurious "1 allocation" report for every @check_allocs-decorated function on macOS/arm64, making all check_allocs-based tests fail with false positives
  • Fixes by patching fn_may_allocate in MooncakeAllocCheckExt.__init__ to intercept "get_pgcstack_static" before the upstream classifier sees it

🤖 Generated with Claude Code

CI Summary — GitHub Actions

Documentation Preview

Mooncake.jl documentation for PR #1092 is available at:
https://chalk-lab.github.io/Mooncake.jl/previews/PR1092/

Performance

Performance Ratio:
Ratio of time to compute gradient and time to compute function.
Warning: results are very approximate! See here for more context.

┌────────────────────────────┬──────────┬──────────┬─────────────┬─────────┬─────────────┬────────┐
│                      Label │   Primal │ Mooncake │ MooncakeFwd │  Zygote │ ReverseDiff │ Enzyme │
│                     String │   String │   String │      String │  String │      String │ String │
├────────────────────────────┼──────────┼──────────┼─────────────┼─────────┼─────────────┼────────┤
│                   sum_1000 │ 180.0 ns │     1.56 │        1.62 │   0.667 │        3.23 │   6.18 │
│                  _sum_1000 │ 952.0 ns │     6.87 │        1.03 │  5190.0 │        33.5 │   1.08 │
│               sum_sin_1000 │  7.22 μs │     2.86 │        1.45 │    1.58 │         9.8 │   1.78 │
│              _sum_sin_1000 │  6.13 μs │     3.18 │         2.0 │   238.0 │        11.4 │   2.15 │
│                   kron_sum │ 212.0 μs │     11.4 │        3.25 │    11.2 │       308.0 │   21.1 │
│              kron_view_sum │ 290.0 μs │     10.3 │        4.63 │    12.8 │       243.0 │   11.0 │
│      naive_map_sin_cos_exp │  2.54 μs │     2.44 │        1.38 │ missing │        6.17 │   1.98 │
│            map_sin_cos_exp │  2.63 μs │     2.77 │        1.46 │     2.0 │        5.03 │   2.41 │
│      broadcast_sin_cos_exp │  2.64 μs │     2.54 │        1.41 │    4.54 │        1.29 │   1.94 │
│                 simple_mlp │ 326.0 μs │     4.79 │        2.77 │     1.6 │        8.12 │   3.04 │
│                     gp_lml │ 427.0 μs │     5.28 │        1.65 │     2.5 │     missing │   3.47 │
│ turing_broadcast_benchmark │  2.16 ms │      3.8 │        2.83 │ missing │        21.8 │   1.87 │
│         large_single_block │ 401.0 ns │     5.02 │        1.95 │  4330.0 │        30.5 │   2.15 │
└────────────────────────────┴──────────┴──────────┴─────────────┴─────────┴─────────────┴────────┘

@yebai yebai requested review from lkdvos and sunxd3 March 21, 2026 00:22
@github-actions
Copy link
Copy Markdown
Contributor

Mooncake.jl documentation for PR #1092 is available at:
https://chalk-lab.github.io/Mooncake.jl/previews/PR1092/

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 21, 2026

Performance Ratio:
Ratio of time to compute gradient and time to compute function.
Warning: results are very approximate! See here for more context.

┌────────────────────────────┬──────────┬──────────┬─────────────┬─────────┬─────────────┬────────┐
│                      Label │   Primal │ Mooncake │ MooncakeFwd │  Zygote │ ReverseDiff │ Enzyme │
│                     String │   String │   String │      String │  String │      String │ String │
├────────────────────────────┼──────────┼──────────┼─────────────┼─────────┼─────────────┼────────┤
│                   sum_1000 │ 170.0 ns │     1.71 │        1.71 │   0.706 │        3.54 │    6.6 │
│                  _sum_1000 │ 962.0 ns │     6.96 │        1.01 │  5090.0 │        33.2 │   1.07 │
│               sum_sin_1000 │  8.76 μs │     2.55 │        1.46 │     1.3 │        7.86 │   1.48 │
│              _sum_sin_1000 │  7.71 μs │     2.71 │        1.71 │   183.0 │        9.06 │   1.69 │
│                   kron_sum │ 201.0 μs │     11.6 │        3.34 │    9.08 │       329.0 │   22.1 │
│              kron_view_sum │ 284.0 μs │     10.4 │        4.73 │    26.7 │       272.0 │   7.92 │
│      naive_map_sin_cos_exp │  2.72 μs │     2.38 │        1.41 │ missing │        5.74 │   1.95 │
│            map_sin_cos_exp │  2.68 μs │     2.78 │        1.49 │    2.03 │        4.97 │   2.34 │
│      broadcast_sin_cos_exp │   2.8 μs │     2.47 │        1.44 │    4.23 │         1.2 │   1.82 │
│                 simple_mlp │ 341.0 μs │     4.92 │        2.73 │    2.09 │        7.54 │   3.02 │
│                     gp_lml │ 270.0 μs │     7.91 │        2.14 │    3.41 │     missing │   4.79 │
│ turing_broadcast_benchmark │  2.17 ms │     3.78 │        2.81 │ missing │        22.4 │   1.84 │
│         large_single_block │ 401.0 ns │     5.07 │        2.01 │  4270.0 │        29.3 │   2.15 │
└────────────────────────────┴──────────┴──────────┴─────────────┴─────────┴─────────────┴────────┘

AllocCheck's fn_may_allocate allowlist includes "get_pgcstack" but not
"get_pgcstack_static", the arm64-specific variant of that call used on
Apple Silicon. This causes a spurious "1 allocation" report for any
function that touches the GC stack on that platform, including trivial
@foldable functions like fdata_type.

Closes #982

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yebai yebai force-pushed the fix/alloccheck-get-pgcstack-static branch from 3afb7e6 to ffc2fc3 Compare March 21, 2026 01:10
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@yebai
Copy link
Copy Markdown
Member Author

yebai commented Mar 21, 2026

It's better to fix AllocChecks rather than merge this PR.

@sunxd3
Copy link
Copy Markdown
Collaborator

sunxd3 commented Mar 23, 2026

Copy link
Copy Markdown
Collaborator

@lkdvos lkdvos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I am qualified to really review this, but from a far this looks reasonable and if it fixes the issues I'm happy with this.

[EDIT] only now saw the comments, indeed would be better to not have to go into the internals of alloccheck from here.

Comment thread ext/MooncakeAllocCheckExt.jl
Co-authored-by: Hong Ge <3279477+yebai@users.noreply.github.com>
Signed-off-by: Hong Ge <3279477+yebai@users.noreply.github.com>
@yebai yebai merged commit a4f3da4 into main Mar 24, 2026
139 of 140 checks passed
@yebai yebai deleted the fix/alloccheck-get-pgcstack-static branch March 24, 2026 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allocation checks in tests on MacOS

3 participants