
[codex] Inline guarded numeric array payload access #5302

Closed
andrewtdiz wants to merge 3 commits into main from codex/perry-numeric-array-direct-fastpath

@andrewtdiz andrewtdiz commented Jun 17, 2026

Contributor

Summary

  • inline raw-f64 numeric array element loads and stores after the existing typed-feedback numeric array guards
  • record explicit direct-load/direct-store native proof consumers and verify they still consume the raw-f64 layout fact
  • update typed-feedback, typed-shape, native-proof tests, and the performance run log

Performance

Stacked on #5295.

  • 16_matrix_multiply quick: 1842ms -> 1745ms, 5.3% faster
  • direct matrix binary instructions: 30.88B -> 28.04B, 9.2% fewer
  • direct matrix binary branches: 5.50B -> 4.65B, 15.5% fewer
  • 10_nested_loops compare median: 956ms -> 921ms, 3.7% faster
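
The percentages above follow from (before - after) / before. A minimal sketch that recomputes them from the quoted figures; the function name `percent_reduction` and the row labels are illustrative and not part of the PR:

```rust
/// Percent reduction from `before` to `after`, rounded to one decimal place.
fn percent_reduction(before: f64, after: f64) -> f64 {
    let pct = (before - after) / before * 100.0;
    (pct * 10.0).round() / 10.0
}

fn main() {
    // (label, before, after) taken from the figures quoted in the list above.
    let rows = [
        ("16_matrix_multiply quick (ms)", 1842.0, 1745.0),
        ("direct matrix instructions (B)", 30.88, 28.04),
        ("direct matrix branches (B)", 5.50, 4.65),
        ("10_nested_loops compare median (ms)", 956.0, 921.0),
    ];
    for (label, before, after) in rows {
        println!("{}: {:.1}% lower", label, percent_reduction(before, after));
    }
}
```

All four reported percentages are consistent with this calculation when rounded to one decimal place.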

The traced matrix module still declares js_array_numeric_get_f64_unboxed and js_array_numeric_set_f64_unboxed, but no longer emits call sites to those helpers on the guarded fast paths.
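
The distinction between a declaration and a call site can be checked mechanically on the traced IR text. The sketch below is a rough illustration only: the IR excerpt and the helper's parameter list are hypothetical, and `has_call_site` is not a function from the Perry codebase.

```rust
/// Returns true if `ir` contains a call instruction targeting `@helper`.
/// `declare` lines mention the same symbol but are not call sites.
fn has_call_site(ir: &str, helper: &str) -> bool {
    let target = format!("@{}(", helper);
    ir.lines().any(|line| {
        let line = line.trim_start();
        !line.starts_with("declare") && line.contains("call ") && line.contains(&target)
    })
}

fn main() {
    // Hypothetical IR excerpt, not copied from the real trace.
    let ir = "\
declare double @js_array_numeric_get_f64_unboxed(i64, i32)
define double @f(ptr %p) {
  %v = load double, ptr %p
  ret double %v
}
";
    let helper = "js_array_numeric_get_f64_unboxed";
    println!("symbol mentioned: {}", ir.contains(helper));
    println!("call site present: {}", has_call_site(ir, helper));
}
```

A plain substring search would still find the helper name because of the `declare` line, which is why the check looks for call instructions specifically.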

Validation

  • cargo fmt --check
  • git diff --check
  • cargo test -p perry-codegen --test typed_feedback
  • cargo test -p perry-codegen --test typed_shape_descriptors
  • cargo test -p perry-codegen --test native_proof_regressions artifact_records_numeric_array_f64_fast_paths_and_fallback_reasons
  • cargo test -p perry-codegen native_value::verify::tests
  • cargo build --release
  • PERRY_BIN=target/release/perry python3 tests/test_typed_feedback_runtime_evidence.py
  • tests/test_benchmark_output_verifier.sh
  • target/release/perry compile --no-cache benchmarks/suite/16_matrix_multiply.ts -o /tmp/perry-matrix-direct-final --trace llvm --quiet
  • ./benchmarks/compare.sh --quick --runs 3 --warn-only --json-out /tmp/perry-direct-numeric-final-e816fc3e4.json
  • ./benchmarks/quick.sh

Summary by CodeRabbit

  • Performance
    • Improved numeric array f64 index get/set fast paths by inlining direct payload address handling and performing direct load/store of the value, with refreshed fast-path identifiers for better verification/recording.
  • Documentation
    • Added a new dated performance run entry with benchmark commands and before/after results for guarded numeric array direct payload access.
  • Tests
    • Updated verifier rules, typed-feedback/typed-shape assertions, and regression expectations to align with the new f64 fast-path record names and the elimination of prior unboxed helper-call patterns.
  • Benchmarks
    • Refreshed numeric array workload IR/native-representation expectations for the updated fast paths.

coderabbitai Bot commented Jun 17, 2026


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Numeric array index get and set fast paths are changed to compute element pointers inline (base + 8-byte header + idx×8) and emit direct DOUBLE load/store IR, removing the js_array_numeric_set_f64_unboxed/js_array_numeric_get_f64_unboxed runtime helper calls. Consumer fact identifiers are renamed to numeric_array_index_get.raw_f64_load and numeric_array_index_set.raw_f64_store, with the verifier, regression tests, typed-feedback tests, typed-shape tests, workload checks, and performance log updated accordingly.
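
The address arithmetic described above (base + 8-byte header + idx×8) can be modelled in safe Rust over a byte buffer. This is a sketch under stated assumptions, not the codegen itself: the names `HEADER_BYTES`, `element_offset`, `load_f64`, and `store_f64` are hypothetical, and the slice indexing here is bounds-checked, whereas the emitted IR relies on the preceding guards.

```rust
const HEADER_BYTES: usize = 8; // header size stated in the walkthrough
const ELEM_BYTES: usize = 8; // size of one f64 element

/// Byte offset of element `idx` from the array base: header + idx * 8.
fn element_offset(idx: usize) -> usize {
    HEADER_BYTES + idx * ELEM_BYTES
}

/// Direct load of the raw f64 at `idx` (no helper call, no boxing).
fn load_f64(buf: &[u8], idx: usize) -> f64 {
    let off = element_offset(idx);
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&buf[off..off + ELEM_BYTES]);
    f64::from_ne_bytes(bytes)
}

/// Direct store of the raw f64 at `idx`.
fn store_f64(buf: &mut [u8], idx: usize, value: f64) {
    let off = element_offset(idx);
    buf[off..off + ELEM_BYTES].copy_from_slice(&value.to_ne_bytes());
}

fn main() {
    // Model a 4-element numeric array: 8-byte header followed by 4 f64 slots.
    let mut array = vec![0u8; HEADER_BYTES + 4 * ELEM_BYTES];
    store_f64(&mut array, 2, 1.5);
    println!("offset of element 2 = {}", element_offset(2));
    println!("element 2 = {}", load_f64(&array, 2));
}
```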

Changes

Guarded Numeric Array Direct Payload Access

| Layer / File(s) | Summary |
|---|---|
| Inline raw-f64 element load/store in index get/set codegen: crates/perry-codegen/src/expr/index_get.rs, crates/perry-codegen/src/expr/index.rs, crates/perry-codegen/src/expr/index_set.rs | lower_guarded_array_index_get hoists element pointer computation (idx_i64, byte offset, header-adjusted address) before the require_numeric_layout branch and performs a direct DOUBLE load using the precomputed address. lower_index_set_fast in both index.rs and index_set.rs replaces the js_array_numeric_set_f64_unboxed call with inline pointer arithmetic (header + idx×8) and a direct DOUBLE store. All three files rename the recorded operation string to numeric_array_index_get.raw_f64_load / numeric_array_index_set.raw_f64_store. |
| Verifier extension and test updates for new consumer names: crates/perry-codegen/src/native_value/verify.rs, crates/perry-codegen/tests/native_proof_regressions.rs, crates/perry-codegen/tests/typed_feedback.rs, crates/perry-codegen/tests/typed_shape_descriptors.rs, benchmarks/compiler_output/workloads.toml, tests/test_compiler_output_regression.py | raw_f64_checked_native_consumer matcher extended to recognize the two new consumer strings. Unit test pairs for NumericArrayIndexGet/Set in verify.rs updated to use the renamed consumers. Native proof regression assertions for consumer fields updated. Typed-feedback IR assertions flipped from expecting unboxed helper calls to asserting their absence. Typed-shape descriptor test updated to verify that the bounded numeric store inlines the raw-f64 payload write. Workloads.toml IR checks and native-representation checks updated to match the new inlined behavior and consumer names. Python test file updated with new consumer identifiers and expected IR for the inlined numeric array set path. |
| Performance run log: PERF_RUN_LOG.md | New dated entry (2026-06-17) recording baseline vs. post-change benchmark results, LLVM trace evidence confirming raw-f64 load/store presence without helper call sites, verification commands, and PR reference. |

sequenceDiagram
  participant Codegen as lower_index_get/set
  participant OldPath as Prior: runtime helper
  participant NewPath as New: inline pointer arithmetic

  rect rgba(200, 150, 100, 0.5)
  Note over OldPath: Before: indirect call
  Codegen->>OldPath: js_array_numeric_get/set_f64_unboxed()
  OldPath-->>Codegen: f64 value or void
  end

  rect rgba(100, 200, 150, 0.5)
  Note over NewPath: After: direct memory access
  Codegen->>NewPath: hoist: element_ptr = base + 8-byte header + idx*8
  NewPath-->>Codegen: element_ptr computed
  Codegen->>Codegen: load/store DOUBLE to element_ptr
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • PerryTS/perry#5132: Both PRs modify crates/perry-codegen/src/expr/index_get.rs's lower_guarded_array_index_get to eliminate the js_array_numeric_get_f64_unboxed helper call and instead inline the raw-f64 element load, updating related IR expectations accordingly.
  • PerryTS/perry#5291: Both PRs modify the numeric array index get/set lowering fast paths to replace js_array_numeric_*_f64_unboxed helper usage with inlined guarded raw f64 load/store behavior and update verifier/test expectations around the new raw-f64 facts/consumers.

Poem

🐇 Hop hop, no helper to call today,
I compute the pointer and store straight away!
Header plus index times eight — done inline,
No runtime detour, the IR looks fine.
raw_f64_store and raw_f64_load by name,
The rabbit optimized, and benchmarks acclaim! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and concisely describes the main change: inlining guarded numeric array payload access operations. |
| Description check | ✅ Passed | The PR description comprehensively covers the Summary, Performance metrics, and Validation steps, aligning well with the template requirements. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@andrewtdiz andrewtdiz force-pushed the codex/perry-numeric-array-direct-fastpath branch from 2b58bc2 to ed71efd Compare June 17, 2026 05:12
@proggeramlug proggeramlug marked this pull request as ready for review June 17, 2026 06:09
Base automatically changed from codex/perry-performance-20260617 to main June 17, 2026 08:40
@proggeramlug proggeramlug force-pushed the codex/perry-numeric-array-direct-fastpath branch from ed71efd to 452b8d8 Compare June 17, 2026 13:38
@proggeramlug proggeramlug force-pushed the codex/perry-numeric-array-direct-fastpath branch 2 times, most recently from f49be7c to f1945a4 Compare June 17, 2026 13:58
@proggeramlug proggeramlug force-pushed the codex/perry-numeric-array-direct-fastpath branch from f1945a4 to 73d3794 Compare June 18, 2026 06:36
proggeramlug pushed a commit that referenced this pull request Jun 18, 2026
…ined raw-f64 index access

#5302 inlines the guarded numeric array index get/set raw-f64 payload access
instead of routing through js_array_numeric_get/set_f64_unboxed helpers, but
left the native-region-proof workload spec asserting the old helper-based
shape. Update the numeric_arrays spec to match:
- ir_check numeric_array_uses_unboxed_set now pins the inline guarded store
  (idxset.inbounds + store double) and asserts the helper call is elided,
  mirroring the existing get check.
- require_records get/set consumers updated to numeric_array_index_get.raw_f64_load
  / numeric_array_index_set.raw_f64_store.

Verified against the captured CI artifacts (verify --gate => 34/34 pass).

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/compiler_output/workloads.toml`:
- Line 636: The regex pattern in the workload check uses `%\w+` which is too
restrictive for LLVM IR identifiers and fails to match valid names containing
dots like `%tmp.1`. Broaden the SSA identifier matching pattern by replacing all
three occurrences of `%\w+` with `%[\w.]+` to allow dots in LLVM IR identifiers,
making the workload check more robust and less brittle to different IR output
variations.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: af58f009-0b3b-4b9a-bff0-a92a6d13c624

📥 Commits

Reviewing files that changed from the base of the PR and between 73d3794 and d23d5bd.

📒 Files selected for processing (1)
  • benchmarks/compiler_output/workloads.toml

contains = "js_array_numeric_set_f64_unboxed"
detail = "numeric indexed write uses the guarded raw-f64 helper"
contains = "js_typed_feedback_numeric_array_index_set_guard"
regex = '''idxset\.inbounds\.\d+:[\s\S]*?inttoptr i64 %\w+ to ptr\s*\n\s*store double %\w+, ptr %\w+[^\n]*\n\s*br label %idxset\.merge'''


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Broaden SSA identifier matching in the IR regex.

%\w+ is too restrictive for LLVM IR names and may fail on valid identifiers containing dots (e.g., %tmp.1), making this workload check flaky/overly brittle.

Suggested patch
-regex = '''idxset\.inbounds\.\d+:[\s\S]*?inttoptr i64 %\w+ to ptr\s*\n\s*store double %\w+, ptr %\w+[^\n]*\n\s*br label %idxset\.merge'''
+regex = '''idxset\.inbounds\.\d+:[\s\S]*?inttoptr i64 %[-a-zA-Z$._0-9]+ to ptr\s*\n\s*store double %[-a-zA-Z$._0-9]+, ptr %[-a-zA-Z$._0-9]+[^\n]*\n\s*br label %idxset\.merge'''
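
To see why `%\w+` stops short on a dotted name, the two character classes can be compared directly. This is a minimal standard-library sketch rather than a regex-engine test: `\w` is modelled as ASCII letters, digits, and underscore, and the function names are illustrative.

```rust
/// ASCII model of `\w`: letters, digits, underscore.
fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Character class used in the suggested patch: [-a-zA-Z$._0-9].
fn is_llvm_name_char(c: char) -> bool {
    is_word_char(c) || matches!(c, '-' | '$' | '.')
}

/// Number of leading characters of `name` accepted by `class`.
fn matched_prefix_len(name: &str, class: fn(char) -> bool) -> usize {
    name.chars().take_while(|&c| class(c)).count()
}

fn main() {
    let name = "tmp.1"; // the part after '%' in `%tmp.1`
    println!("\\w+ consumes {} of {} chars", matched_prefix_len(name, is_word_char), name.len());
    println!("patched class consumes {} of {} chars", matched_prefix_len(name, is_llvm_name_char), name.len());
}
```

With `\w+` the match ends at `tmp`, so the literal ` to ptr` that the pattern expects next cannot match against `.1 to ptr`. The broader class consumes the whole name.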

@proggeramlug

Contributor

Closing as superseded. Since this codex stack was cut, main advanced 52+ commits and independently evolved the same hot codegen/runtime paths. This PR is one link in a 13-deep linear stack that conflicts with diverged, correctness-sensitive codegen and was never reviewed; landing it would mean rebasing the whole stack. Per-PR judgement was done before closing.

Specific: main already inlines the guard-proven raw-f64 numeric-array store (crates/perry-codegen/src/expr/index_set.rs:481 canonicalize_raw_f64_numeric_store_value + direct store) and the direct raw-f64 load (expr/index_get.rs), i.e. the core optimization here is already present — and with the negative-zero/NaN canonicalization this PR's raw store originally skipped.
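
Why a raw bit-level store might want canonicalization can be seen from how f64 bit patterns behave. The sketch below does not reproduce canonicalize_raw_f64_numeric_store_value, whose behavior is not shown in this thread. `canonicalize_nan` is a hypothetical stand-in that only collapses NaNs to one bit pattern, and the negative-zero lines merely show that equal values can have different bits.

```rust
/// Hypothetical canonicalization: collapse every NaN to a single bit pattern.
/// This is an illustration, not the function used on main.
fn canonicalize_nan(v: f64) -> f64 {
    if v.is_nan() {
        f64::NAN
    } else {
        v
    }
}

fn main() {
    // A NaN with a non-default payload: still NaN, but different raw bits.
    let odd_nan = f64::from_bits(0x7ff8_0000_0000_1234);
    println!("is NaN: {}", odd_nan.is_nan());
    println!("raw bits differ from f64::NAN: {}", odd_nan.to_bits() != f64::NAN.to_bits());
    println!(
        "after canonicalization bits match: {}",
        canonicalize_nan(odd_nan).to_bits() == f64::NAN.to_bits()
    );

    // Negative zero compares equal to zero, yet its raw bits are different.
    let neg_zero = -0.0_f64;
    println!("-0.0 == 0.0: {}", neg_zero == 0.0);
    println!("bits differ: {}", neg_zero.to_bits() != 0.0_f64.to_bits());
}
```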

2 participants