
[Fix] Intercept and align Inductor output strides across piecewise sub-graphs #29

Merged
cennn merged 1 commit into SandAI-org:main from cennn:fix/restride-inductor-output-strides
Apr 27, 2026

Conversation

Collaborator

@cennn cennn commented Apr 27, 2026

🗂️ PR Category

  • ✨ New Feature
  • 🚀 Optimization (performance, memory, etc.)
  • 💥 Breaking Change
  • 🐛 Bug Fix
  • 🛠️ Development / Refactoring
  • 📚 Documentation
  • 🧹 Chore (Dependencies, CI/CD, Configuration, etc.)
  • 🧪 Testing

Summary

Inductor may silently change output strides while compiling a piecewise sub-graph (e.g. through mm padding). The strides it reports are lost when the per-sub-graph TracingContext is destroyed, so the next sub-graph is compiled against stale FakeTensor strides and fails assert_size_stride at runtime.

Fix: intercept Inductor's reported output strides before context teardown, then align FakeTensor strides via as_strided (zero-copy) for correct downstream compilation.
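
To make the failure mode concrete, here is a small torch-free model of it. It is only a sketch: `contiguous_strides`, `padded_row_strides`, and `check_size_stride` are hypothetical helpers written for this illustration, the alignment of 8 elements is an arbitrary assumption, and the check is a simplification of what `assert_size_stride` does in a real compiled graph.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides (in elements) for a dense tensor of this shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def padded_row_strides(shape: Tuple[int, int], align: int) -> Tuple[int, int]:
    """Strides of a 2-D buffer whose rows are padded up to a multiple of
    `align` elements (a simplified stand-in for a padded matmul output)."""
    _, cols = shape
    padded_cols = -(-cols // align) * align  # ceiling division, then scale back
    return (padded_cols, 1)


def check_size_stride(actual_shape, actual_stride, expected_shape, expected_stride) -> None:
    """Simplified size/stride check: raise if either one differs."""
    if tuple(actual_shape) != tuple(expected_shape):
        raise AssertionError(f"size mismatch: {actual_shape} vs {expected_shape}")
    if tuple(actual_stride) != tuple(expected_stride):
        raise AssertionError(f"stride mismatch: {actual_stride} vs {expected_stride}")


shape = (4, 30)
# What the downstream sub-graph was compiled to expect (stale, contiguous layout).
assumed = contiguous_strides(shape)
# What the upstream kernel actually produces after row padding (assumed align=8).
produced = padded_row_strides(shape, align=8)

try:
    check_size_stride(shape, produced, shape, assumed)
    mismatch = None
except AssertionError as exc:
    mismatch = str(exc)
```

The shapes agree but the row stride does not, which is why the error only appears at runtime: nothing about the tensor's size hints that the layout changed.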

Changes

  • piecewise_compiler.py: _intercept_inductor_output_strides() captures strides from set_tracing_context_output_strides before TracingContext teardown.
  • magi_backend.py: _restride_outputs() applies captured strides to FakeTensors; skips symbolic dimensions to preserve dynamic-shape compatibility.
  • test_stride_mismatch.py: regression test for non-contiguous view across piecewise boundary.
  • test_unbacked_symbol_guard.py: regression test for GuardOnDataDependentSymNode with mark_unbacked + view(-1).
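
The first two changes can be modeled without PyTorch as "wrap a reporting hook, record what it receives, then apply the recorded strides where it is safe". The sketch below is not the code from this PR: `intercept_calls`, `restride`, `fake_compiler`, and `report_output_strides` are hypothetical names, symbolic dimensions are represented as plain strings such as `"s0"`, and the stride is swapped in a tuple rather than applied with `as_strided` to a FakeTensor.

```python
import contextlib
from types import SimpleNamespace
from typing import Iterator, List, Optional, Sequence, Tuple, Union

Dim = Union[int, str]  # a str stands in for a symbolic dimension such as "s0"
Meta = Tuple[Tuple[Dim, ...], Tuple[Dim, ...]]  # (shape, stride)


@contextlib.contextmanager
def intercept_calls(owner: object, attr: str) -> Iterator[List[tuple]]:
    """Temporarily wrap owner.attr so each call's positional args are recorded."""
    original = getattr(owner, attr)
    captured: List[tuple] = []

    def wrapper(*args, **kwargs):
        captured.append(args)
        return original(*args, **kwargs)

    setattr(owner, attr, wrapper)
    try:
        yield captured
    finally:
        setattr(owner, attr, original)  # always restore the real hook


def restride(outputs: Sequence[Meta],
             strides: Sequence[Optional[Sequence[Dim]]]) -> List[Meta]:
    """Replace each output's stride with the captured one, skipping any
    output whose shape or captured stride contains a symbolic dimension."""
    result: List[Meta] = []
    for (shape, old_stride), new_stride in zip(outputs, strides):
        symbolic = new_stride is None or any(
            not isinstance(d, int) for d in (*shape, *new_stride)
        )
        result.append((shape, old_stride if symbolic else tuple(new_stride)))
    return result


# Hypothetical stand-in for a compiler that reports strides into a
# short-lived context and then tears that context down.
context: dict = {}
fake_compiler = SimpleNamespace()
fake_compiler.report_output_strides = (
    lambda strides: context.update(output_strides=strides)
)


def fake_compile() -> None:
    fake_compiler.report_output_strides([(32, 1), ("s0", 1)])
    context.clear()  # teardown: the reported strides are gone from the context


with intercept_calls(fake_compiler, "report_output_strides") as calls:
    fake_compile()

captured_strides = calls[-1][0]  # survives the teardown
outputs = [((4, 30), (30, 1)), (("s0", 8), (8, 1))]
aligned = restride(outputs, captured_strides)
```

In this model the first output picks up the padded stride `(32, 1)`, while the second keeps its original stride because its shape is symbolic. Leaving symbolic outputs untouched mirrors the PR's stated goal of preserving dynamic-shape compatibility.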

Test plan

  • Unit tests pass (test_stride_mismatch, test_unbacked_symbol_guard, test_piecewise_deferred_assert_scope)
  • 4-GPU CP=4 GAGA2 base + dual-duration inference: no NameError, no stride assertion failures

…graphs

Inductor may change output memory layout (e.g. mm padding, kernel fusion)
during standalone_compile. When FakeTensor strides from sub-graph N flow
into sub-graph N+1's compilation, mismatched strides cause
assert_size_stride failures at runtime.

- Add _intercept_inductor_output_strides to capture strides Inductor
  reports via set_tracing_context_output_strides before the TracingContext
  is destroyed.
- Add _restride_outputs to update FakeTensor strides using as_strided
  (zero-copy view) so downstream sub-graphs compile with correct layouts.
- Add test_stride_mismatch.py, a regression test for a non-contiguous view
  crossing a piecewise boundary.
- Add test_unbacked_symbol_guard.py, a regression test for
  GuardOnDataDependentSymNode with mark_unbacked + view(-1).
@cennn cennn requested a review from wtr0504 April 27, 2026 15:39
Collaborator

@wtr0504 wtr0504 left a comment


LGTM

@cennn cennn merged commit 517fc88 into SandAI-org:main Apr 27, 2026
2 checks passed


2 participants