
feat(hip,aiter): add AITER backend for silu_and_mul#251

Merged
demandal25 merged 6 commits into ROCm:amd-integration from demandal25:feat/aiter-silu-and-mul-backend
Jun 15, 2026
demandal25 (Collaborator) commented Jun 15, 2026

Summary

Routes flashinfer.activation.silu_and_mul through AMD AITER's silu_and_mul
on ROCm via a new backend="auto"|"native"|"aiter" parameter, mirroring the
existing AITER-backend idiom in norm.py.

What changed

  • flashinfer/activation.py: silu_and_mul gains a backend param.
    On ROCm, a _auto_select_silu_and_mul_backend selector routes large 2D fp16
    inputs to AITER; everything else stays on the native JIT kernel. "aiter"
    is available as an explicit opt-in; "native" forces the JIT kernel. The
    backend argument is validated on all platforms: unknown values, and "aiter"
    requested off ROCm or on an unsupported arch, raise ValueError rather than
    silently falling back.
  • tests/rocm_tests/test_activation_aiter_hip.py: new tests covering
    correctness vs reference across shapes/dtypes, out= handling, auto-selection
    branches, the unknown-backend error, and unsupported-aiter rejection.
  • README.md: feature matrix and AITER Support section updated for the new
    silu_and_mul backend.
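
The validation rule in the first bullet can be sketched roughly as follows. This is an illustration only, not the code merged into flashinfer/activation.py: `validate_backend`, `on_rocm`, and `aiter_supported` are hypothetical names standing in for whatever platform checks the PR actually performs.

```python
VALID_BACKENDS = ("auto", "native", "aiter")


def validate_backend(backend: str, on_rocm: bool, aiter_supported: bool) -> str:
    """Reject bad backend values up front instead of silently falling back.

    on_rocm and aiter_supported are hypothetical stand-ins for the real checks.
    """
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {VALID_BACKENDS}"
        )
    if backend == "aiter" and not (on_rocm and aiter_supported):
        # An explicit opt-in that cannot be honored is an error, not a fallback.
        raise ValueError("backend='aiter' requires ROCm with a supported arch")
    return backend
```

Running the check on every platform means a typo such as backend="aitr" fails the same way on CUDA and ROCm, rather than only on the machines where the AITER branch is reachable.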

Architecture / design notes

auto backend selection (ROCm only):

| Input | Backend | Rationale |
| --- | --- | --- |
| bf16 (any shape) | native | AITER bf16 max err ~6e-2 vs native ~4e-3 |
| fp16, non-2D or < 33M elements | native | AITER's ~0.7 us launch overhead makes it slower |
| fp16, 2D, >= 33M elements | aiter | ~5-10% faster, precision matches native |

The cutoff (33 * 1024 * 1024 input elements, i.e. rows x 2*hidden) is the
measured break-even, e.g. 2048 x 16384.
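
A minimal sketch of the selection rule described above, assuming the selector only needs the dtype, rank, and element count. The real `_auto_select_silu_and_mul_backend` presumably inspects a tensor; `auto_select_backend`, its string dtype argument, and `AITER_MIN_NUMEL` are hypothetical names used here for illustration.

```python
AITER_MIN_NUMEL = 33 * 1024 * 1024  # cutoff quoted in the PR description


def auto_select_backend(dtype: str, ndim: int, numel: int) -> str:
    """Mirror the selection table: only large 2D fp16 inputs go to AITER."""
    if dtype == "fp16" and ndim == 2 and numel >= AITER_MIN_NUMEL:
        return "aiter"
    return "native"


# Input element counts are rows * (2 * hidden).
big = 4096 * (2 * 14336)   # 117,440,512 elements
small = 1024 * (2 * 8192)  # 16,777,216 elements
print(auto_select_backend("fp16", 2, big))    # aiter
print(auto_select_backend("fp16", 2, small))  # native
print(auto_select_backend("bf16", 2, big))    # native
```

For reference, 33 * 1024 * 1024 is 34,603,008. Read as a full input shape, 2048 x 16384 is 33,554,432 elements, slightly under that value, so the quoted example is best taken as approximate.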

Benchmark results

silu_and_mul, gfx942, CUDA-event timed (20 warmup / 200 iters):

| dtype | tokens x hidden | native | aiter | speedup |
| --- | --- | --- | --- | --- |
| fp16 | 8192 x 14336 | 184.1 us | 167.9 us | 1.10x |
| fp16 | 4096 x 14336 | 87.6 us | 83.7 us | 1.05x |
| fp16 | <= 1024 x 8192 | faster | - | <1x (native wins) |
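
The speedup column is the ratio of native time to AITER time. A small check of that arithmetic, using only the two timed rows from the table above (the dictionary layout is just for illustration):

```python
# (native_us, aiter_us) pairs copied from the benchmark table above.
timings = {
    "8192 x 14336": (184.1, 167.9),
    "4096 x 14336": (87.6, 83.7),
}

for shape, (native_us, aiter_us) in timings.items():
    speedup = native_us / aiter_us
    saved_us = native_us - aiter_us
    print(f"{shape}: {speedup:.2f}x, {saved_us:.1f} us saved")
```

In absolute terms the gain is a few microseconds to roughly 16 us per call at these sizes, which is consistent with a fixed launch overhead dominating for small inputs.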

Test plan

  • pytest tests/rocm_tests/test_activation_hip.py tests/rocm_tests/test_activation_aiter_hip.py -m "not slow" -> 94 passed
  • FLASHINFER_TEST_TORCH_COMPILE=1 pytest tests/rocm_tests/test_torch_compile_hip.py -> 3 passed, 1 skipped
  • pre-commit run -a (changed files: ruff + markdownlint)

Copilot AI review requested due to automatic review settings June 15, 2026 18:22

Copilot AI left a comment


Pull request overview

Adds an AMD AITER-backed implementation path for flashinfer.activation.silu_and_mul on ROCm/HIP via a new backend="auto"|"native"|"aiter" parameter, with auto-selection for large fp16 2D inputs and ROCm-only tests covering correctness and selection behavior.

Changes:

  • Added HIP-only AITER integration and backend parameter to flashinfer.activation.silu_and_mul, including an auto-selector with a size cutoff.
  • Added ROCm tests validating AITER correctness vs a reference, out= behavior, auto-selection branches, and unknown-backend errors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| flashinfer/activation.py | Adds backend parameter and HIP AITER routing/auto-selection for silu_and_mul. |
| tests/rocm_tests/test_activation_aiter_hip.py | Adds ROCm-only tests for AITER backend correctness and selection logic. |


Comment thread flashinfer/activation.py
Comment thread tests/rocm_tests/test_activation_aiter_hip.py Outdated
demandal25 added a commit to demandal25/flashinfer that referenced this pull request Jun 15, 2026
…-div

Address Copilot review on PR ROCm#251:
- Validate the backend argument unconditionally so an unknown value or an
  explicit backend="aiter" off ROCm/unsupported arch raises ValueError
  instead of silently falling through to the native kernel.
- Use the clearer ceil-to-multiple-of-8 form in the auto-selection test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
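
The exact expression used in the test is not shown in this thread. As an assumption, a common way to write "ceil to a multiple of 8" with integer arithmetic looks like this (`ceil_to_multiple` is a hypothetical helper name):

```python
def ceil_to_multiple(n: int, multiple: int = 8) -> int:
    """Round n up to the nearest multiple, using integer arithmetic only."""
    return (n + multiple - 1) // multiple * multiple


print(ceil_to_multiple(1))   # 8
print(ceil_to_multiple(8))   # 8
print(ceil_to_multiple(9))   # 16
```

Writing the rounding as a named step tends to read more clearly than an inlined floor-division trick when the goal is to pick a test size that sits just above a threshold.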
Copilot AI review requested due to automatic review settings June 15, 2026 18:37

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread flashinfer/activation.py
Comment thread tests/rocm_tests/test_activation_aiter_hip.py Outdated
demandal25 added a commit to demandal25/flashinfer that referenced this pull request Jun 15, 2026
Address second Copilot review on PR ROCm#251:
- backend="aiter" now probes _aiter_act_ops() and re-raises a clear
  ValueError (chaining the original) when the aiter package is missing or
  fails to import, instead of surfacing a cryptic ImportError at the call.
- The out= test seeds the tensor with NaN and asserts numerical
  correctness against the reference, so a no-op write can no longer pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
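
The import-probe pattern from the first bullet can be sketched as below. This is a generic illustration of exception chaining, not the body of `_aiter_act_ops()`; `probe_backend_module` and the error wording are assumptions.

```python
import importlib


def probe_backend_module(module_name: str):
    """Import an optional backend, converting ImportError into ValueError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # "from exc" keeps the original ImportError reachable as __cause__.
        raise ValueError(
            f"backend='aiter' was requested but {module_name!r} could not be imported"
        ) from exc
```

Chaining with `from exc` keeps the original traceback visible, so the caller sees a message about the backend argument while the underlying import failure is still available for debugging.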
demandal25 and others added 4 commits June 15, 2026 18:46
Route flashinfer.activation.silu_and_mul through AMD AITER's silu_and_mul
on ROCm via a backend="auto"|"native"|"aiter" parameter, mirroring the
existing norm.py AITER-backend idiom.

"auto" stays on the native JIT kernel except for large (>=64M element) 2D
fp16 inputs, where AITER is ~5-10% faster and matches native precision.
bf16 is excluded from the auto path (AITER max err ~6e-2 vs native ~4e-3);
"aiter" remains available as an explicit opt-in.

Adds tests/rocm_tests/test_activation_aiter_hip.py covering correctness
across shapes/dtypes, out= handling, backend auto-selection, and the
unknown-backend error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-div

Address Copilot review on PR ROCm#251:
- Validate the backend argument unconditionally so an unknown value or an
  explicit backend="aiter" off ROCm/unsupported arch raises ValueError
  instead of silently falling through to the native kernel.
- Use the clearer ceil-to-multiple-of-8 form in the auto-selection test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t in README

Set the auto-selection threshold to the measured ~33M-element break-even
(was a conservative 64M). Update the README feature matrix and AITER
Support section to list silu_and_mul's AITER backend and its auto-routing
criteria.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address second Copilot review on PR ROCm#251:
- backend="aiter" now probes _aiter_act_ops() and re-raises a clear
  ValueError (chaining the original) when the aiter package is missing or
  fails to import, instead of surfacing a cryptic ImportError at the call.
- The out= test seeds the tensor with NaN and asserts numerical
  correctness against the reference, so a no-op write can no longer pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
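
Why NaN seeding matters for the out= test can be shown with a plain-Python stand-in. The real test works on torch tensors; the list-based `silu_and_mul_ref`, `check_out_written`, and the two toy kernels below are hypothetical and exist only to illustrate the idea.

```python
import math


def silu_and_mul_ref(row):
    """Reference: silu(first half) * second half, on a flat list of floats."""
    half = len(row) // 2
    return [x / (1.0 + math.exp(-x)) * y for x, y in zip(row[:half], row[half:])]


def check_out_written(kernel, row):
    """Seed out with NaN so a kernel that never writes cannot pass."""
    out = [math.nan] * (len(row) // 2)
    kernel(row, out)
    expected = silu_and_mul_ref(row)
    return all(
        not math.isnan(got) and math.isclose(got, want, rel_tol=1e-6, abs_tol=1e-9)
        for got, want in zip(out, expected)
    )


def good_kernel(row, out):
    out[:] = silu_and_mul_ref(row)  # writes every element


def noop_kernel(row, out):
    pass  # never writes; out stays NaN


row = [0.5, -1.0, 2.0, 3.0]
print(check_out_written(good_kernel, row))  # True
print(check_out_written(noop_kernel, row))  # False
```

If out were seeded with zeros or left uninitialized, a no-op write could match the reference by accident; NaN never compares equal to anything, so it cannot.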
Copilot AI review requested due to automatic review settings June 15, 2026 18:48
@demandal25 demandal25 force-pushed the feat/aiter-silu-and-mul-backend branch from 4259b0b to f5ef7a4 Compare June 15, 2026 18:48

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread flashinfer/activation.py Outdated
Comment thread README.md Outdated
Comment thread flashinfer/activation.py
demandal25 and others added 2 commits June 15, 2026 19:01
Add requires_aiter to tests/test_helpers/test_helpers.py (gating on arch
+ aiter importability) and import it from every AITER rocm test, replacing
the per-file copies of the @pytest.mark.skipif(not is_aiter_supported...)
decorator. One definition, no duplicates.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
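
A rough sketch of what a shared gate of this kind might look like. The helper name `aiter_tests_enabled`, the `supported_archs` default, and the marker shown in the comment are assumptions; the actual `requires_aiter` definition in tests/test_helpers/test_helpers.py is not reproduced in this thread.

```python
import importlib.util


def aiter_tests_enabled(arch: str, supported_archs=("gfx942",)) -> bool:
    """True only if the arch is supported and the aiter package can be found.

    supported_archs is a placeholder; the real list lives in the project.
    """
    if arch not in supported_archs:
        return False
    return importlib.util.find_spec("aiter") is not None


# With pytest installed, a single shared marker could then be defined once, e.g.:
#   requires_aiter = pytest.mark.skipif(
#       not aiter_tests_enabled(current_arch), reason="AITER unavailable"
#   )
```

Defining the gate in one helper module means a change to the arch list or the import check is made in one place instead of in every AITER test file.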
Address third Copilot review on PR ROCm#251:
- Type silu_and_mul's out= as Optional[torch.Tensor] to match the
  enable_pdl: Optional[bool] style.
- Make the backend="aiter" arch-check error strictly about the ROCm/arch
  requirement and include the actual device; the missing-package case is
  already reported separately by the import probe below.
- Rephrase the README Activation matrix cell to the "AITER when ...; else
  HIP native" pattern used by the other rows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 15, 2026 19:08
@demandal25 demandal25 merged commit b0f77ef into ROCm:amd-integration Jun 15, 2026
1 check passed

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.


2 participants