[Feature][QDP] IQP kernel fusion and persistent kernel / grid-stride optimizations #1253

Open
400Ping wants to merge 2 commits into apache:main from 400Ping:qdp/IQP-kernel-fusion


400Ping commented Apr 10, 2026

Related Issues

Closes #1015

Changes

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Why

Optimize the QDP IQP CUDA path by reducing kernel-launch overhead and global-memory traffic, and let large single-sample launches scale by combining a capped grid with a grid-stride loop.

How

CUDA kernel changes

  • Added grid-stride looping to the single-sample IQP phase kernel
  • Added capped-grid launch sizing using MAX_GRID_BLOCKS
  • Added a shared-memory fused path for n <= FWT_SHARED_MEM_THRESHOLD
    • phase generation in shared memory
    • full FWT in shared memory
    • normalization on final write-back
  • Fused the final global-memory FWT stage with normalization for n > FWT_SHARED_MEM_THRESHOLD
  • Fused batch IQP phase + normalization so batch IQP no longer needs a standalone normalize kernel launch
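
For reviewers, the capped-grid and grid-stride bullets above can be modeled on the CPU. This is a minimal sketch, not the kernel from this PR. The constants `kMaxGridBlocks` and `kThreadsPerBlock` and the helpers `capped_grid` and `visit_counts` are hypothetical names with illustrative values (the description names `MAX_GRID_BLOCKS` but does not state its value). The two outer loops stand in for `blockIdx.x` and `threadIdx.x`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical values for illustration only; the real MAX_GRID_BLOCKS and
// block size used by the PR are not stated in this description.
constexpr std::size_t kMaxGridBlocks = 4;
constexpr std::size_t kThreadsPerBlock = 8;

// Number of blocks needed to cover n elements, clamped to [1, kMaxGridBlocks].
std::size_t capped_grid(std::size_t n) {
    const std::size_t needed = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
    return std::max<std::size_t>(1, std::min(needed, kMaxGridBlocks));
}

// CPU model of a grid-stride loop: every simulated thread starts at its
// global index and advances by the total thread count until it passes n.
// Returns how many times each element was visited.
std::vector<int> visit_counts(std::size_t n) {
    std::vector<int> hits(n, 0);
    const std::size_t blocks = capped_grid(n);
    const std::size_t stride = blocks * kThreadsPerBlock;  // gridDim.x * blockDim.x
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < kThreadsPerBlock; ++thread) {
            for (std::size_t i = block * kThreadsPerBlock + thread; i < n; i += stride) {
                ++hits[i];  // stands in for the per-element phase computation
            }
        }
    }
    return hits;
}
```

Because the stride equals the total number of launched threads, every element is visited exactly once even when the grid is smaller than the state vector, which is the property a capped launch depends on.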

Tests

Added GPU tests covering:

  • FWT/fused IQP vs naive CPU reference for small qubit counts
  • batch IQP encode vs repeated single-sample encode
  • large-state capped-grid / grid-stride regression (num_qubits = 20)
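
The first test bullet compares the fused path against a naive CPU reference. That kind of check can be sketched as below, under stated assumptions: the function names `fwt_fused_normalize`, `wht_naive`, and `max_abs_diff` are hypothetical, the state length is assumed to be a power of two, and the normalization is modeled as one scalar `scale` folded into the final butterfly stage. None of this is taken from the PR's test code.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// In-place fast Walsh-Hadamard transform. The scale factor is applied inside
// the last butterfly stage, so no separate normalization pass is needed.
// Assumes a.size() is a power of two.
void fwt_fused_normalize(std::vector<cplx>& a, double scale) {
    const std::size_t n = a.size();
    if (n == 1) { a[0] *= scale; return; }
    for (std::size_t len = 1; len < n; len <<= 1) {
        const double s = ((len << 1) == n) ? scale : 1.0;  // final stage only
        for (std::size_t base = 0; base < n; base += (len << 1)) {
            for (std::size_t j = base; j < base + len; ++j) {
                const cplx u = a[j];
                const cplx v = a[j + len];
                a[j] = (u + v) * s;
                a[j + len] = (u - v) * s;
            }
        }
    }
}

// Naive O(N^2) reference: out[y] = scale * sum_x (-1)^popcount(x & y) * a[x].
std::vector<cplx> wht_naive(const std::vector<cplx>& a, double scale) {
    const std::size_t n = a.size();
    std::vector<cplx> out(n);
    for (std::size_t y = 0; y < n; ++y) {
        cplx acc(0.0, 0.0);
        for (std::size_t x = 0; x < n; ++x) {
            bool odd = false;
            for (std::size_t bits = x & y; bits != 0; bits >>= 1) {
                if (bits & 1) odd = !odd;  // parity of the set bits in x & y
            }
            acc += odd ? -a[x] : a[x];
        }
        out[y] = acc * scale;
    }
    return out;
}

// Largest element-wise absolute difference between two equal-length vectors.
double max_abs_diff(const std::vector<cplx>& p, const std::vector<cplx>& q) {
    double worst = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        worst = std::max(worst, std::abs(p[i] - q[i]));
    }
    return worst;
}
```

A test in this style fills a small vector with unit-modulus phases, runs both functions with the same `scale`, and asserts that `max_abs_diff` stays below a small tolerance.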

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes

…ions

Signed-off-by: 400Ping <jiekaichang@apache.org>
400Ping requested a review from ryankert01 as a code owner on April 10, 2026 02:18
400Ping requested reviews from guan404ming and rich7420 on April 10, 2026 02:19
Signed-off-by: 400Ping <jiekaichang@apache.org>
