Pinned
- dataflow-bench (Public): Benchmarking dense dataflow strategies and sparse SpMSpM algorithms for LLM workloads on NVIDIA T4 and AMD MI300X.
Python
- attn-dsl-bench (Public): A research benchmark comparing FlashAttention forward-kernel implementations on AMD MI300X.
Python 1
- cuda-oxide-ptx (Public): PTX/SASS-level comparison of cuda-oxide (safe Rust → PTX) vs clang vs nvcc on the same GPU kernels.
Sass 1

