agolajko/mlp-gemm

MLP GEMM: Custom CUDA Matrix Multiplication

Experimental CUDA kernels for matrix multiplication (GEMM) with fused operations, wrapped for use from PyTorch.

Installation

pip install -e .

Quick Start / Benchmarking

python -m mygemm.bench --device cuda
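GEMM benchmarks are usually compared by throughput rather than raw runtime. The sketch below converts one kernel's runtime into TFLOP/s, assuming the standard count of 2*M*N*K floating-point operations for an (M x K) by (K x N) product. The helper names gemm_flops and tflops are hypothetical and are not part of mygemm.bench, whose output format is not described here.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations in an (m x k) @ (k x n) product.

    Each of the m*n outputs needs k multiplies and k adds, so 2*m*n*k total.
    The bias add and ReLU of a fused kernel add only O(m*n) work and are
    ignored here.
    """
    return 2 * m * n * k


def tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOP/s for one GEMM that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e12


# Example: a 4096 x 4096 x 4096 GEMM measured at 10 ms
print(f"{tflops(4096, 4096, 4096, 0.010):.2f} TFLOP/s")  # 13.74 TFLOP/s
```

Comparing this number against the GPU's peak throughput gives a rough sense of how far each kernel variant is from the hardware limit.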

Kernel Variants

  • Plain: Naive baseline implementation
  • Fused: Bias addition and ReLU fused into the GEMM kernel, so each output is written only once
  • Optimized: Tiled computation in shared memory, with bank-conflict avoidance
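The three variants compute the same product and differ in how they move data. The pure-Python sketch below mirrors that difference on the CPU as a reference for what each kernel computes; it is not the repository's CUDA code. It assumes x is M x K, w is K x N and the bias b has length N (the real kernels may store the weight transposed, as nn.Linear does), and the tile size of 2 is a hypothetical value chosen to keep the example small.

```python
from typing import List

Matrix = List[List[float]]
TILE = 2  # hypothetical tile size, kept tiny for the example


def plain_gemm(x: Matrix, w: Matrix) -> Matrix:
    """Naive baseline: each output is an independent dot product,
    like a kernel where one thread computes one element."""
    m, k, n = len(x), len(w), len(w[0])
    return [[sum(x[i][p] * w[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def unfused_bias_relu(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """Three passes, as with three separate kernel launches."""
    y = plain_gemm(x, w)                                      # pass 1: write y
    y = [[v + b[j] for j, v in enumerate(row)] for row in y]  # pass 2: read/write y
    return [[max(v, 0.0) for v in row] for row in y]          # pass 3: read/write y


def fused_bias_relu(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """Fused: bias and ReLU are applied to the accumulator before the
    single write of each output element."""
    m, k, n = len(x), len(w), len(w[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += x[i][p] * w[p][j]
            out[i][j] = max(acc + b[j], 0.0)  # epilogue, no extra pass
    return out


def tiled_gemm(x: Matrix, w: Matrix, tile: int = TILE) -> Matrix:
    """Tiled: for each output block, walk K in tile-sized steps. Each step
    copies one tile of x and one tile of w into small buffers (standing in
    for shared memory) and reuses every copied value across the block."""
    m, k, n = len(x), len(w), len(w[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # rows of the output block
        for j0 in range(0, n, tile):      # columns of the output block
            for p0 in range(0, k, tile):  # step along the shared K dimension
                xs = [row[p0:p0 + tile] for row in x[i0:i0 + tile]]
                ws = [row[j0:j0 + tile] for row in w[p0:p0 + tile]]
                for di, xrow in enumerate(xs):
                    for dj in range(len(ws[0])):
                        out[i0 + di][j0 + dj] += sum(
                            xrow[dp] * ws[dp][dj] for dp in range(len(ws)))
    return out


x = [[1.0, -2.0, 3.0], [0.5, 1.0, -1.0]]  # M=2, K=3
w = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # K=3, N=2
b = [0.0, -1.0]
print(fused_bias_relu(x, w, b))  # [[4.0, 0.0], [0.0, 1.0]]
assert tiled_gemm(x, w) == plain_gemm(x, w)
assert unfused_bias_relu(x, w, b) == fused_bias_relu(x, w, b)
```

On a GPU, fusion pays off by skipping the extra global-memory round trips of passes 2 and 3, and tiling pays off because each value loaded into shared memory is reused for a whole row or column of the output block instead of being re-read from global memory.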

Development

# Rebuild after changing CUDA code
pip install -e . --force-reinstall --no-deps

# Debug mode: launch kernels synchronously so CUDA errors are reported at the failing call
CUDA_LAUNCH_BLOCKING=1 python -m mygemm.bench

Architecture

csrc/
├── mygemm_kernels.cu  # Naive & fused kernels
└── bank_extra.cu      # Optimized kernel
mygemm/
├── functional.py      # Autograd functions
├── modules.py         # nn.Module wrappers
└── bench.py           # Benchmarking
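Since functional.py provides autograd functions, the fused variant also needs a backward rule. For y = relu(x @ w + b) the gradients follow from standard calculus, and the sketch below writes them out in pure Python under the same layout assumption as above (x: M x K, w: K x N). In PyTorch this logic would typically sit in the backward staticmethod of a torch.autograd.Function subclass, with tensors saved in forward via ctx.save_for_backward; which tensors the repository actually saves is not stated, so fused_forward and fused_backward are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows of a) x (columns of b) product."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def fused_forward(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """y = relu(x @ w + b), with x: M x K, w: K x N, b: length N."""
    return [[max(v + b[j], 0.0) for j, v in enumerate(row)] for row in matmul(x, w)]


def fused_backward(x: Matrix, w: Matrix, y: Matrix,
                   grad_y: Matrix) -> Tuple[Matrix, Matrix, List[float]]:
    """Gradients of the loss w.r.t. x, w and b, given dL/dy."""
    # ReLU passes the gradient only where its output was positive
    # (the kink at 0 is assigned gradient 0).
    g = [[gy if yv > 0.0 else 0.0 for gy, yv in zip(grow, yrow)]
         for grow, yrow in zip(grad_y, y)]
    grad_x = matmul(g, transpose(w))        # (M x N) @ (N x K) -> M x K
    grad_w = matmul(transpose(x), g)        # (K x M) @ (M x N) -> K x N
    grad_b = [sum(col) for col in zip(*g)]  # bias is broadcast over rows, so sum them
    return grad_x, grad_w, grad_b


x = [[1.0, -2.0, 3.0], [0.5, 1.0, -1.0]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
b = [0.0, -1.0]
y = fused_forward(x, w, b)
grad_x, grad_w, grad_b = fused_backward(x, w, y, [[1.0, 1.0], [1.0, 1.0]])
print(grad_b)  # [1.0, 1.0]
```

The backward pass is itself two GEMMs (g @ w^T and x^T @ g) plus a column sum, so a GEMM kernel can in principle serve both directions.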

Acknowledgments

The optimized kernel is based on siboehm/SGEMM_CUDA.
