- PTX ISA (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html)
- CUDA C++ Best Practices Guide (https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
- cuTile Python (https://docs.nvidia.com/cuda/cutile-python/index.html)
- CUTLASS (https://docs.nvidia.com/cutlass/latest/index.html)