tesla-p100

Run modern hybrid/MoE LLMs correctly and fast on cheap old Tesla P100 / GTX 1080 Ti cards. Fork of ik_llama.cpp: clean concurrent (np>1) Gated-DeltaNet hybrid decoding + Pascal sm_60 FP16 build tuning + built-in fan-out decomposer.

pascal concurrency cuda moe homelab mixture-of-experts hybrid-models tesla-p100 llama-cpp local-llm llm-inference gguf speculative-decoding qwen3 gated-deltanet ik-llama gtx-1080-ti sm60

Updated Jun 7, 2026
Shell

maltsev-andrey / cuda-nn-inference

GPU-accelerated neural network inference using custom CUDA kernels. Achieves 97.82% accuracy on MNIST.

deep-learning parallel-computing cuda python3 pytorch nvidia neural-networks numba performance-optimization parallel-programming gpu-programming tesla-p100 rhel9

Updated Dec 5, 2025
Python

maltsev-andrey / gpu-data-exploration

Data exploration tools for GPU computing benchmarks - Wikipedia & HDF5 sensor datasets

time-series wikipedia python3 scientific-computing hdf5 gpu-computing data-exploration tesla-p100

Updated Nov 1, 2025
Python

maltsev-andrey / cuda_fft

GPU-accelerated Fast Fourier Transform implementation using CUDA. Demonstrates Cooley-Tukey radix-2 algorithm with shared memory optimization, achieving 534x speedup over CPU and 1.74x over NumPy for large transforms. Developed on Tesla P100.

hpc signal-processing parallel-computing cuda python3 nvidia scientific-computing high-performance-computing gpu-acceleration fft cooley-tukey-fft numba gpu-programming tesla-p100

Updated Jan 3, 2026
Python

maltsev-andrey / sparse_kernels

High-performance GPU SpMV kernels in Python/Numba CUDA. Achieves 96.6 GB/s (53% of cuSPARSE) with 5,432x speedup over CPU. Includes optimization analysis.

hpc linear-algebra cuda scientific-computing high-performance-computing gpu-acceleration numba spmv sparse-matrix gpu-programming cusparse python-cuda tesla-p100

Updated Dec 27, 2025
Python

Here are 8 public repositories matching this topic...

chocolatemoo53 / cloudstreaming

jcmariscal / jcm-llm-finetune-tiny-llama-from-scratch-scripts-py

maltsev-andrey / cuda-matrix-multiplication

poisonxa16 / PXA_llama

maltsev-andrey / cuda-nn-inference

maltsev-andrey / gpu-data-exploration

maltsev-andrey / cuda_fft

maltsev-andrey / sparse_kernels
