Run modern hybrid and MoE LLMs correctly and fast on cheap, older Tesla P100 and GTX 1080 Ti cards. A fork of ik_llama.cpp with clean concurrent (np > 1) Gated-DeltaNet hybrid decoding, Pascal sm_60 FP16 build tuning, and a built-in fan-out decomposer.
pascal concurrency cuda moe homelab mixture-of-experts hybrid-models tesla-p100 llama-cpp local-llm llm-inference gguf speculative-decoding qwen3 gated-deltanet ik-llama gtx-1080-ti sm60
Updated Jun 7, 2026 · Shell