kiwi3shark

kiwi-shark (kiwi3shark)

ML Infra and Systems

3 followers · 18 following

Popular repositories

plasma (Public)

Forked from ray-project/plasma

A minimal shared memory object store design

C

ThunderKittens (Public)

Forked from HazyResearch/ThunderKittens

Tile primitives for speedy kernels

Cuda

tiny-llm (Public)

Forked from skyzh/tiny-llm

(🚧 WIP) A course on LLM inference serving on Apple Silicon for systems engineers.

Python

tlb_shootdowns (Public)

Forked from bitcharmer/tlb_shootdowns

C

SageAttention (Public)

Forked from thu-ml/SageAttention

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.

Cuda

mini-sglang (Public)

Forked from sgl-project/mini-sglang

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python