dstrbad/ml-systems-log

Four Months in ML Systems

A self-directed study log: ML compilers, GPU kernels, MLIR, and the systems work that makes them fast.

Ride along with me if you like :)

What this is

Four months of focused work on getting genuinely good at ML systems: CUDA, Triton, MLIR, distributed GPU work, and whatever rabbit holes the work opens up. The goal is to ship something real: a kernel competition placement, a contribution to tinygrad or another open-source project, or a measurable improvement on something that matters. I'll commit to whichever feels most alive by week six.

This is a public work log, not a tutorial.

Phases

  1. Foundation refresh: LLVM Kaleidoscope, C++ refresher, GPU MODE basics
  2. GPU fundamentals: CUDA, PMPP, naive and tiled matmul, NSight profiling
  3. Triton entry: Python-embedded GPU kernels, PTX inspection
  4. MLIR proper: IR dialects, lowering passes
  5. First real contribution: the centerpiece
  6. Synthesize: Flash Attention, NCCL collectives, CUDA Graphs, second push

Layout

/cuda     CUDA kernels, benchmarks, NSight reports
/triton   Triton kernels, PTX dumps, fused-op experiments
/mlir     MLIR Toy tutorial, dialect experiments, LLVM Kaleidoscope
/notes    Reading notes, paper summaries, lecture notes
/docs     The plan, references, anything reusable

Following along

If you're working through similar material, open an issue or reach out. I'm not teaching, but I'm happy to compare notes.

Reference stack

The work draws on:

Started

May 23, 2026.
