R3hankhan123/README.md

Hey 👋, I'm Rehan Khan

🔧 I build systems that run models, not demo notebooks that break outside Jupyter.


🧠 What I Actually Do

  • 🚀 Make LLMs run on the Spyre accelerator and on CPUs (yes, CPUs)
  • 🔩 Bend vLLM into places it wasn't designed for
  • 🏗️ Fix build systems across architectures
  • 📦 Make multi-arch containers actually behave

🔭 Currently Building

  • ⚡ CPU-only LLM inference pipelines
  • 🏛️ s390x (CPU) and Spyre support for modern ML stacks
  • ☸️ Infra that works beyond a single machine

⚙️ Current Obsessions

🔥 Squeezing out every drop of performance
⚠️ Making PyTorch do questionable things
☸️ Running clean infra on Kubernetes

🤝 Connect With Me

LinkedIn


Pinned

  1. containerd (Public)

    Forked from containerd/containerd

    An open and reliable container runtime

    Go

  2. torch-spyre (Public)

    Forked from torch-spyre/torch-spyre

    C++

  3. pytorch/pytorch (Public)

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

    Python · 100k stars · 27.9k forks

  4. vllm-project/vllm (Public)

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 81.5k stars · 17.5k forks