Rahul Kadam 👋

MS Computer Engineering (NYU). I build ML systems: reproducible benchmarks, distributed training + profiling workflows, and small backend tools that make experiments easy to compare and trust.

📍 Jersey City / NYC • Open to early-career roles (ML systems / platform / backend) • Willing to relocate


What I’m good at (in practice)

  • Distributed training + benchmarking: fixed-work experiments, throughput & step-time measurement, multi-GPU scaling (PyTorch + DeepSpeed/ZeRO)
  • Performance debugging: NVTX instrumentation, Nsight Systems traces, bottleneck attribution (compute vs comm vs sync)
  • Reproducibility & engineering maturity: run artifacts, config conventions, CI smoke tests, “one-command demo” repos
  • Systems foundations: C++ projects, Linux tooling, test harnesses

Proof-of-work (highlighted)

If you only click two things, make them the multi-node benchmark harness and the Benchmark Results Service.

🚀 Featured Projects

| Project | What it is | How to verify fast |
| --- | --- | --- |
| GPT-2 Distributed Training Benchmarks | Slurm-native, fixed-work GPT-2 training harness scaling to 2 nodes / 8×V100, with run artifacts + NVTX/Nsight profiling | README has A/B table + run-artifact paths; includes a measured comm-tuning win (29,971 → 35,807 tokens/s, +19.5%) at fixed settings |
| Benchmark Results Service | Containerized FastAPI + worker (Postgres/Redis) that ingests benchmark runs and exposes derived comparisons via `/compare` | `make docker-up && make demo` runs end-to-end locally; CI smoke tests included |
| Opik (Comet ML) – Merged PR #1006 | OSS contribution: BLEU metrics added with tests + docs | External review + merge trail in the PR |
| MIPS Pipelined CPU Simulator | Cycle-accurate 5-stage pipeline (hazards + forwarding) in C++ | Verified via regression tests / traces |
| Brain Tumor Segmentation Baseline (MONAI 3D U-Net) | Reproducible training/eval baseline with guardrails (label/ROI checks, metric conventions) | Saved artifacts + Slurm-ready runs for reruns/plots |
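The comm-tuning win quoted for the GPT-2 harness reduces to simple fixed-work arithmetic; a quick sanity check of the stated +19.5%:

```python
def pct_gain(before: float, after: float) -> float:
    """Percent throughput gain from a fixed-work A/B comparison."""
    return (after / before - 1.0) * 100.0

# Figures quoted above: 29,971 -> 35,807 tokens/s at fixed settings.
gain = pct_gain(29_971, 35_807)
print(f"{gain:.1f}%")  # -> 19.5%
```

Because the workload is fixed, the tokens/s ratio is the speedup; no normalization for batch size or sequence length is needed.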

How I build (my default workflow)

  • Define a fixed workload (so results are comparable)
  • Log run artifacts (metrics JSON + metadata + “run complete” markers)
  • Use profiling (NVTX + Nsight) to turn “it’s slow” into a concrete bottleneck
  • Ship changes behind tests/CI so it’s not just a local experiment
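The artifact-logging step above can be sketched as follows. This is a minimal illustration of the pattern, not the actual harness code; the file names and `write_run_artifacts` helper are hypothetical.

```python
import json
import time
from pathlib import Path

def write_run_artifacts(run_dir: Path, metrics: dict, meta: dict) -> None:
    """Persist metrics + metadata, then drop a 'run complete' marker.

    Writing the marker last means a missing marker flags a crashed or
    partial run when results are collected later.
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (run_dir / "meta.json").write_text(json.dumps(meta, indent=2))
    (run_dir / "RUN_COMPLETE").write_text(str(time.time()))
```

Keeping metrics and config metadata side by side in the run directory is what makes later A/B comparisons trustworthy: every number can be traced back to the exact settings that produced it.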

Open source

  • Opik (Comet ML): BLEU metrics (NLTK-backed), unit tests, docs → merged upstream
    👉 comet-ml/opik#1006

Links

Pinned repositories

  1. GPT2-Optimization

     GPT-2 (124M) fixed-work distributed training benchmark on NYU BigPurple (Slurm) scaling 1→8× V100 across 2 nodes using DeepSpeed ZeRO-1 + FP16/AMP. Built a reproducible harness that writes training…

     Python

  2. comet-ml/opik

     Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

     Python

  3. Brain-Tumor-Seg

     Reproducible 3D brain tumor segmentation baseline using MONAI 3D U-Net (MSD Task01), with config-driven train/eval and GPU-ready execution on Slurm (NYU Big Purple). Includes correctness guardrails…

     Python

  4. CSA-Labs/mips-pipelined

     Cycle-accurate 5-stage MIPS pipeline simulator in C++ (IF/ID/EX/MEM/WB) with hazard detection + forwarding. Includes per-cycle state tracking and regression tests/traces for correctness.

     C++