Fine-tuning SAM 2 traditionally requires deep knowledge of PyTorch, distributed training, Hydra configs, and cloud GPU orchestration. This platform removes all of that friction.
Through a clean web interface, users can:
- Authenticate via GitHub or Google OAuth
- Configure training runs — choose LoRA rank, model checkpoint size, target dataset, and epoch count
- Launch GPU-accelerated training on Modal Labs with a single click
- Monitor progress through real-time streaming logs (Server-Sent Events)
- Download fine-tuned checkpoints directly from Cloudflare R2 storage
The result: production-quality SAM 2 fine-tuning, accessible to anyone with a browser.
| Layer | Technology | Why It Matters |
|---|---|---|
| Frontend | Next.js 16 + React 19 | Server Components and the App Router enable API routes, SSE streaming, and UI to coexist in one deployment — no separate backend server needed for the web layer |
| | TypeScript 5 | End-to-end type safety from database schema (Drizzle) through API routes to React components, catching bugs at compile time |
| | Tailwind CSS 4 + Material UI 7 | Rapid, consistent styling with MUI's component library for complex UI elements like training configuration forms |
| | Drizzle ORM | Type-safe SQL with zero overhead — generates migrations from TypeScript schema, keeping the DB in sync without heavy ORM abstractions |
| ML Backend | PyTorch 2.5+ / SAM 2 | Meta's state-of-the-art segmentation model, fine-tuned with LoRA adapters to minimize compute while preserving quality |
| | Hydra Configs | Declarative, composable training configurations — each combination of model size, dataset, and hyperparameters maps to a clean YAML override |
| | FastAPI | Lightweight Python API layer that bridges the Next.js frontend to the PyTorch training loop, handling job dispatch and webhook callbacks |
| Infrastructure | Modal Labs | Serverless GPU compute — pay only for training time with zero cold-start provisioning of A100/H100 hardware. No GPU clusters to manage |
| | Vercel | Zero-config deployment for the Next.js app with edge functions, automatic HTTPS, and preview deployments on every PR |
| | Neon PostgreSQL | Serverless Postgres that scales to zero — perfect for bursty workloads where training jobs may be hours apart |
| | Cloudflare R2 | S3-compatible object storage with zero egress fees — critical when users download multi-GB checkpoint files |
| AI Agent | LiveKit + OpenAI | Real-time voice agent integration powered by LiveKit's WebRTC infrastructure, enabling conversational interaction with the training platform |
| | Deepgram + Cartesia | Speech-to-text and text-to-speech pipeline for natural voice interactions — Deepgram for low-latency transcription, Cartesia for expressive speech synthesis |
| Auth & Ops | Better-Auth | Lightweight OAuth framework supporting GitHub and Google — admin approval gate ensures only authorized users can launch GPU training jobs |
| | Logfire | Observability for the training pipeline — trace job submissions, monitor Modal webhook callbacks, and debug SSE streaming issues |
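The real-time log streaming row above relies on the Server-Sent Events wire format. As a minimal sketch of how a client might split an incoming `text/event-stream` chunk into events, here is a small parser; the event names (`log`, `status`) and payload strings are illustrative assumptions, not the app's actual schema:

```typescript
interface LogEvent {
  event: string;
  data: string;
}

// Minimal SSE frame parser: events in a text/event-stream body are
// separated by a blank line; each event has optional "event:" and
// one or more "data:" fields.
function parseSseChunk(chunk: string): LogEvent[] {
  const events: LogEvent[] = [];
  for (const block of chunk.split("\n\n")) {
    let event = "message"; // SSE default event name
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}

// Example: two frames as a training job might emit them (hypothetical payloads).
const frames = "event: log\ndata: epoch 1/5 loss=0.42\n\nevent: status\ndata: running\n\n";
const parsed = parseSseChunk(frames);
console.log(parsed.length); // 2
console.log(parsed[0].event); // log
```

In the browser, `EventSource` performs this framing automatically; a hand-rolled parser like this is only needed when consuming the stream through `fetch` or in a non-browser environment.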
```
Authenticate        Configure              Train                Monitor             Download
┌───────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌────────────┐
│ GitHub /  │────>│ LoRA Rank    │────>│ Modal Labs   │────>│ Real-time    │────>│ Checkpoint │
│ Google    │     │ Model Size   │     │ Serverless   │     │ SSE Logs     │     │ from R2    │
│ OAuth     │     │ Dataset      │     │ GPU (A100)   │     │              │     │            │
└───────────┘     │ Epochs       │     └──────────────┘     └──────────────┘     └────────────┘
                  └──────────────┘
```
- Sign in with GitHub or Google — an admin approves your account for training access
- Pick your parameters — LoRA rank (2/4/8/16/32), checkpoint size (tiny/small/base+/large), dataset, and epoch count
- Hit train — the app submits a job to Modal Labs, which spins up a GPU instance and begins fine-tuning
- Watch it run — logs stream back to your browser in real time via Server-Sent Events
- Grab your model — once training completes, Modal uploads the checkpoint to R2 and you download it instantly
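The parameters in step 2 form a small, closed configuration space. As a sketch of how the app might validate a submitted run before dispatching it to Modal, here is a hypothetical guard; the type and field names (`TrainConfig`, `loraRank`, etc.) are illustrative, not the app's actual schema:

```typescript
// Hypothetical types mirroring the options listed above.
type LoraRank = 2 | 4 | 8 | 16 | 32;
type CheckpointSize = "tiny" | "small" | "base+" | "large";

interface TrainConfig {
  loraRank: LoraRank;
  checkpoint: CheckpointSize;
  dataset: string;
  epochs: number;
}

// Runtime guard for values arriving from an untyped source (e.g. a form POST).
function validateTrainConfig(input: Record<string, unknown>): TrainConfig {
  const ranks = [2, 4, 8, 16, 32];
  const sizes = ["tiny", "small", "base+", "large"];
  const { loraRank, checkpoint, dataset, epochs } = input;
  if (typeof loraRank !== "number" || !ranks.includes(loraRank))
    throw new Error(`invalid LoRA rank: ${loraRank}`);
  if (typeof checkpoint !== "string" || !sizes.includes(checkpoint))
    throw new Error(`invalid checkpoint size: ${checkpoint}`);
  if (typeof dataset !== "string" || dataset.length === 0)
    throw new Error("dataset is required");
  if (typeof epochs !== "number" || !Number.isInteger(epochs) || epochs < 1)
    throw new Error(`invalid epoch count: ${epochs}`);
  return {
    loraRank: loraRank as LoraRank,
    checkpoint: checkpoint as CheckpointSize,
    dataset,
    epochs,
  };
}
```

Validating on the server, not just in the form UI, matters here because each accepted job spins up paid GPU time.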
This project is organized as a monorepo with three git submodules, each handling a distinct concern:
```
sam2finetuning/
├── sam2loranocodefinetuning/   # Next.js 16 web application
│   ├── app/api/                # API routes (train, jobs, download, auth)
│   ├── src/components/         # React UI (Config, Logs, Controls)
│   ├── src/db/                 # Drizzle ORM schema & migrations
│   └── src/lib/                # Auth, utils, constants
│
├── modalsam2/                  # Meta SAM 2 fork with training support
│   ├── training/               # train.py, trainer.py, loss functions
│   ├── sam2/configs/           # Hydra YAML training configs
│   └── training/dataset/       # Dataset loaders (VOS, SA-1B, SA-V, DAVIS)
│
├── sam2webappvoiceagent/       # LiveKit voice agent integration
│   └── ...                     # Real-time voice interaction with the platform
│
├── images/                     # Documentation assets
└── README.md
```
| Submodule | Purpose |
|---|---|
| `sam2loranocodefinetuning` | Full-stack web app — handles auth, UI, job management, SSE log streaming, and checkpoint downloads |
| `modalsam2` | Fork of Meta's SAM 2 repo extended with LoRA fine-tuning support, Hydra configs, and Modal Labs deployment |
| `sam2webappvoiceagent` | Voice-powered AI agent that enables conversational interaction with the training platform via LiveKit |
- Node.js 18+ and pnpm
- Python 3.10+
- Accounts on Modal, Neon, Cloudflare R2
```bash
git clone --recurse-submodules https://github.com/czhurdlespeed/sam2finetuning.git
cd sam2finetuning
```

**Web app:**

```bash
cd sam2loranocodefinetuning
pnpm install
cp .env.example .env.local   # fill in your service credentials
pnpm db:push                 # push schema to Neon
pnpm dev                     # start dev server on localhost:3000
```

**ML backend:**

```bash
cd modalsam2/sam2
pip install -e ".[dev]"
cd checkpoints && ./download_ckpts.sh && cd ..
```

The app requires credentials for several services. See the table below for the key variables:
| Variable | Service |
|---|---|
| `DATABASE_URL` | Neon PostgreSQL connection string |
| `MODAL_TRAIN_URL`, `MODAL_KEY`, `MODAL_SECRET` | Modal Labs API |
| `CF_R2_*`, `AWS_*` | Cloudflare R2 storage |
| `BETTER_AUTH_*` | OAuth configuration |
| `LIVEKIT_*` | Voice agent (optional) |
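As a rough sketch of what `.env.local` might look like, here is a fragment covering the fully named variables from the table; the values are placeholders, and the exact names behind the `CF_R2_*`, `AWS_*`, `BETTER_AUTH_*`, and `LIVEKIT_*` wildcards should be taken from the repo's `.env.example`:

```shell
# Hypothetical .env.local fragment — placeholder values only.
DATABASE_URL="postgresql://user:password@your-neon-host/dbname"
MODAL_TRAIN_URL="https://your-modal-endpoint.modal.run"
MODAL_KEY="your-modal-key"
MODAL_SECRET="your-modal-secret"
# CF_R2_* / AWS_* / BETTER_AUTH_* / LIVEKIT_* — see .env.example
```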
