SAM 2 No-Code Fine-Tuning Platform

Train Meta's Segment Anything Model 2 on your own data — no code required.

Next.js React TypeScript PyTorch Tailwind


Visit Webpage 👋 · LinkedIn · Report Bug 🪳




Demo



What It Does

Fine-tuning SAM 2 traditionally requires deep knowledge of PyTorch, distributed training, Hydra configs, and cloud GPU orchestration. This platform removes all of that friction.

Through a clean web interface, users can:

  • Authenticate via GitHub or Google OAuth
  • Configure training runs — choose LoRA rank, model checkpoint size, target dataset, and epoch count
  • Launch GPU-accelerated training on Modal Labs with a single click
  • Monitor progress through real-time streaming logs (Server-Sent Events)
  • Download fine-tuned checkpoints directly from Cloudflare R2 storage

The result: production-quality SAM 2 fine-tuning, accessible to anyone with a browser.
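
The log streaming mentioned above relies on Server-Sent Events, a plain-text wire format in which each message is a group of `field: value` lines ended by a blank line. The following is a minimal sketch of that framing, not the platform's actual implementation: the function names, the `log` and `done` event names, and the sample log lines are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one message as a Server-Sent Events frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Every line of a multi-line payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_logs(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield one frame per log line, then a final (hypothetical) 'done' event."""
    for line in log_lines:
        yield format_sse(line, event="log")
    yield format_sse("training complete", event="done")


if __name__ == "__main__":
    # Illustrative log lines only; real training output will look different.
    for frame in stream_logs(["epoch 1/2 started", "epoch 2/2 started"]):
        print(frame, end="")
```

A server would send these frames with the `text/event-stream` content type, and the browser's `EventSource` API would dispatch them by event name.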



Architecture & Tech Stack

SAM2 Web Application Tech Stack
System architecture showing the full data flow from frontend to GPU training and back

Why These Technologies?

| Layer | Technology | Why It Matters |
| --- | --- | --- |
| Frontend | Next.js 16 + React 19 | Server Components and the App Router let API routes, SSE streaming, and UI coexist in one deployment, so the web layer needs no separate backend server |
| | TypeScript 5 | End-to-end type safety from the database schema (Drizzle) through API routes to React components, catching bugs at compile time |
| | Tailwind CSS 4 + Material UI 7 | Rapid, consistent styling, with MUI's component library covering complex UI elements such as the training configuration forms |
| | Drizzle ORM | Type-safe SQL with minimal abstraction: migrations are generated from the TypeScript schema, keeping the DB in sync without a heavy ORM layer |
| ML Backend | PyTorch 2.5+ / SAM 2 | Meta's state-of-the-art segmentation model, fine-tuned with LoRA adapters to minimize compute while preserving quality |
| | Hydra Configs | Declarative, composable training configurations: each combination of model size, dataset, and hyperparameters maps to a clean YAML override |
| | FastAPI | Lightweight Python API layer that bridges the Next.js frontend to the PyTorch training loop, handling job dispatch and webhook callbacks |
| Infrastructure | Modal Labs | Serverless GPU compute: pay only for training time, with no upfront provisioning of A100/H100 hardware and no GPU clusters to manage |
| | Vercel | Zero-config deployment for the Next.js app with edge functions, automatic HTTPS, and preview deployments on every PR |
| | Neon PostgreSQL | Serverless Postgres that scales to zero, a good fit for bursty workloads where training jobs may be hours apart |
| | Cloudflare R2 | S3-compatible object storage with zero egress fees, which matters when users download multi-GB checkpoint files |
| AI Agent | LiveKit + OpenAI | Real-time voice agent integration built on LiveKit's WebRTC infrastructure, enabling conversational interaction with the training platform |
| | Deepgram + Cartesia | Speech-to-text and text-to-speech pipeline for natural voice interactions: Deepgram for low-latency transcription, Cartesia for expressive speech synthesis |
| Auth & Ops | Better-Auth | Lightweight OAuth framework supporting GitHub and Google; an admin approval gate ensures only authorized users can launch GPU training jobs |
| | Logfire | Observability for the training pipeline: trace job submissions, monitor Modal webhook callbacks, and debug SSE streaming issues |
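
To make the "minimize compute" claim for LoRA concrete: LoRA freezes a weight matrix of shape `d_out x d_in` and trains two low-rank factors instead, which adds `rank * (d_in + d_out)` trainable parameters per adapted layer. The sketch below works through that arithmetic for the ranks the platform offers. The layer width of 1024 is a hypothetical value chosen for illustration, not a dimension taken from SAM 2.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    LoRA learns A (rank x d_in) and B (d_out x rank); the original
    d_out x d_in weight stays frozen.
    """
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix itself."""
    return d_in * d_out


if __name__ == "__main__":
    d = 1024  # hypothetical layer width, for illustration only
    for rank in (2, 4, 8, 16, 32):
        lora = lora_param_count(d, d, rank)
        fraction = lora / full_param_count(d, d)
        print(f"rank={rank:2d}  trainable={lora:6d}  ({fraction:.2%} of the full layer)")
```

The count grows linearly with rank, so doubling the rank doubles the adapter size while the frozen base weights stay the same.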


How It Works

  Authenticate          Configure             Train                Monitor             Download
 ┌───────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌────────────┐
 │ GitHub /  │────>│  LoRA Rank   │────>│  Modal Labs  │────>│  Real-time   │────>│ Checkpoint │
 │ Google    │     │  Model Size  │     │  Serverless  │     │  SSE Logs    │     │ from R2    │
 │ OAuth     │     │  Dataset     │     │  GPU (A100)  │     │              │     │            │
 └───────────┘     │  Epochs      │     └──────────────┘     └──────────────┘     └────────────┘
                   └──────────────┘
  1. Sign in with GitHub or Google — an admin approves your account for training access
  2. Pick your parameters — LoRA rank (2/4/8/16/32), checkpoint size (tiny/small/base+/large), dataset, and epoch count
  3. Hit train — the app submits a job to Modal Labs, which spins up a GPU instance and begins fine-tuning
  4. Watch it run — logs stream back to your browser in real-time via Server-Sent Events
  5. Grab your model — once training completes, Modal uploads the checkpoint to R2 and you download it instantly
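
Step 2 constrains the configurable parameters to fixed choices, so a job can be validated before anything is sent to the GPU backend. The sketch below shows one way to express that check and serialize a job request. The class name, field names, and JSON payload shape are assumptions for illustration; only the allowed ranks and checkpoint sizes come from the list above.

```python
import json
from dataclasses import asdict, dataclass

# Allowed values as listed in step 2.
ALLOWED_RANKS = (2, 4, 8, 16, 32)
ALLOWED_CHECKPOINTS = ("tiny", "small", "base+", "large")


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for one training run's parameters."""

    lora_rank: int
    checkpoint: str
    dataset: str
    epochs: int

    def validate(self) -> None:
        if self.lora_rank not in ALLOWED_RANKS:
            raise ValueError(f"lora_rank must be one of {ALLOWED_RANKS}, got {self.lora_rank}")
        if self.checkpoint not in ALLOWED_CHECKPOINTS:
            raise ValueError(f"checkpoint must be one of {ALLOWED_CHECKPOINTS}, got {self.checkpoint!r}")
        if not self.dataset:
            raise ValueError("dataset must not be empty")
        if self.epochs < 1:
            raise ValueError("epochs must be at least 1")

    def to_payload(self) -> str:
        """Validate, then serialize to a JSON string (payload shape is assumed)."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    config = TrainingConfig(lora_rank=8, checkpoint="base+", dataset="DAVIS", epochs=10)
    print(config.to_payload())
```

Rejecting bad values at submission time keeps a typo from costing a GPU start-up.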


Repository Structure

This project is organized as a monorepo with three git submodules, each handling a distinct concern:

sam2finetuning/
├── sam2loranocodefinetuning/   # Next.js 16 web application
│   ├── app/api/               #   API routes (train, jobs, download, auth)
│   ├── src/components/        #   React UI (Config, Logs, Controls)
│   ├── src/db/                #   Drizzle ORM schema & migrations
│   └── src/lib/               #   Auth, utils, constants
│
├── modalsam2/                 # Meta SAM 2 fork with training support
│   ├── training/              #   train.py, trainer.py, loss functions
│   ├── sam2/configs/          #   Hydra YAML training configs
│   └── training/dataset/      #   Dataset loaders (VOS, SA-1B, SA-V, DAVIS)
│
├── sam2webappvoiceagent/      # LiveKit voice agent integration
│   └── ...                    #   Real-time voice interaction with the platform
│
├── images/                    # Documentation assets
└── README.md
| Submodule | Purpose |
| --- | --- |
| sam2loranocodefinetuning | Full-stack web app: handles auth, UI, job management, SSE log streaming, and checkpoint downloads |
| modalsam2 | Fork of Meta's SAM 2 repo extended with LoRA fine-tuning support, Hydra configs, and Modal Labs deployment |
| sam2webappvoiceagent | Voice-powered AI agent that enables conversational interaction with the training platform via LiveKit |


Getting Started

Prerequisites

Clone with Submodules

git clone --recurse-submodules https://github.com/czhurdlespeed/sam2finetuning.git
cd sam2finetuning

Frontend

cd sam2loranocodefinetuning
pnpm install
cp .env.example .env.local   # fill in your service credentials
pnpm db:push                 # push schema to Neon
pnpm dev                     # start dev server on localhost:3000

Training Backend

cd modalsam2/sam2
pip install -e ".[dev]"
cd checkpoints && ./download_ckpts.sh && cd ..

Environment Variables

The app requires credentials for several services. See the table below for the key variables:

| Variable | Service |
| --- | --- |
| DATABASE_URL | Neon PostgreSQL connection string |
| MODAL_TRAIN_URL, MODAL_KEY, MODAL_SECRET | Modal Labs API |
| CF_R2_*, AWS_* | Cloudflare R2 storage |
| BETTER_AUTH_* | OAuth configuration |
| LIVEKIT_* | Voice agent (optional) |
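
A quick check for missing credentials can catch misconfiguration before the app starts. The sketch below is a hypothetical helper, not part of the repository. It checks the fully named variables from the table exactly, and for the wildcard groups (`CF_R2_*`, `AWS_*`, `BETTER_AUTH_*`) it only checks that at least one variable with that prefix is set, since the table does not list the individual names. It reads the process environment, so values kept only in `.env.local` would need to be exported or loaded first.

```python
import os
from typing import List, Mapping

# Names and prefixes taken from the table; LIVEKIT_* is optional and not checked.
REQUIRED_VARS = ("DATABASE_URL", "MODAL_TRAIN_URL", "MODAL_KEY", "MODAL_SECRET")
REQUIRED_PREFIXES = ("CF_R2_", "AWS_", "BETTER_AUTH_")


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return required names (or 'PREFIX_*' groups) that are unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    for prefix in REQUIRED_PREFIXES:
        has_any = any(key.startswith(prefix) and value for key, value in env.items())
        if not has_any:
            missing.append(prefix + "*")
    return missing


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    if problems:
        print("Missing settings: " + ", ".join(problems))
    else:
        print("All required settings are present.")
```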


Built by Calvin Wetzel

LinkedIn


If you found this project useful, consider giving it a star!

