Clario takes a 2-hour YouTube lecture, transcribes it with AI, breaks it into bite-sized concepts, generates video clips, and presents them in a TikTok-style vertical reel — each clip paired with an AI-generated summary and bullet points.
No more scrubbing through boring lectures. Just scroll, learn, repeat.
🚀 Quick Start · 🏗️ Architecture · 📡 API Docs · 🧠 Pipeline · 🛠️ Tech Stack
300M+ students worldwide watch YouTube lectures. The average lecture is 90 minutes. The average attention span is 8 seconds. The math doesn't work.
Students waste hours scrubbing through long videos, re-watching sections, and taking notes manually. Clario fixes this by converting any lecture into structured, scrollable, summarized micro-content — automatically.
YouTube URL
│
▼
┌─────────────┐
│ 📥 Download │ yt-dlp (720p, anti-bot, cookie support)
└──────┬──────┘
▼
┌─────────────┐
│ 🎤 Transcribe │ OpenAI Whisper (local GPU, word-level timestamps)
└──────┬──────┘
▼
┌─────────────┐
│ ✂️ Segment │ Smart sentence-boundary detection (~60s chunks)
└──────┬──────┘
▼
┌─────────────┐
│ 🎬 Clip │ FFmpeg stream-copy extraction (no re-encode)
└──────┬──────┘
▼
┌─────────────┐
│ 🧠 Summarize │ Groq LLaMA 3 (title + summary + bullets + tags)
└──────┬──────┘
▼
┌─────────────┐
│ 📦 Store │ Local disk (dev) / Cloudflare R2 (prod)
└──────┬──────┘
▼
Study Reel 🎓
Input: A single YouTube URL
Output: A scrollable reel of concept clips, each with:
- 🎬 A short video clip (45-90 seconds)
- 📌 An AI-generated concept title
- 💡 A one-liner summary
- 📝 3-4 bullet key points
- 🏷️ Topic tags
| Layer | Technology | Purpose | Cost |
|---|---|---|---|
| 🌐 | FastAPI | Async REST API + WebSocket server | Free |
| ⚙️ | Celery + Redis | Distributed task queue for video processing | Free |
| 🎤 | OpenAI Whisper | Speech-to-text transcription (runs on local GPU) | Free |
| 🧠 | Groq API (LLaMA 3 8B) | AI summarization with structured JSON output | Free tier |
| 📥 | yt-dlp | YouTube video download with anti-bot measures | Free |
| 🎬 | FFmpeg | Video clip extraction via stream copy | Free |
| 🗄️ | SQLite / PostgreSQL | Relational database for all metadata | Free |
| 📦 | Cloudflare R2 | Production clip storage (S3-compatible) | Free 10GB |
| 🐳 | Docker Compose | Local Redis infrastructure | Free |
Every component either runs locally on your hardware or uses a generous free tier. No credit card required.
┌──────────────────────────────────────────────────────────────────────┐
│ CLIENT (Browser) │
│ │
│ test-ui/index.html ← WebSocket progress ← REST API calls │
└────────────────────────────────┬─────────────────────────────────────┘
│ HTTP / WebSocket
┌────────────────────────────────▼─────────────────────────────────────┐
│ FASTAPI SERVER (:8000) │
│ │
│ POST /api/jobs → Create job, enqueue Celery task │
│ GET /api/jobs/{id} → Poll job status + progress % │
│ GET /api/reels/{id} → Full reel data (segments + clips) │
│ GET /api/reels/{id}/notes → Aggregated structured notes │
│ WS /ws/jobs/{id} → Real-time progress stream │
│ GET /health → Health check │
│ GET /docs → Interactive Swagger UI │
└───────┬───────────────────────────────┬──────────────────────────────┘
│ │
┌───────▼───────┐ ┌──────────▼──────────────────────────────┐
│ SQLite DB │ │ CELERY WORKER │
│ │ │ │
│ • jobs │ │ Stage 1: yt-dlp download (720p) │
│ • videos │ │ Stage 2: Whisper transcribe (GPU) │
│ • segments │ │ Stage 3: Sentence-boundary segmentation│
│ • clips │ │ Stage 4: FFmpeg clip extraction │
│ • summaries │ │ Stage 5: Groq AI summarization │
│ │ │ Stage 6: Local / R2 storage │
└───────────────┘ │ Stage 7: Mark job complete │
└──────────────────┬──────────────────────┘
│
┌──────────────────▼──────────────────────┐
│ REDIS (:6379) │
│ Task broker + result backend │
└─────────────────────────────────────────┘
| Tool | Install |
|---|---|
| Python 3.10+ | python.org |
| Docker Desktop | docker.com |
| FFmpeg | winget install ffmpeg |
| Groq API Key | Free at console.groq.com |
git clone https://github.com/Mohammedsami001/Clario.git
cd Clario/backend
cp .env.example .envEdit .env and add your GROQ_API_KEY:
GROQ_API_KEY=gsk_your_key_here
WHISPER_MODEL=base
WHISPER_DEVICE=cuda # Use 'cpu' if no NVIDIA GPUcd Clario
docker compose up redis -dWindows:
cd backend
python -m venv venv
.\venv\Scripts\activate
# Install PyTorch (pick one):
pip install torch --index-url https://download.pytorch.org/whl/cu121 # NVIDIA GPU
pip install torch --index-url https://download.pytorch.org/whl/cpu # CPU only
# Install everything else:
pip install openai-whisper
pip install -r requirements.txtmacOS / Linux:
cd backend
python -m venv venv && source venv/bin/activate
pip install torch openai-whisper
pip install -r requirements.txtTerminal 1 — API Server:
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000✅ Look for: Clario API started — tables ready
Terminal 2 — Celery Worker:
celery -A app.tasks.celery_app.celery_app worker --loglevel=info --pool=solo✅ Look for: celery@... ready.
Open test-ui/index.html in your browser → paste a YouTube URL → hit Generate Study Reel → watch the magic happen.
Interactive API docs: http://localhost:8000/docs
| Method | Endpoint | Description |
|---|---|---|
POST |
/api/jobs |
Submit a YouTube URL → starts the pipeline |
GET |
/api/jobs/{id} |
Poll job status, stage, and progress (0-100%) |
GET |
/api/jobs |
List all jobs (most recent first) |
Create a job:
curl -X POST http://localhost:8000/api/jobs \
-H "Content-Type: application/json" \
-d '{"youtube_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'Response:
{
"id": "3a102508-5121-48b5-9b11-69cded2b8e04",
"status": "pending",
"stage": null,
"progress": 0,
"youtube_url": "https://www.youtube.com/watch?v=...",
"created_at": "2026-05-01T10:30:00Z"
}| Method | Endpoint | Description |
|---|---|---|
GET |
/api/reels/{job_id} |
Full reel: video metadata + all segments with clips and summaries |
GET |
/api/reels/{job_id}/notes |
Aggregated notes for all segments |
| Protocol | Endpoint | Description |
|---|---|---|
WS |
/ws/jobs/{job_id} |
Real-time progress stream (polls every 1.5s) |
WebSocket message format:
{
"job_id": "3a102508-...",
"status": "processing",
"stage": "Transcribing audio...",
"progress": 25,
"error_msg": null
}| Method | Endpoint | Description |
|---|---|---|
GET |
/health |
Service health check |
- Tool: yt-dlp (latest, auto-updated)
- Resolution: 720p max (quality/speed balance)
- Anti-bot: User-Agent spoofing, browser cookie extraction (Chrome → Edge → Firefox), 5 retries
- Output:
temp/{job_id}/source.mp4
- Tool: OpenAI Whisper (
basemodel) - Hardware: NVIDIA GPU (CUDA + fp16) or CPU fallback
- Features: Word-level timestamps, VAD for silence detection
- Output: Full transcript + word array with
{word, start, end}
- Strategy: Time-based windowing with smart sentence-boundary detection
- Target: ~60 second segments (configurable via
SEGMENT_DURATION) - Intelligence: Finds nearest sentence break (
.?!) within a ±10 second window — never cuts mid-sentence - Guard rail: Max 20 segments per video (
MAX_SEGMENTS)
- Tool: FFmpeg with stream copy (
-c copy) — no re-encoding - Speed: Near-instant extraction (just copies bytes)
- Output:
temp/{job_id}/clips/clip_000.mp4
- Tool: Groq API with
llama3-8b-8192 - Output format: Structured JSON with
title,summary,bullets[],tags[] - Temperature: 0.3 (deterministic, factual output)
- Fallback: If Groq fails, extracts first sentence as summary
- Rate limit: 14,400 requests/day (free tier) — supports 700+ videos/day
- Dev mode: Copies clips to
media/clips/→ served via FastAPI static files - Prod mode: Uploads to Cloudflare R2 (S3-compatible) → returns public URL
- Cleanup: Deletes temp files after processing to save disk space
Clario/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entry point, CORS, static files
│ │ ├── api/
│ │ │ ├── jobs.py # Job CRUD endpoints
│ │ │ ├── reels.py # Reel & notes retrieval
│ │ │ └── ws.py # WebSocket live progress
│ │ ├── core/
│ │ │ ├── config.py # Pydantic settings (from .env)
│ │ │ └── database.py # SQLAlchemy engine & session
│ │ ├── models/
│ │ │ └── models.py # ORM: Job, Video, Segment, Clip, Summary
│ │ ├── pipeline/
│ │ │ ├── download.py # yt-dlp with anti-403 hardening
│ │ │ ├── transcribe.py # OpenAI Whisper (GPU/CPU)
│ │ │ ├── segment.py # Sentence-boundary segmentation
│ │ │ ├── clip.py # FFmpeg stream-copy extraction
│ │ │ ├── summarize.py # Groq LLaMA 3 summarization
│ │ │ └── storage.py # Local disk / Cloudflare R2
│ │ └── tasks/
│ │ ├── celery_app.py # Celery configuration
│ │ └── process_video.py # Full pipeline orchestration
│ ├── requirements.txt
│ ├── .env.example
│ ├── setup.bat # Windows one-click setup
│ ├── start_api.bat # Start FastAPI server
│ └── start_worker.bat # Start Celery worker
│
├── test-ui/
│ └── index.html # Dark-mode test interface
│
├── docker-compose.yml # Redis service
├── push_to_github.bat # Git push automation
├── .gitignore
└── README.md
jobs → tracks pipeline status, progress %, error messages
videos → source metadata (title, duration, transcript, YouTube URL)
segments → concept chunks (index, title, start/end time, transcript)
clips → video files (local path or R2 URL, duration)
summaries → AI output (one-liner, bullet points, topic tags)Relationships:
Job 1──1 Video 1──* Segment 1──1 Clip 1──1 Summary
All settings via .env — see .env.example for defaults:
| Variable | Default | Description |
|---|---|---|
DATABASE_URL |
sqlite:///./clario.db |
Database connection string |
REDIS_URL |
redis://localhost:6379/0 |
Celery broker URL |
GROQ_API_KEY |
— | Required. Get free at console.groq.com |
WHISPER_MODEL |
base |
Whisper model size (tiny, base, small, medium, large) |
WHISPER_DEVICE |
cuda |
cuda for NVIDIA GPU, cpu for CPU-only |
SEGMENT_DURATION |
60 |
Target segment length in seconds |
MAX_SEGMENTS |
20 |
Maximum segments per video |
BASE_URL |
http://localhost:8000 |
API base URL (for local clip URLs) |
The included test-ui/index.html provides a complete dark-mode test interface with:
- 🔗 YouTube URL input
- 📊 Real-time progress bar via WebSocket
- 🏷️ Status badges (pending → processing → done/failed)
- 🎬 Video player for each generated clip
- 📌 AI-generated titles, summaries, and bullet points
- 🏷️ Topic tags
- 📝 Full notes view with export capability
- Fork the repository
- Create a feature branch (
git checkout -b feat/amazing-feature) - Commit your changes (
git commit -m 'feat: add amazing feature') - Push to the branch (
git push origin feat/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License — see the LICENSE file for details.
Built with Precision by SAMI
Clario MVP