⚡ Clario

Turn any YouTube lecture into a scrollable study reel.

Clario takes a 2-hour YouTube lecture, transcribes it with AI, breaks it into bite-sized concepts, generates video clips, and presents them in a TikTok-style vertical reel — each clip paired with an AI-generated summary and bullet points.

No more scrubbing through boring lectures. Just scroll, learn, repeat.

🚀 Quick Start · 🏗️ Architecture · 📡 API Docs · 🧠 Pipeline · 🛠️ Tech Stack

🎯 The Problem

300M+ students worldwide watch YouTube lectures. The average lecture is 90 minutes. The average attention span is 8 seconds. The math doesn't work.

Students waste hours scrubbing through long videos, re-watching sections, and taking notes manually. Clario fixes this by converting any lecture into structured, scrollable, summarized micro-content — automatically.

✨ What Clario Does

  YouTube URL
       │
       ▼
  ┌─────────────┐
  │  📥 Download │  yt-dlp (720p, anti-bot, cookie support)
  └──────┬──────┘
         ▼
  ┌─────────────┐
  │  🎤 Transcribe │  OpenAI Whisper (local GPU, word-level timestamps)
  └──────┬──────┘
         ▼
  ┌─────────────┐
  │  ✂️ Segment  │  Smart sentence-boundary detection (~60s chunks)
  └──────┬──────┘
         ▼
  ┌─────────────┐
  │  🎬 Clip     │  FFmpeg stream-copy extraction (no re-encode)
  └──────┬──────┘
         ▼
  ┌─────────────┐
  │  🧠 Summarize │  Groq LLaMA 3 (title + summary + bullets + tags)
  └──────┬──────┘
         ▼
  ┌─────────────┐
  │  📦 Store    │  Local disk (dev) / Cloudflare R2 (prod)
  └──────┬──────┘
         ▼
    Study Reel 🎓

Input: A single YouTube URL
Output: A scrollable reel of concept clips, each with:

🎬 A short video clip (45-90 seconds)
📌 An AI-generated concept title
💡 A one-liner summary
📝 3-4 bullet key points
🏷️ Topic tags

🛠️ Tech Stack

Layer	Technology	Purpose	Cost
🌐	FastAPI	Async REST API + WebSocket server	Free
⚙️	Celery + Redis	Distributed task queue for video processing	Free
🎤	OpenAI Whisper	Speech-to-text transcription (runs on local GPU)	Free
🧠	Groq API (LLaMA 3 8B)	AI summarization with structured JSON output	Free tier
📥	yt-dlp	YouTube video download with anti-bot measures	Free
🎬	FFmpeg	Video clip extraction via stream copy	Free
🗄️	SQLite / PostgreSQL	Relational database for all metadata	Free
📦	Cloudflare R2	Production clip storage (S3-compatible)	Free 10GB
🐳	Docker Compose	Local Redis infrastructure	Free

💰 Total Running Cost: $0/month

Every component either runs locally on your hardware or uses a generous free tier. No credit card required.

🏗️ System Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                         CLIENT (Browser)                             │
│                                                                      │
│    test-ui/index.html  ←  WebSocket progress  ←  REST API calls      │
└────────────────────────────────┬─────────────────────────────────────┘
                                 │ HTTP / WebSocket
┌────────────────────────────────▼─────────────────────────────────────┐
│                        FASTAPI SERVER (:8000)                        │
│                                                                      │
│    POST /api/jobs         → Create job, enqueue Celery task          │
│    GET  /api/jobs/{id}    → Poll job status + progress %             │
│    GET  /api/reels/{id}   → Full reel data (segments + clips)        │
│    GET  /api/reels/{id}/notes → Aggregated structured notes          │
│    WS   /ws/jobs/{id}     → Real-time progress stream                │
│    GET  /health           → Health check                             │
│    GET  /docs             → Interactive Swagger UI                   │
└───────┬───────────────────────────────┬──────────────────────────────┘
        │                               │
┌───────▼───────┐            ┌──────────▼──────────────────────────────┐
│   SQLite DB   │            │          CELERY WORKER                   │
│               │            │                                          │
│  • jobs       │            │  Stage 1: yt-dlp download (720p)        │
│  • videos     │            │  Stage 2: Whisper transcribe (GPU)      │
│  • segments   │            │  Stage 3: Sentence-boundary segmentation│
│  • clips      │            │  Stage 4: FFmpeg clip extraction        │
│  • summaries  │            │  Stage 5: Groq AI summarization        │
│               │            │  Stage 6: Local / R2 storage            │
└───────────────┘            │  Stage 7: Mark job complete             │
                             └──────────────────┬──────────────────────┘
                                                │
                             ┌──────────────────▼──────────────────────┐
                             │           REDIS (:6379)                  │
                             │     Task broker + result backend         │
                             └─────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

Tool	Install
Python 3.10+	python.org
Docker Desktop	docker.com
FFmpeg	`winget install ffmpeg` (Windows), `brew install ffmpeg` (macOS), `sudo apt install ffmpeg` (Debian/Ubuntu)
Groq API Key	Free at console.groq.com

1. Clone & Configure

git clone https://github.com/Mohammedsami001/Clario.git
cd Clario/backend
cp .env.example .env

Edit .env and add your GROQ_API_KEY:

GROQ_API_KEY=gsk_your_key_here
WHISPER_MODEL=base
WHISPER_DEVICE=cuda    # Use 'cpu' if no NVIDIA GPU

2. Start Redis

cd ..    # back to the repository root (Clario/), where docker-compose.yml lives
docker compose up redis -d

3. Install Dependencies

Windows:

cd backend
python -m venv venv
.\venv\Scripts\activate

# Install PyTorch (pick one):
pip install torch --index-url https://download.pytorch.org/whl/cu121   # NVIDIA GPU
pip install torch --index-url https://download.pytorch.org/whl/cpu     # CPU only

# Install everything else:
pip install openai-whisper
pip install -r requirements.txt

macOS / Linux:

cd backend
python -m venv venv && source venv/bin/activate
pip install torch openai-whisper
pip install -r requirements.txt

4. Run (2 Terminals)

Terminal 1 — API Server:

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

✅ Look for: Clario API started — tables ready

Terminal 2 — Celery Worker:

celery -A app.tasks.celery_app.celery_app worker --loglevel=info --pool=solo

✅ Look for: celery@... ready.

5. Test

Open test-ui/index.html in your browser → paste a YouTube URL → hit Generate Study Reel → watch the magic happen.

Interactive API docs: http://localhost:8000/docs

📡 API Endpoints

Jobs

Method	Endpoint	Description
`POST`	`/api/jobs`	Submit a YouTube URL → starts the pipeline
`GET`	`/api/jobs/{id}`	Poll job status, stage, and progress (0-100%)
`GET`	`/api/jobs`	List all jobs (most recent first)

Create a job:

curl -X POST http://localhost:8000/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"youtube_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

Response:

{
  "id": "3a102508-5121-48b5-9b11-69cded2b8e04",
  "status": "pending",
  "stage": null,
  "progress": 0,
  "youtube_url": "https://www.youtube.com/watch?v=...",
  "created_at": "2026-05-01T10:30:00Z"
}
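
The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_job_request` and `create_job` are illustrative and not part of the Clario codebase, and `create_job` assumes the API server from Quick Start is running on port 8000.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # matches the uvicorn command in Quick Start


def build_job_request(youtube_url: str, api_base: str = API_BASE) -> urllib.request.Request:
    """Build the POST /api/jobs request without sending it."""
    body = json.dumps({"youtube_url": youtube_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/api/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_job(youtube_url: str) -> dict:
    """Send the request and return the decoded job object (needs a running server)."""
    with urllib.request.urlopen(build_job_request(youtube_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `create_job("https://www.youtube.com/watch?v=dQw4w9WgXcQ")` should return a dictionary shaped like the response above; its `id` can then be used with `GET /api/jobs/{id}` or the WebSocket endpoint.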

Reels

Method	Endpoint	Description
`GET`	`/api/reels/{job_id}`	Full reel: video metadata + all segments with clips and summaries
`GET`	`/api/reels/{job_id}/notes`	Aggregated notes for all segments

WebSocket

Protocol	Endpoint	Description
`WS`	`/ws/jobs/{job_id}`	Real-time progress stream (polls every 1.5s)

WebSocket message format:

{
  "job_id": "3a102508-...",
  "status": "processing",
  "stage": "Transcribing audio...",
  "progress": 25,
  "error_msg": null
}
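
A client can decode each message and stop listening once the job reaches a final state. This sketch assumes `done` and `failed` are the terminal statuses (taken from the Test UI status badges); `ProgressUpdate` and `parse_progress` are hypothetical names used for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumption: these are the terminal statuses, as shown by the Test UI badges.
TERMINAL_STATUSES = {"done", "failed"}


@dataclass
class ProgressUpdate:
    job_id: str
    status: str
    stage: Optional[str]
    progress: int
    error_msg: Optional[str]

    @property
    def is_terminal(self) -> bool:
        """True once the job has finished, successfully or not."""
        return self.status in TERMINAL_STATUSES


def parse_progress(raw: str) -> ProgressUpdate:
    """Decode one WebSocket text frame into a typed update."""
    data = json.loads(raw)
    return ProgressUpdate(
        job_id=data["job_id"],
        status=data["status"],
        stage=data.get("stage"),
        progress=int(data.get("progress") or 0),  # treat null as 0
        error_msg=data.get("error_msg"),
    )
```

A receive loop would call `parse_progress` on every frame and close the socket when `is_terminal` becomes true.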

Health

Method	Endpoint	Description
`GET`	`/health`	Service health check

🧠 The AI Pipeline

Stage 1: Download (`pipeline/download.py`)

Tool: yt-dlp (latest, auto-updated)
Resolution: 720p max (quality/speed balance)
Anti-bot: User-Agent spoofing, browser cookie extraction (Chrome → Edge → Firefox), 5 retries
Output: temp/{job_id}/source.mp4
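
The settings above map onto yt-dlp options roughly as follows. This is a sketch, not the contents of `pipeline/download.py`: the function names, the User-Agent string, and the fallback loop are assumptions. The option keys (`format`, `retries`, `cookiesfrombrowser`, `http_headers`, `outtmpl`, `merge_output_format`) are standard `yt_dlp.YoutubeDL` parameters.

```python
from pathlib import Path

# Browser order taken from the list above.
BROWSERS = ("chrome", "edge", "firefox")


def build_ydl_options(job_id: str, browser: str = "chrome") -> dict:
    """Return yt-dlp options for a 720p-capped MP4 download."""
    out_dir = Path("temp") / job_id
    return {
        # Best video up to 720p plus best audio, else best combined stream up to 720p.
        "format": "bestvideo[height<=720]+bestaudio/best[height<=720]",
        "merge_output_format": "mp4",
        "outtmpl": str(out_dir / "source.%(ext)s"),
        "retries": 5,
        "cookiesfrombrowser": (browser,),
        # Placeholder User-Agent; substitute a current browser string.
        "http_headers": {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
        "quiet": True,
    }


def download(url: str, job_id: str) -> Path:
    """Try each browser's cookies in turn; requires `pip install yt-dlp`."""
    import yt_dlp  # imported lazily so the options helper has no dependency

    last_error = None
    for browser in BROWSERS:
        try:
            with yt_dlp.YoutubeDL(build_ydl_options(job_id, browser)) as ydl:
                ydl.download([url])
            return Path("temp") / job_id / "source.mp4"
        except (yt_dlp.utils.YoutubeDLError, OSError) as exc:
            # The exact exception for a missing browser profile varies by version.
            last_error = exc
    raise RuntimeError(f"download failed for {url}") from last_error
```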

Stage 2: Transcribe (`pipeline/transcribe.py`)

Tool: OpenAI Whisper (base model)
Hardware: NVIDIA GPU (CUDA + fp16) or CPU fallback
Features: Word-level timestamps, VAD for silence detection
Output: Full transcript + word array with {word, start, end}
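
With the `openai-whisper` package, word-level timestamps come back nested inside each transcript segment. The sketch below flattens them into the `{word, start, end}` array described above. `flatten_words` and `transcribe` are illustrative names; the real `pipeline/transcribe.py` may be organised differently.

```python
def flatten_words(result: dict) -> list[dict]:
    """Collect Whisper's per-segment word entries into one flat list."""
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append(
                {
                    "word": w["word"].strip(),  # Whisper prefixes words with a space
                    "start": float(w["start"]),
                    "end": float(w["end"]),
                }
            )
    return words


def transcribe(path: str, model_name: str = "base", device: str = "cuda"):
    """Return (full_text, words); requires `pip install openai-whisper`."""
    import whisper  # imported lazily; loading the model is slow

    model = whisper.load_model(model_name, device=device)
    # fp16 is only useful on GPU; Whisper falls back to fp32 on CPU.
    result = model.transcribe(path, word_timestamps=True, fp16=(device == "cuda"))
    return result["text"], flatten_words(result)
```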

Stage 3: Segment (`pipeline/segment.py`)

Strategy: Time-based windowing with smart sentence-boundary detection
Target: ~60 second segments (configurable via SEGMENT_DURATION)
Intelligence: Finds nearest sentence break (. ? !) within a ±10 second window — never cuts mid-sentence
Guard rail: Max 20 segments per video (MAX_SEGMENTS)
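
One way to implement this strategy is sketched below. It is an illustration under stated assumptions, not the code in `pipeline/segment.py`: when no sentence break falls inside the window it cuts at the last word inside the window, and it stops once `max_segments` is reached. The real module may handle both cases differently.

```python
SENTENCE_END = (".", "?", "!")


def segment_words(words, target=60.0, window=10.0, max_segments=20):
    """Split a word list ({word, start, end}) into ~target-second segments."""
    segments = []
    i, n = 0, len(words)
    while i < n and len(segments) < max_segments:
        ideal = words[i]["start"] + target  # where we would like to cut
        best, best_dist = None, None
        j = i
        # Scan forward until words end beyond the upper edge of the window.
        while j < n and words[j]["end"] <= ideal + window:
            end = words[j]["end"]
            ends_sentence = words[j]["word"].rstrip().endswith(SENTENCE_END)
            if end >= ideal - window and ends_sentence:
                dist = abs(end - ideal)
                if best is None or dist < best_dist:
                    best, best_dist = j, dist
            j += 1
        if j == n:
            last = n - 1            # the remainder fits: make it the final segment
        elif best is not None:
            last = best             # nearest sentence break inside the window
        else:
            last = max(i, j - 1)    # assumption: hard cut at the window edge
        chunk = words[i:last + 1]
        segments.append(
            {
                "index": len(segments),
                "start": chunk[0]["start"],
                "end": chunk[-1]["end"],
                "text": " ".join(w["word"].strip() for w in chunk),
            }
        )
        i = last + 1
    return segments
```

Each returned segment carries the start and end times that Stage 4 needs for clip extraction.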

Stage 4: Clip (`pipeline/clip.py`)

Tool: FFmpeg with stream copy (-c copy) — no re-encoding
Speed: Near-instant extraction (just copies bytes)
Output: temp/{job_id}/clips/clip_000.mp4
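
A stream-copy extraction can be expressed as a single FFmpeg invocation. The sketch below builds the argument list and runs it with `subprocess`; the helper names are hypothetical. One caveat of `-c copy` worth knowing: FFmpeg can only start a copied clip on a keyframe, so the actual start may land slightly before the requested timestamp.

```python
import subprocess
from pathlib import Path


def clip_name(index: int) -> str:
    """clip_000.mp4, clip_001.mp4, ... as in the output path above."""
    return f"clip_{index:03d}.mp4"


def build_clip_command(source, out_path, start: float, end: float) -> list[str]:
    """Build an FFmpeg command that copies [start, end) without re-encoding."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",          # seek on the input (fast)
        "-i", str(source),
        "-t", f"{end - start:.3f}",     # clip duration in seconds
        "-c", "copy",                   # stream copy: no re-encode
        "-avoid_negative_ts", "make_zero",
        str(out_path),
    ]


def extract_clip(source, clips_dir, index: int, start: float, end: float) -> Path:
    """Run FFmpeg and return the path of the new clip (needs ffmpeg on PATH)."""
    out_path = Path(clips_dir) / clip_name(index)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_clip_command(source, out_path, start, end), check=True)
    return out_path
```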

Stage 5: Summarize (`pipeline/summarize.py`)

Tool: Groq API with llama3-8b-8192
Output format: Structured JSON with title, summary, bullets[], tags[]
Temperature: 0.3 (low, for consistent, factual output; not fully deterministic)
Fallback: If Groq fails, extracts first sentence as summary
Rate limit: 14,400 requests/day (free tier), enough for about 720 videos/day assuming one request per segment and the 20-segment cap (14,400 / 20 = 720)
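
The parsing and fallback behaviour can be sketched as below. This is an illustration rather than the project's `pipeline/summarize.py`: `parse_summary` and `first_sentence` are hypothetical names, and reusing the first sentence as the fallback title is an assumption (the source only states that it becomes the summary).

```python
import json
import re


def first_sentence(text: str) -> str:
    """Return the text up to the first '.', '?' or '!' followed by whitespace or the end."""
    match = re.search(r"(.+?[.?!])(\s|$)", text.strip(), flags=re.S)
    return match.group(1).strip() if match else text.strip()


def parse_summary(raw, transcript: str) -> dict:
    """Parse the model's JSON reply; fall back to the transcript on any failure."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        return {
            "title": str(data["title"]),
            "summary": str(data["summary"]),
            "bullets": [str(b) for b in data.get("bullets", [])],
            "tags": [str(t) for t in data.get("tags", [])],
        }
    except (KeyError, TypeError, ValueError):
        # ValueError also covers json.JSONDecodeError; TypeError covers raw=None.
        sentence = first_sentence(transcript)
        return {
            "title": sentence[:60],  # assumption: reuse the sentence as a title
            "summary": sentence,
            "bullets": [],
            "tags": [],
        }
```

Passing `raw=None` when the API call itself fails takes the same fallback path as a malformed reply.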

Stage 6: Store (`pipeline/storage.py`)

Dev mode: Copies clips to media/clips/ → served via FastAPI static files
Prod mode: Uploads to Cloudflare R2 (S3-compatible) → returns public URL
Cleanup: Deletes temp files after processing to save disk space
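
The two storage modes and the cleanup step might look like the following sketch. The function names, the `clips/{job_id}/...` key layout, the `/media/clips/...` URL path, and the R2 credential parameters are assumptions for illustration. The R2 path uses `boto3`'s standard S3 client pointed at Cloudflare's S3-compatible endpoint.

```python
import shutil
from pathlib import Path


def store_clip_local(clip_path, job_id: str, media_root="media/clips",
                     base_url: str = "http://localhost:8000") -> str:
    """Dev mode: copy the clip under media/clips/ and return its URL."""
    dest_dir = Path(media_root) / job_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(clip_path).name
    shutil.copy2(clip_path, dest)
    # Assumption: FastAPI serves media/clips/ at /media/clips/.
    return f"{base_url}/media/clips/{job_id}/{dest.name}"


def upload_clip_r2(clip_path, job_id: str, bucket: str, account_id: str,
                   access_key: str, secret_key: str, public_base: str) -> str:
    """Prod mode: upload to Cloudflare R2; requires `pip install boto3`."""
    import boto3  # imported lazily so dev mode has no extra dependency

    client = boto3.client(
        "s3",
        endpoint_url=f"https://{account_id}.r2.cloudflarestorage.com",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="auto",
    )
    key = f"clips/{job_id}/{Path(clip_path).name}"
    client.upload_file(str(clip_path), bucket, key,
                       ExtraArgs={"ContentType": "video/mp4"})
    return f"{public_base.rstrip('/')}/{key}"


def cleanup_temp(job_id: str, temp_root="temp") -> None:
    """Delete temp/{job_id}/ once clips are stored."""
    shutil.rmtree(Path(temp_root) / job_id, ignore_errors=True)
```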

📂 Project Structure

Clario/
├── backend/
│   ├── app/
│   │   ├── main.py                 # FastAPI entry point, CORS, static files
│   │   ├── api/
│   │   │   ├── jobs.py             # Job CRUD endpoints
│   │   │   ├── reels.py            # Reel & notes retrieval
│   │   │   └── ws.py               # WebSocket live progress
│   │   ├── core/
│   │   │   ├── config.py           # Pydantic settings (from .env)
│   │   │   └── database.py         # SQLAlchemy engine & session
│   │   ├── models/
│   │   │   └── models.py           # ORM: Job, Video, Segment, Clip, Summary
│   │   ├── pipeline/
│   │   │   ├── download.py         # yt-dlp with anti-403 hardening
│   │   │   ├── transcribe.py       # OpenAI Whisper (GPU/CPU)
│   │   │   ├── segment.py          # Sentence-boundary segmentation
│   │   │   ├── clip.py             # FFmpeg stream-copy extraction
│   │   │   ├── summarize.py        # Groq LLaMA 3 summarization
│   │   │   └── storage.py          # Local disk / Cloudflare R2
│   │   └── tasks/
│   │       ├── celery_app.py       # Celery configuration
│   │       └── process_video.py    # Full pipeline orchestration
│   ├── requirements.txt
│   ├── .env.example
│   ├── setup.bat                   # Windows one-click setup
│   ├── start_api.bat               # Start FastAPI server
│   └── start_worker.bat            # Start Celery worker
│
├── test-ui/
│   └── index.html                  # Dark-mode test interface
│
├── docker-compose.yml              # Redis service
├── push_to_github.bat              # Git push automation
├── .gitignore
└── README.md

🗄️ Database Schema

jobs         →  tracks pipeline status, progress %, error messages
videos       →  source metadata (title, duration, transcript, YouTube URL)
segments     →  concept chunks (index, title, start/end time, transcript)
clips        →  video files (local path or R2 URL, duration)
summaries    →  AI output (one-liner, bullet points, topic tags)

Relationships:

Job  1──1  Video  1──*  Segment  1──1  Clip  1──1  Summary
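
To make the relationships concrete, here is a rough sketch of the five tables using the standard-library `sqlite3` module. The project itself defines these as SQLAlchemy ORM models in `models/models.py`; the column names and types below are guesses based on the descriptions above, not the actual schema. One-to-one links are modelled with a `UNIQUE` foreign key.

```python
import sqlite3

# Hypothetical columns; only the table names and relationships come from the README.
SCHEMA = """
CREATE TABLE jobs (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',
    stage TEXT,
    progress INTEGER NOT NULL DEFAULT 0,
    error_msg TEXT
);
CREATE TABLE videos (
    id INTEGER PRIMARY KEY,
    job_id TEXT NOT NULL UNIQUE REFERENCES jobs(id),       -- Job 1:1 Video
    title TEXT, duration REAL, transcript TEXT, youtube_url TEXT NOT NULL
);
CREATE TABLE segments (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),       -- Video 1:* Segment
    idx INTEGER NOT NULL, title TEXT,
    start_time REAL, end_time REAL, transcript TEXT
);
CREATE TABLE clips (
    id INTEGER PRIMARY KEY,
    segment_id INTEGER NOT NULL UNIQUE REFERENCES segments(id),  -- Segment 1:1 Clip
    url TEXT NOT NULL, duration REAL
);
CREATE TABLE summaries (
    id INTEGER PRIMARY KEY,
    clip_id INTEGER NOT NULL UNIQUE REFERENCES clips(id),  -- Clip 1:1 Summary
    one_liner TEXT, bullets TEXT, tags TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all five tables with foreign-key enforcement switched on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```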

⚙️ Configuration

All settings via .env — see .env.example for defaults:

Variable	Default	Description
`DATABASE_URL`	`sqlite:///./clario.db`	Database connection string
`REDIS_URL`	`redis://localhost:6379/0`	Celery broker URL
`GROQ_API_KEY`	—	Required. Get free at console.groq.com
`WHISPER_MODEL`	`base`	Whisper model size (`tiny`, `base`, `small`, `medium`, `large`)
`WHISPER_DEVICE`	`cuda`	`cuda` for NVIDIA GPU, `cpu` for CPU-only
`SEGMENT_DURATION`	`60`	Target segment length in seconds
`MAX_SEGMENTS`	`20`	Maximum segments per video
`BASE_URL`	`http://localhost:8000`	API base URL (for local clip URLs)
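
The table above translates into a small settings loader. The project uses Pydantic settings in `core/config.py`; the sketch below is a dependency-free stand-in built on `dataclasses` and `os.environ` that mirrors the same variables and defaults. `Settings` and `load_settings` are illustrative names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    database_url: str
    redis_url: str
    whisper_model: str
    whisper_device: str
    segment_duration: int
    max_segments: int
    base_url: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ), applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY")
    if not api_key:
        raise ValueError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=api_key,
        database_url=env.get("DATABASE_URL", "sqlite:///./clario.db"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        whisper_model=env.get("WHISPER_MODEL", "base"),
        whisper_device=env.get("WHISPER_DEVICE", "cuda"),
        segment_duration=int(env.get("SEGMENT_DURATION", "60")),
        max_segments=int(env.get("MAX_SEGMENTS", "20")),
        base_url=env.get("BASE_URL", "http://localhost:8000"),
    )
```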

🧪 Test UI

The included test-ui/index.html provides a complete dark-mode test interface with:

🔗 YouTube URL input
📊 Real-time progress bar via WebSocket
🏷️ Status badges (pending → processing → done/failed)
🎬 Video player for each generated clip
📌 AI-generated titles, summaries, and bullet points
🏷️ Topic tags
📝 Full notes view with export capability

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feat/amazing-feature)
Commit your changes (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feat/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

Built with Precision by SAMI
