# YouTube Live Chat Analyzer

A YouTube live stream chat collection and analysis system: a complete data pipeline for collecting, processing, and visualizing YouTube live stream chat messages in real time.

The system captures chat messages from live streams, processes them through NLP pipelines (Chinese tokenization, emoji extraction), and uses Gemini AI to automatically discover new slang, memes, and typos from the community.
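The emoji-extraction and word-replacement steps of the pipeline can be sketched as follows. This is a dependency-free stand-in: the real pipeline uses Jieba for Chinese tokenization, and the `REPLACEMENTS` table here is a hypothetical example, not the project's actual dictionary.

```python
import re

# Common emoji code-point ranges; the project's real extractor may differ.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Hypothetical slang-expansion table for illustration only.
REPLACEMENTS = {"awsl": "啊我死了"}

def extract_emojis(message: str) -> list:
    """Pull emoji out of a chat message before tokenization."""
    return EMOJI_RE.findall(message)

def normalize(message: str) -> str:
    """Strip emoji, then apply word replacements."""
    text = EMOJI_RE.sub("", message)
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    return text.strip()

msg = "awsl 🔥🔥 主播好強"
print(extract_emojis(msg))  # ['🔥', '🔥']
print(normalize(msg))
```

In the actual pipeline, the normalized text would then be fed to Jieba for tokenization before word counting.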
## Features

| Feature | Description |
|---|---|
| Real-time Collection | Captures live chat messages using chat-downloader, with automatic retry and reconnection |
| ETL Processing | Chinese tokenization with Jieba, emoji extraction, and word-replacement pipelines |
| AI-Powered Discovery | Gemini API (gemini-2.5-flash-lite) analyzes chat to discover new memes, slang, and typos automatically |
| Interactive Dashboard | React-based dashboard with word cloud, playback timeline, and admin management |
| Word Trend Analysis | Tracks usage trends for specific words over time, with customizable word groups |
| Admin Panel | Approve/reject AI-discovered words, manage dictionaries, configure settings |
| Role-based Access | JWT-based authentication with Admin/Guest roles for secure admin operations |
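The role-based access check behind the Admin Panel can be illustrated with a minimal HS256 JWT sketch using only the standard library. The real backend presumably uses a JWT library (e.g. PyJWT or python-jose); `SECRET` stands in for `JWT_SECRET_KEY`, and the claim layout is an assumption.

```python
import base64, hashlib, hmac, json

SECRET = b"change-me"  # stand-in for JWT_SECRET_KEY

def _b64(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(role: str) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"role": role}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def require_admin(token: str) -> bool:
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False  # forged or tampered token
    # Restore base64 padding before decoding the claims.
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return claims.get("role") == "Admin"

print(require_admin(issue_token("Admin")))  # True
print(require_admin(issue_token("Guest")))  # False
```

Guests get read-only access; only a token carrying the Admin role passes the check guarding admin endpoints.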
## Screenshots

| Real-time Analytics | Word Cloud |
|---|---|
| ![]() | ![]() |

| Emoji Statistics | Admin Text Mining |
|---|---|
| ![]() | ![]() |
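The word-trend analysis above amounts to bucketing tokenized messages into fixed time windows and counting a configured word group. A minimal sketch (the field layout and bucket size are illustrative, not the project's actual schema):

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

def trend(messages, word_group, bucket_minutes=5):
    """messages: iterable of (timestamp, [tokens]) pairs."""
    buckets = defaultdict(Counter)
    step = timedelta(minutes=bucket_minutes)
    for ts, tokens in messages:
        # Floor the timestamp to the start of its bucket.
        bucket = datetime.min + step * ((ts - datetime.min) // step)
        for tok in tokens:
            if tok in word_group:
                buckets[bucket][tok] += 1
    return dict(sorted(buckets.items()))

msgs = [
    (datetime(2024, 1, 1, 12, 2), ["草", "哈哈"]),
    (datetime(2024, 1, 1, 12, 3), ["草"]),
    (datetime(2024, 1, 1, 12, 9), ["草"]),
]
print(trend(msgs, {"草"}))  # two buckets: counts 2, then 1
```

A word group here is just a set of related tokens (e.g. several spellings of the same meme) counted together, which matches the customizable word groups mentioned in the feature list.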
## Prerequisites

- Docker & Docker Compose
- YouTube Data API Key
- Gemini API Key (for AI word discovery)
## Quick Start

```bash
# 1. Clone the repository
git clone https://github.com/GallonShih/youtube-live-chat-analyzer.git
cd youtube-live-chat-analyzer

# 2. Configure environment variables
cp .env.example .env
# Edit .env and set:
# - YOUTUBE_API_KEY: Your YouTube Data API Key
# - GEMINI_API_KEY: Your Google AI API Key (for the discovery DAG)
# - YOUTUBE_URL: The full URL (or ID) of the live stream you want to track
# - ADMIN_PASSWORD: Password for admin access
# - JWT_SECRET_KEY: Secure random key for JWT tokens

# 3. Generate a secure JWT secret key
python3 -c "import secrets; print(secrets.token_hex(32))"
# Copy the output to JWT_SECRET_KEY in .env

# 4. Start all services
docker-compose up -d

# 5. Access the dashboard
open http://localhost:3000
```

First-time setup? See docs/SETUP.md for detailed configuration.

**Important:** You must manually trigger the Import Dictionary task in the Admin Panel after deployment to enable proper text analysis.
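A filled-in `.env` might look like the fragment below. All values are placeholders; use your own keys and a freshly generated secret.

```
# .env — placeholder values only
YOUTUBE_API_KEY=your-youtube-data-api-key
GEMINI_API_KEY=your-google-ai-api-key
YOUTUBE_URL=https://www.youtube.com/watch?v=VIDEO_ID
ADMIN_PASSWORD=choose-a-strong-password
JWT_SECRET_KEY=paste-the-64-hex-chars-from-secrets.token_hex(32)
```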
## Services

| Service | Port | Description |
|---|---|---|
| Dashboard Frontend | 3000 | React-based visualization & admin UI |
| Dashboard Backend | 8000 | FastAPI REST API with built-in ETL scheduler (`/docs` for Swagger) |
| PostgreSQL | 5432 | Primary data storage |
| pgAdmin | 5050 | Database administration UI |
## Project Structure

```
youtube-live-chat-analyzer/
├── collector/               # Chat collection service (Python)
│   ├── main.py              # Entry point: coordinates collection & stats polling
│   ├── chat_collector.py    # Real-time chat message collection
│   └── youtube_api.py       # YouTube Data API integration
│
├── dashboard/
│   ├── backend/             # FastAPI REST API with built-in ETL scheduler
│   │   ├── app/routers/     # API endpoints (chat, wordcloud, admin, etl, etc.)
│   │   ├── app/etl/         # APScheduler-based ETL tasks
│   │   │   ├── processors/  # Chat processing, word discovery, dict import
│   │   │   ├── scheduler.py # Task scheduling
│   │   │   └── tasks.py     # Task definitions
│   │   └── app/models.py    # SQLAlchemy models
│   └── frontend/            # React + Vite + TailwindCSS
│       └── src/features/    # Feature-based components
│           ├── admin/       # Admin panel (ETL jobs, settings, word approval)
│           ├── playback/    # Timeline-based message playback
│           └── trends/      # Word trends analysis UI
│
├── airflow/                 # [DEPRECATED] Legacy Airflow DAGs (see docs/legacy/)
│
├── database/
│   └── init/                # SQL migrations (auto-executed on first start)
│
├── text_analysis/           # NLP dictionaries (stopwords, special words, etc.)
│
├── .github/workflows/       # CI/CD pipeline (tests, build, deploy)
│
├── docker-compose.yml       # Full stack orchestration
├── .env.example             # Environment variables template
└── CLAUDE.md                # AI agent development guide
```
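The ETL tasks in `app/etl/` run on APScheduler interval jobs. A dependency-free analogue of that shape, re-arming a `threading.Timer` after each run (the real code uses APScheduler, not this class):

```python
import threading

class IntervalJob:
    """Minimal stand-in for a recurring background job."""

    def __init__(self, seconds, func):
        self.seconds = seconds
        self.func = func
        self._timer = None
        self._stopped = threading.Event()

    def _run(self):
        if self._stopped.is_set():
            return
        self.func()          # execute the ETL step
        self._schedule()     # re-arm for the next interval

    def _schedule(self):
        self._timer = threading.Timer(self.seconds, self._run)
        self._timer.daemon = True
        self._timer.start()

    def start(self):
        self._schedule()

    def stop(self):
        self._stopped.set()
        if self._timer:
            self._timer.cancel()

runs = []
job = IntervalJob(0.05, lambda: runs.append(1))
job.start()
threading.Event().wait(0.2)  # let a few ticks fire
job.stop()
print(len(runs))
```

APScheduler adds what this sketch omits: job persistence, overlap control, and per-job error handling, which is why the backend uses it rather than raw timers.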
## Development

For detailed development commands and guidelines, see CLAUDE.md.

```bash
# View logs
docker-compose logs -f collector

# Rebuild a specific service
docker-compose up -d --build dashboard-backend

# Access the database
docker-compose exec postgres psql -U hermes -d hermes
```

## Testing

Frontend tests use Vitest + React Testing Library + MSW and run directly on local Node.js.
```bash
cd dashboard/frontend

# Install dependencies
npm install

# Run tests once
npm run test:run

# Or watch mode during development
npm run test:watch
```

Current test focus includes:

- Author detail UI rendering (avatar/badges)
- Message rendering (emoji + paid amount)
- Error handling for the author detail API
- Dashboard author drawer interaction flow
## License

MIT License - see LICENSE for details.