GallonShih/youtube-live-chat-analyzer


YouTube Live Chat Analyzer

YouTube Live Stream Chat Collection & Analysis System

Docker · Python · FastAPI · React · PostgreSQL · APScheduler · MIT License


🪽 What is YouTube Live Chat Analyzer?

YouTube Live Chat Analyzer is a complete data pipeline for collecting, processing, and visualizing YouTube live stream chat messages in real time.

The system captures chat messages from live streams, processes them through NLP pipelines (Chinese tokenization, emoji extraction), and uses Gemini AI to automatically discover new slang, memes, and typos from the community.

Playback Demo


✨ Features

| Feature | Description |
| --- | --- |
| 📥 Real-time Collection | Captures live chat messages using chat-downloader, with automatic retry & reconnection |
| 🔄 ETL Processing | Chinese tokenization with Jieba, emoji extraction, word-replacement pipelines |
| 🤖 AI-Powered Discovery | Gemini API (gemini-2.5-flash-lite) analyzes chat to discover new memes, slang, and typos automatically |
| 📊 Interactive Dashboard | React-based dashboard with word cloud, playback timeline, and admin management |
| 📈 Word Trend Analysis | Tracks specific word usage trends over time with customizable word groups |
| 🛠️ Admin Panel | Approve/reject AI-discovered words, manage dictionaries, configure settings |
| 🔐 Role-based Access | JWT-based authentication with Admin/Guest roles for secure admin operations |

📸 Gallery

  • ⚡ Real-time Analytics
  • ☁️ Word Cloud
  • 🤣 Emoji Statistics
  • 🛠️ Admin Text Mining

πŸ—οΈ Architecture

Architecture


🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • YouTube Data API Key
  • Gemini API Key (for AI word discovery)

Setup

# 1. Clone the repository
git clone https://github.com/GallonShih/youtube-live-chat-analyzer.git
cd youtube-live-chat-analyzer

# 2. Configure environment variables
cp .env.example .env
# Edit .env and set:
# - YOUTUBE_API_KEY: Your YouTube Data API Key
# - GEMINI_API_KEY: Your Google AI API Key (for discovery DAG)
# - YOUTUBE_URL: The full URL (or ID) of the live stream you want to track
# - ADMIN_PASSWORD: Password for admin access
# - JWT_SECRET_KEY: Secure random key for JWT tokens

# 3. Generate a secure JWT secret key
python3 -c "import secrets; print(secrets.token_hex(32))"
# Copy the output to JWT_SECRET_KEY in .env

# 4. Start all services
docker-compose up -d

# 5. Access the dashboard
open http://localhost:3000

📖 First-time setup? See docs/SETUP.md for detailed configuration. Important: you must manually trigger the Import Dictionary task in the Admin Panel after deployment to enable proper text analysis.
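The JWT_SECRET_KEY generated in step 3 is what signs the admin tokens. For intuition, here is a minimal HS256 signing/verification sketch using only the standard library; it is an illustration of the mechanism, not the backend's actual implementation:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    """Sign a payload as an HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signature = hmac.new(
        secret.encode(), f"{header}.{body}".encode(), hashlib.sha256
    ).digest()
    return f"{header}.{body}.{b64url(signature)}"

def verify_jwt(token: str, secret: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    header, body, signature = token.split(".")
    expected = hmac.new(
        secret.encode(), f"{header}.{body}".encode(), hashlib.sha256
    ).digest()
    return hmac.compare_digest(b64url(expected), signature)

token = sign_jwt({"sub": "admin", "role": "admin"}, "dev-secret")
print(verify_jwt(token, "dev-secret"))   # True
print(verify_jwt(token, "wrong-secret")) # False
```

This is also why JWT_SECRET_KEY must be a high-entropy random value: anyone who knows the secret can mint valid admin tokens.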


🔌 Services

| Service | Port | Description |
| --- | --- | --- |
| Dashboard Frontend | 3000 | React-based visualization & admin UI |
| Dashboard Backend | 8000 | FastAPI REST API with built-in ETL scheduler (`/docs` for Swagger) |
| PostgreSQL | 5432 | Primary data storage |
| pgAdmin | 5050 | Database administration UI |

πŸ“ Project Structure

youtube-live-chat-analyzer/
├── collector/            # Chat collection service (Python)
│   ├── main.py           # Entry point: coordinates collection & stats polling
│   ├── chat_collector.py # Real-time chat message collection
│   └── youtube_api.py    # YouTube Data API integration
│
├── dashboard/
│   ├── backend/          # FastAPI REST API with built-in ETL scheduler
│   │   ├── app/routers/  # API endpoints (chat, wordcloud, admin, etl, etc.)
│   │   ├── app/etl/      # APScheduler-based ETL tasks
│   │   │   ├── processors/  # Chat processing, word discovery, dict import
│   │   │   ├── scheduler.py # Task scheduling
│   │   │   └── tasks.py     # Task definitions
│   │   └── app/models.py    # SQLAlchemy models
│   └── frontend/         # React + Vite + TailwindCSS
│       └── src/features/     # Feature-based components
│           ├── admin/        # Admin panel (ETL jobs, settings, word approval)
│           ├── playback/     # Timeline-based message playback
│           └── trends/       # Word trends analysis UI
│
├── airflow/              # [DEPRECATED] Legacy Airflow DAGs (see docs/legacy/)
│
├── database/
│   └── init/             # SQL migrations (auto-executed on first start)
│
├── text_analysis/        # NLP dictionaries (stopwords, special words, etc.)
│
├── .github/workflows/    # CI/CD pipeline (tests, build, deploy)
│
├── docker-compose.yml    # Full stack orchestration
├── .env.example          # Environment variables template
└── CLAUDE.md             # AI agent development guide
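To make the data model concrete, here is a hypothetical minimal version of a stored-message table, using in-memory SQLite for illustration. The table and column names are invented for this sketch; the real schema lives in `database/init/` and `dashboard/backend/app/models.py` and runs on PostgreSQL:

```python
import sqlite3

# Illustrative schema only; not the project's actual DDL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chat_messages (
        id           INTEGER PRIMARY KEY,
        author       TEXT NOT NULL,
        message      TEXT NOT NULL,
        published_at TEXT NOT NULL   -- ISO 8601 timestamp from the collector
    )
""")
conn.execute(
    "INSERT INTO chat_messages (author, message, published_at) VALUES (?, ?, ?)",
    ("viewer42", "草 太強了", "2024-01-01T12:00:00Z"),
)
rows = conn.execute("SELECT author, message FROM chat_messages").fetchall()
print(rows)  # [('viewer42', '草 太強了')]
```

The ETL tasks under `app/etl/` would then read raw rows like these, tokenize them, and write the aggregates that the dashboard queries.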

πŸ› οΈ Development

For detailed development commands and guidelines, see CLAUDE.md.

# View logs
docker-compose logs -f collector

# Rebuild a specific service
docker-compose up -d --build dashboard-backend

# Access database
docker-compose exec postgres psql -U hermes -d hermes

Frontend Unit Tests (Local, no Docker required)

Frontend tests use Vitest + React Testing Library + MSW and run directly on local Node.js.

cd dashboard/frontend

# Install dependencies
npm install

# Run tests once
npm run test:run

# Or watch mode during development
npm run test:watch

Current test focus includes:

  • Author detail UI rendering (avatar/badges)
  • Message rendering (emoji + paid amount)
  • Error handling for author detail API
  • Dashboard author drawer interaction flow

📄 License

MIT License - see LICENSE for details.

