CityWok - Audio Fingerprinting Episode Identifier

CityWokDemo.mov

A production-ready, full-stack web application that identifies TV show episodes from short audio/video clips using advanced audio fingerprinting technology. Built with modern Python and React, featuring enterprise-grade performance optimizations and security.

🎯 Project Overview

CityWok is a sophisticated audio recognition system that can identify specific episodes and timestamps from short clips (as brief as 3-5 seconds), even when audio is distorted, compressed, or filtered. The system processes audio from TikTok videos, YouTube clips, Instagram reels, or direct file uploads and matches them against a comprehensive database of 20 seasons.

✨ Key Technical Achievements

Advanced Audio Processing

Custom Audio Fingerprinting Algorithm: Implemented a spectrogram-based peak detection system using librosa and NumPy for robust audio analysis
Optimized Matching Engine: Multi-stage incremental matching with early exit optimizations for sub-second response times
Parallel Processing: ThreadPoolExecutor-based concurrent season matching for optimal performance
Memory-Efficient Architecture: Handles 6.5GB+ database with intelligent lazy loading and memory management

Full-Stack Development

Backend: FastAPI REST API with async/await patterns, Server-Sent Events (SSE) for real-time progress updates
Frontend: Modern React application with responsive design, drag-and-drop file uploads, and streaming progress indicators
Cloud Infrastructure: Containerized deployment on Railway with Cloudflare R2 storage integration

Production-Ready Features

Security: Web Application Firewall (WAF), security headers (CSP, HSTS), IP-based rate limiting, API key authentication
Scalability: Supports 1000+ daily users with configurable rate limits (10 req/hr free tier, 100 req/hr authenticated)
Error Handling: Comprehensive error handling with user-friendly messages and graceful degradation
Monitoring: Sentry integration for production error tracking and performance monitoring

🛠️ Technology Stack

Backend

Framework: FastAPI (async Python web framework)
Audio Processing: librosa, NumPy, SciPy for signal processing and spectrogram analysis
Data Storage: Pickle-based fingerprint databases stored in Cloudflare R2 (S3-compatible)
Video Processing: yt-dlp for multi-platform video download and audio extraction
Deployment: Docker containerization, Railway hosting with persistent volumes

Frontend

Framework: React 18 with Vite build tool
Styling: Custom CSS with responsive design principles
Deployment: Vercel with environment-based configuration
Analytics: Vercel Analytics integration

Infrastructure

Backend Hosting: Railway (containerized deployment)
Storage: Cloudflare R2 (object storage for database files)
Frontend Hosting: Vercel (edge network deployment)
Monitoring: Sentry for error tracking

🏗️ Architecture Highlights

Audio Fingerprinting Pipeline

Audio Extraction: Downloads videos from URLs (TikTok, YouTube, Instagram) or processes uploaded files
Preprocessing: Converts to 22kHz mono, extracts spectrogram using Short-Time Fourier Transform (STFT)
Peak Detection: Identifies significant frequency peaks using maximum filter algorithms
Hash Generation: Creates unique fingerprints using MD5 hashing of peak constellations
Database Matching: Parallel search across 20 seasons using optimized numpy arrays
Result Ranking: Confidence scoring with multiple quality metrics (alignment, margin, sharpness)

Performance Optimizations

Incremental Matching: Processes audio in stages (3s, 5s, 8s, 10s) with early exit on confident matches
Multi-Window Sampling: Extracts and matches multiple time windows for robust distorted clip handling
IDF Weighting: Down-weights common audio patterns to improve match accuracy
Posting List Capping: Limits per-hash matches to prevent noise from overpowering signals
Lazy Loading: On-demand database loading to optimize memory usage

Data Management

Database Format: Efficient numpy array storage with structured dtypes for minimal memory footprint
Season-Based Organization: Separate databases per season for parallel processing
R2 Integration: Automated download scripts for cloud-stored databases
Volume Persistence: Railway volumes for persistent storage across deployments

🚀 Deployment

The application is fully containerized and deployed on modern cloud platforms:

Backend: Railway with Docker, persistent volumes, and environment-based configuration
Frontend: Vercel with automatic deployments from Git
Storage: Cloudflare R2 for cost-effective object storage

🎓 Known Limitations and Potential Improvements

Current Limitations

Distorted Audio Accuracy: Heavily distorted, filtered, or compressed audio clips may produce inaccurate matches. The current audio-only fingerprinting approach struggles with extreme audio degradation.
Multi-Clip Videos: TikTok videos containing multiple episode clips are currently processed as a single audio stream, resulting in only one episode guess for the entire video rather than per-clip identification.
Dataset Quality: The audio fingerprint database was generated from source episodes, and approximately ~3% of episodes may have misaligned or flipped episode numbers. Some episodes may be off by 1 episode number. This is due to dataset preparation constraints and manual verification was not feasible. If you notice incorrect episode matches, please reach out to report them.

Planned Enhancements

Hybrid Audio-Visual Pipeline (2-Pass System):
- Pass 1: Audio fingerprinting (current system)
- Pass 2: Visual frame analysis using computer vision
- Fusion: Combine audio and visual confidence scores for improved accuracy on distorted audio
- Use Case: Better handling of heavily filtered, compressed, or distorted TikTok clips
Multi-Clip Detection and Segmentation:
- Scene Detection: Integrate pyscenedetect to identify scene cuts and transitions in multi-clip videos
- Per-Clip Matching: Process each detected clip segment independently
- Timeline Display: Show a video timeline with episode guesses for each segment
- Use Case: TikTok compilations, reaction videos, or videos with multiple South Park clips

Name		Name	Last commit message	Last commit date
Latest commit History 165 Commits
backend		backend
frontend		frontend
.gitignore		.gitignore
README.md		README.md
railway.json		railway.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CityWok - Audio Fingerprinting Episode Identifier

🎯 Project Overview

✨ Key Technical Achievements

Advanced Audio Processing

Full-Stack Development

Production-Ready Features

🛠️ Technology Stack

Backend

Frontend

Infrastructure

🏗️ Architecture Highlights

Audio Fingerprinting Pipeline

Performance Optimizations

Data Management

🚀 Deployment

🎓 Known Limitations and Potential Improvements

Current Limitations

Planned Enhancements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CityWok - Audio Fingerprinting Episode Identifier

🎯 Project Overview

✨ Key Technical Achievements

Advanced Audio Processing

Full-Stack Development

Production-Ready Features

🛠️ Technology Stack

Backend

Frontend

Infrastructure

🏗️ Architecture Highlights

Audio Fingerprinting Pipeline

Performance Optimizations

Data Management

🚀 Deployment

🎓 Known Limitations and Potential Improvements

Current Limitations

Planned Enhancements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages