A self-hosted AI video generation pipeline that creates viral-style content featuring animated babies, animals, and other characters. The system replicates trending formats such as "baby podcasters" and "animal CEO meetings" using a fully local AI pipeline.
- Complete Local Pipeline: Generate videos offline using self-hosted AI models
- Multiple Character Types: Babies, animals, celebrities, and cartoon characters
- Advanced Voice Synthesis: Baby voices, celebrity impersonations, and emotional variations
- Professional Lip-Sync: Natural facial animation synchronized with audio
- Template System: Pre-built viral content templates
- Real-Time Progress: WebSocket-based generation monitoring (see the client sketch after this list)
- Modern UI: React frontend with drag-and-drop and real-time previews
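For example, a client could follow generation progress like this. This is a minimal sketch: the WebSocket route and message shape are assumptions for illustration, not the project's documented API.

```python
# Minimal sketch of consuming generation progress over WebSocket.
# The /ws/progress/{task_id} path and the message schema are assumed.
import asyncio
import json

import websockets  # pip install websockets


async def watch_progress(task_id: str) -> None:
    uri = f"ws://localhost:8000/ws/progress/{task_id}"  # hypothetical endpoint
    async with websockets.connect(uri) as ws:
        async for raw in ws:
            event = json.loads(raw)
            print(f"{event.get('stage')}: {event.get('percent')}%")
            if event.get("stage") == "complete":
                break


asyncio.run(watch_progress("example-task-id"))
```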
- FastAPI server with async support
- Pydantic models for type safety (sketched after this list)
- Single responsibility services
- ComfyUI integration for Stable Diffusion
- Ollama for local LLM inference
- Multiple TTS engines (MeloTTS, FishSpeech, F5-TTS)
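As a sketch of the FastAPI + Pydantic pattern above; every name here (model fields, route, response shape) is illustrative, not the project's actual schema:

```python
# Hypothetical request model and endpoint illustrating the FastAPI +
# Pydantic pattern; names are illustrative only.
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()


class VideoRequest(BaseModel):
    character_type: str = Field(..., description="e.g. 'baby', 'animal'")
    script: str
    voice_style: str = "default"
    duration_seconds: int = Field(30, ge=1, le=300)


@app.post("/api/videos")
async def create_video(request: VideoRequest) -> dict:
    # A real handler would enqueue the generation task and return its ID.
    return {"task_id": "example", "status": "queued"}
```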
- Vite for fast development
- Tailwind CSS for styling
- React Query for API state management
- Zustand for client state
- Framer Motion for animations
- Python 3.9+ with pip/uv
- Node.js 18+ with npm
- NVIDIA GPU (8GB+ VRAM recommended)
- CUDA 11.8+ for GPU acceleration (see the check below)
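A quick way to confirm the GPU is visible before downloading models (assumes PyTorch is already installed):

```python
# Quick check that PyTorch can see the GPU before generating anything.
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU: {name} ({vram_gb:.1f} GB VRAM)")
else:
    print("No CUDA device found; generation will fall back to CPU (slow).")
```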
```bash
# Clone repository
git clone <repository-url>
cd ai-video-generation-workflow

# Install Python dependencies
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev,gpu,models]"

# Set up environment
cp .env.example .env
# Edit .env with your configuration

# Run development server
python src/main.py
```

```bash
# Install Node dependencies
cd frontend
npm install

# Start development server
npm run dev
```
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000 (example request below)
- API Docs: http://localhost:8000/api/docs
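Once the backend is up, a job can also be submitted programmatically. The `/api/videos` route and payload below are assumptions carried over from the earlier sketch, not the documented API:

```python
# Hedged example of submitting a job to the local API; the route and
# payload shape are assumptions consistent with the sketches above.
import requests

payload = {
    "character_type": "baby",
    "script": "Welcome back to the tiny podcast!",
    "voice_style": "baby",
}
resp = requests.post("http://localhost:8000/api/videos", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g. {"task_id": "...", "status": "queued"}
```

The returned `task_id` could then be fed to the WebSocket progress sketch shown earlier.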
| Component | Minimum | Recommended | Optimal |
|---|---|---|---|
| GPU VRAM | 8GB | 12GB | 24GB+ |
| System RAM | 16GB | 32GB | 64GB+ |
| Storage | 100GB | 200GB | 500GB+ |
| CPU Cores | 6 | 8 | 12+ |
The system automatically downloads the required models (a sketch of such a step follows the list):
- LLM: Llama 3.1-8B (4.6GB)
- TTS: MeloTTS (500MB)
- Image: Stable Diffusion v1.5 (4GB)
- Animation: LatentSync (3.2GB)
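A minimal sketch of what such a bootstrap step might look like, assuming models come from the Hugging Face Hub; the repo ID and directory layout are illustrative, not the project's actual download code:

```python
# Illustrative model bootstrap; the repo IDs and use of huggingface_hub
# are assumptions, not the project's actual mechanism.
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub

MODELS_DIR = Path("./models")

REQUIRED_MODELS = {
    "image": "runwayml/stable-diffusion-v1-5",  # illustrative repo ID
}

for name, repo_id in REQUIRED_MODELS.items():
    target = MODELS_DIR / name
    if not target.exists():
        print(f"Downloading {repo_id} -> {target}")
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```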
The repository includes complete VS Code configuration:
- Launch configurations for debugging backend and frontend
- Workspace settings with Python and TypeScript formatting
- Extension recommendations for optimal development experience
- Python: Black, Ruff, MyPy for formatting and linting
- TypeScript: ESLint, Prettier for code quality
- Testing: Pytest (backend), Vitest (frontend) (sample test below)
- Pre-commit hooks for consistent code style
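A backend test in the Pytest setup mentioned above might look like the following; the `src.main` import path and the endpoint mirror the earlier hypothetical sketches, not the real module layout:

```python
# Hypothetical API test; src.main and /api/videos are assumptions
# matching the sketches above.
from fastapi.testclient import TestClient

from src.main import app  # assumed module layout

client = TestClient(app)


def test_create_video_queues_task():
    payload = {"character_type": "baby", "script": "hi", "voice_style": "baby"}
    response = client.post("/api/videos", json=payload)
    assert response.status_code == 200
    assert "task_id" in response.json()
```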
```bash
# Backend development
cd backend
python src/main.py  # Start dev server with hot reload

# Frontend development
cd frontend
npm run dev  # Start with hot reload

# Run tests
cd backend && python -m pytest
cd frontend && npm test

# Code formatting
cd backend && black . && ruff check .
cd frontend && npm run format
```
- Choose Character Type: Select from baby humans, animals, celebrities, or cartoon characters
- Write Script: Create an engaging prompt or use a trending template
- Configure Voice: Select voice style matching your character
- Generate Video: Watch real-time progress as AI creates your video
- Download & Share: Get your viral-ready MP4 file
- Custom Characters: Upload reference images for personalized avatars
- Voice Cloning: Use sample audio for custom voice generation
- Batch Processing: Generate multiple variations simultaneously (sketched after this list)
- Template Creation: Save successful configurations for reuse
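Batch processing can also be driven from the client side. This sketch reuses the assumed `/api/videos` route from the earlier examples and submits variations concurrently; the server's `MAX_CONCURRENT_TASKS` setting still caps actual parallelism:

```python
# Hedged client-side batch sketch; the route and payload are
# assumptions carried over from the earlier examples.
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "http://localhost:8000/api/videos"  # assumed route
SCRIPTS = ["Take one!", "Take two!", "Take three!"]


def submit(script: str) -> str:
    """Submit one variation and return its task ID."""
    resp = requests.post(API_URL, json={"character_type": "baby", "script": script})
    resp.raise_for_status()
    return resp.json()["task_id"]


with ThreadPoolExecutor(max_workers=3) as pool:
    task_ids = list(pool.map(submit, SCRIPTS))

print(task_ids)
```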
```bash
# Server settings
HOST=0.0.0.0
PORT=8000
DEBUG=false

# Model settings
MODELS_DIR=./models
GPU_ENABLED=true
MAX_CONCURRENT_TASKS=2

# Model selection
LLM_MODEL=llama3.1:8b
TTS_MODEL=melotts
IMAGE_MODEL=stable-diffusion-v1-5
```

```bash
# GPU memory management
TORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Performance tuning
MAX_VIDEO_DURATION=300
API_RATE_LIMIT=100
```
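These variables map naturally onto a typed settings object; a minimal sketch assuming pydantic-settings (the project's actual settings module may differ):

```python
# Minimal settings sketch; field names mirror the .env keys above.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False
    models_dir: str = "./models"
    gpu_enabled: bool = True
    max_concurrent_tasks: int = 2
    llm_model: str = "llama3.1:8b"
    tts_model: str = "melotts"
    image_model: str = "stable-diffusion-v1-5"
    max_video_duration: int = 300
    api_rate_limit: int = 100


settings = Settings()
print(settings.model_dump())
```

Lower-case field names still pick up the upper-cased `.env` keys because pydantic-settings matches environment variables case-insensitively.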
```bash
# Build and run with Docker Compose
docker-compose up -d

# Or build manually
docker build -t ai-video-gen .
docker run -p 8000:8000 --gpus all ai-video-gen
```

```bash
# Backend production
pip install -e ".[production]"
gunicorn src.main:app -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

# Frontend production build
npm run build
# Serve dist/ with nginx or similar
```
- Script Generation: ~10 seconds (see the Ollama sketch after this list)
- Audio Synthesis: ~15 seconds
- Image Generation: ~30 seconds
- Lip-Sync Animation: ~60 seconds
- Video Rendering: ~20 seconds
- Total: ~2-3 minutes for a 30-second video
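For reference, the script-generation step is a single local LLM call. A hedged sketch using Ollama's documented `/api/generate` HTTP route with the model from the config above (the prompt wording is illustrative):

```python
# Sketch of the script-generation step via Ollama's local HTTP API.
# /api/generate is Ollama's documented route; the prompt is illustrative.
import requests

prompt = "Write a 30-second podcast monologue for a baby host about naptime."
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": prompt, "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```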
- Enable GPU acceleration for a 5-10x speedup
- Use quantized models to reduce VRAM usage (see the sketch after this list)
- Batch process multiple videos for efficiency
- SSD storage improves model loading times
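As an illustration of the VRAM tips above, here is a hedged sketch of memory-conscious image-model loading with diffusers; whether the pipeline loads models this way (rather than through ComfyUI) is an assumption:

```python
# Hedged sketch of VRAM-conscious Stable Diffusion loading; using
# diffusers directly here is an assumption about the pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # roughly halves VRAM vs. float32
)
pipe.enable_attention_slicing()  # further reduces peak VRAM
pipe = pipe.to("cuda")
```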
- Fork the repository
- Create feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- ComfyUI for Stable Diffusion integration
- Ollama for local LLM inference
- MeloTTS for high-quality text-to-speech
- HunyuanVideo for lip-sync animation
- FastAPI and React communities
- Documentation: docs/
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ for the AI content creation community