theakashrai/ai-video-generator-workflow

AI Video Generation Workflow

A self-hosted AI video generation pipeline that creates viral-style content featuring animated babies, animals, and cartoon characters. The system replicates trending formats such as "baby podcasters" and "animal CEO meetings" using a fully local AI pipeline.

🎯 Features

  • Complete Local Pipeline: Generate videos offline using self-hosted AI models
  • Multiple Character Types: Babies, animals, celebrities, and cartoon characters
  • Advanced Voice Synthesis: Baby voices, celebrity impersonations, and emotional variations
  • Professional Lip-Sync: Natural facial animation synchronized with audio
  • Template System: Pre-built viral content templates
  • Real-Time Progress: WebSocket-based generation monitoring
  • Modern UI: React frontend with drag-and-drop and real-time previews
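
The real-time progress feature can be illustrated with a minimal sketch of the event payload a WebSocket client might receive. The stage names and field layout below are assumptions for illustration, not the exact messages this repo emits:

```python
import json

# Pipeline stages in the order the generator runs them (assumed ordering).
STAGES = ["script", "audio", "image", "lipsync", "render"]

def make_progress_event(stage: str, percent: float) -> str:
    """Serialize one progress update as JSON for the WebSocket channel."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    payload = {
        "stage": stage,
        "stage_index": STAGES.index(stage),
        "percent": max(0.0, min(100.0, percent)),  # clamp to a valid range
    }
    return json.dumps(payload)

print(make_progress_event("audio", 42.5))
```

A frontend subscriber can use `stage_index` to drive a step indicator without hard-coding stage names.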

πŸ—οΈ Architecture

Backend (Python)

  • FastAPI server with async support
  • Pydantic models for type safety
  • Single responsibility services
  • ComfyUI integration for Stable Diffusion
  • Ollama for local LLM inference
  • Multiple TTS engines (MeloTTS, FishSpeech, F5-TTS)
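
The backend uses Pydantic models for validation; the stdlib dataclass below is only a sketch of what a generation-request model might check. The field names (`character_type`, `script`, `voice`) are assumptions, not the repo's actual schema:

```python
from dataclasses import dataclass

# Character types taken from the feature list above.
CHARACTER_TYPES = {"baby", "animal", "celebrity", "cartoon"}

@dataclass
class VideoRequest:
    """Illustrative request model; the real backend uses Pydantic equivalents."""
    character_type: str
    script: str
    voice: str = "default"

    def __post_init__(self) -> None:
        if self.character_type not in CHARACTER_TYPES:
            raise ValueError(f"unsupported character type: {self.character_type}")
        if not self.script.strip():
            raise ValueError("script must not be empty")

req = VideoRequest(character_type="baby", script="Welcome to the baby podcast!")
print(req)
```

With Pydantic, the same constraints become declarative field validators, and FastAPI rejects invalid requests with a 422 before any service code runs.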

Frontend (React + TypeScript)

  • Vite for fast development
  • Tailwind CSS for styling
  • React Query for API state management
  • Zustand for client state
  • Framer Motion for animations

πŸš€ Quick Start

Prerequisites

  • Python 3.9+ with pip/uv
  • Node.js 18+ with npm
  • NVIDIA GPU (8GB+ VRAM recommended)
  • CUDA 11.8+ for GPU acceleration

Backend Setup

# Clone repository
git clone <repository-url>
cd ai-video-generation-workflow

# Install Python dependencies
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev,gpu,models]"

# Set up environment
cp .env.example .env
# Edit .env with your configuration

# Run development server
python src/main.py

Frontend Setup

# Install Node dependencies
cd frontend
npm install

# Start development server
npm run dev

Access the Application

With both servers running, the backend API listens on http://localhost:8000 (the PORT setting in .env) and the frontend is served by the Vite dev server, at http://localhost:5173 by default.

πŸ“‹ Requirements

Hardware Requirements

| Component  | Minimum | Recommended | Optimal |
| ---------- | ------- | ----------- | ------- |
| GPU VRAM   | 8GB     | 12GB        | 24GB+   |
| System RAM | 16GB    | 32GB        | 64GB+   |
| Storage    | 100GB   | 200GB       | 500GB+  |
| CPU Cores  | 6       | 8           | 12+     |

AI Models

The system automatically downloads required models:

  • LLM: Llama 3.1-8B (4.6GB)
  • TTS: MeloTTS (500MB)
  • Image: Stable Diffusion v1.5 (4GB)
  • Animation: LatentSync (3.2GB)
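
A download step like this typically starts by checking which models are already present under MODELS_DIR. The sketch below assumes one subdirectory per model; the directory names are illustrative, not the repo's actual layout:

```python
from pathlib import Path
import tempfile

# Model roles from the list above; directory names are assumptions.
REQUIRED_MODELS = {
    "llm": "llama3.1-8b",
    "tts": "melotts",
    "image": "stable-diffusion-v1-5",
    "animation": "latentsync",
}

def missing_models(models_dir: Path) -> list[str]:
    """Return the roles whose model directory is absent, i.e. what to download."""
    return [role for role, name in REQUIRED_MODELS.items()
            if not (models_dir / name).exists()]

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "melotts").mkdir()   # pretend the TTS model is already downloaded
    print(missing_models(root))  # every role except "tts"
```
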

πŸ› οΈ Development

VS Code Setup

The repository includes complete VS Code configuration:

  • Launch configurations for debugging backend and frontend
  • Workspace settings with Python and TypeScript formatting
  • Extension recommendations for optimal development experience

Code Quality

  • Python: Black, Ruff, MyPy for formatting and linting
  • TypeScript: ESLint, Prettier for code quality
  • Testing: Pytest (backend), Vitest (frontend)
  • Pre-commit hooks for consistent code style

Development Workflow

# Backend development
cd backend
python src/main.py  # Start dev server with hot reload

# Frontend development
cd frontend
npm run dev  # Start with hot reload

# Run tests
cd backend && python -m pytest
cd frontend && npm test

# Code formatting
cd backend && black . && ruff check .
cd frontend && npm run format

πŸ“– Usage

Basic Video Generation

  1. Choose Character Type: Select from baby humans, animals, celebrities, or cartoon characters
  2. Write Script: Create engaging prompt or use trending templates
  3. Configure Voice: Select voice style matching your character
  4. Generate Video: Watch real-time progress as AI creates your video
  5. Download & Share: Get your viral-ready MP4 file

Advanced Features

  • Custom Characters: Upload reference images for personalized avatars
  • Voice Cloning: Use sample audio for custom voice generation
  • Batch Processing: Generate multiple variations simultaneously
  • Template Creation: Save successful configurations for reuse

πŸ”§ Configuration

Environment Variables

# Server settings
HOST=0.0.0.0
PORT=8000
DEBUG=false

# Model settings
MODELS_DIR=./models
GPU_ENABLED=true
MAX_CONCURRENT_TASKS=2

# Model selection
LLM_MODEL=llama3.1:8b
TTS_MODEL=melotts
IMAGE_MODEL=stable-diffusion-v1-5
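
Parsing these variables into typed settings can be sketched as follows. This is a minimal stdlib illustration of the defaults shown above, not the backend's actual settings class (which would typically be Pydantic-based):

```python
def load_settings(env: dict) -> dict:
    """Parse .env-style strings into typed settings, using the defaults above."""
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        "debug": env.get("DEBUG", "false").lower() == "true",
        "gpu_enabled": env.get("GPU_ENABLED", "true").lower() == "true",
        "max_concurrent_tasks": int(env.get("MAX_CONCURRENT_TASKS", "2")),
    }

# Overrides win; everything else falls back to the documented default.
settings = load_settings({"PORT": "9000", "DEBUG": "true"})
print(settings)
```
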

Hardware Optimization

# GPU memory management
TORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Performance tuning
MAX_VIDEO_DURATION=300
API_RATE_LIMIT=100

πŸš€ Production Deployment

Docker Deployment (recommended)

# Build and run with Docker Compose
docker-compose up -d

# Or build manually
docker build -t ai-video-gen .
docker run -p 8000:8000 --gpus all ai-video-gen

Manual Deployment

# Backend production
pip install -e ".[production]"
gunicorn src.main:app -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

# Frontend production build
npm run build
# Serve dist/ with nginx or similar

πŸ“Š Performance

Generation Times (RTX 4070)

  • Script Generation: ~10 seconds
  • Audio Synthesis: ~15 seconds
  • Image Generation: ~30 seconds
  • Lip-Sync Animation: ~60 seconds
  • Video Rendering: ~20 seconds
  • Total: ~2-3 minutes for a 30-second video
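
The per-stage estimates above sum to about 2.2 minutes; the ~2-3 minute total leaves room for queueing and model-loading overhead. A quick check:

```python
# Per-stage estimates from the list above, in seconds (RTX 4070).
STAGE_SECONDS = {
    "script": 10,
    "audio": 15,
    "image": 30,
    "lipsync": 60,
    "render": 20,
}

total = sum(STAGE_SECONDS.values())
print(f"{total} s ~ {total / 60:.1f} min")  # stages alone sum to 135 s
```
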

Optimization Tips

  • Enable GPU acceleration for 5-10x speedup
  • Use quantized models to reduce VRAM usage
  • Batch process multiple videos for efficiency
  • SSD storage improves model loading times
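
Why quantization helps: weight memory is roughly parameters × bits-per-weight / 8. The sketch below applies that rule of thumb to an 8B-parameter model like Llama 3.1-8B (it ignores activations and KV cache, which add more on top):

```python
def weight_vram_gb(params: float, bits_per_weight: int) -> float:
    """Approximate VRAM for model weights alone: params * bits / 8, in GB."""
    return params * bits_per_weight / 8 / 1e9

print(weight_vram_gb(8e9, 16))  # fp16: 16 GB, exceeds an 8GB card
print(weight_vram_gb(8e9, 4))   # 4-bit: 4 GB, fits with headroom
```

This is why the 8GB minimum in the hardware table is workable only with quantized models.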

🀝 Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • ComfyUI for Stable Diffusion integration
  • Ollama for local LLM inference
  • MeloTTS for high-quality text-to-speech
  • HunyuanVideo for lip-sync animation
  • FastAPI and React communities

πŸ“ž Support


Made with ❀️ for the AI content creation community
