AudioGhost AI 🎵👻

AI-Powered Object-Oriented Audio Separation

Describe the sound you want to extract or remove using natural language. Powered by Meta's SAM-Audio model.

🎬 Demo

Audio Separation

audioghost.mp4

Video Upload

audioghost_video.mp4

Features

🎯 Text-Guided Separation - Describe what you want to extract: "vocals", "drums", "a dog barking"
🎬 Video Upload Support - Upload videos and extract/remove audio sources (audio extraction only, not vision-based)
🚀 Memory Optimized - Lite mode reduces VRAM from ~11GB to ~4GB
🎨 Modern UI - Glassmorphism design with waveform visualization
⚡ Real-time Progress - Track separation progress in real-time
🎛️ Stem Mixer - Preview and compare original, extracted, and residual audio

🗺️ Roadmap

🖱️ Visual Prompting - Click on video to select sound sources visually (Integration with SAM 2)

Architecture

┌─────────────────────────────────────────────────┐
│                   Frontend                       │
│             (Next.js + Tailwind v4)             │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│               Backend API                        │
│            (FastAPI + Python)                    │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│              Task Queue                          │
│          (Celery + Redis)                        │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│           SAM Audio Lite                         │
│    (Memory-optimized Meta SAM-Audio)            │
└─────────────────────────────────────────────────┘

Requirements

Python 3.11+
CUDA-compatible GPU (4GB+ VRAM for lite mode, 12GB+ for full mode)
CUDA 12.6 (recommended)
Node.js 18+ (for frontend)

💡 install.bat installs FFmpeg and Redis automatically.

🚀 One-Click Installation (Recommended)

First Time Setup

# Run installer (creates Conda env, downloads Redis, installs all dependencies)
install.bat

Daily Usage

# Start all services with one click
start.bat

# Stop all services
stop.bat

Manual Setup (Advanced)

1. Start Redis

install.bat downloads Redis automatically into the redis/ folder. If you prefer Docker:

docker-compose up -d

2. Create Anaconda Environment

# Create new environment (Python 3.11+ required)
conda create -n audioghost python=3.11 -y

# Activate environment
conda activate audioghost

3. Install PyTorch (CUDA 12.6)

pip install torch==2.9.0+cu126 torchvision==0.24.0+cu126 torchaudio==2.9.0+cu126 --index-url https://download.pytorch.org/whl/cu126 --extra-index-url https://pypi.org/simple

4. Install FFmpeg (required by TorchCodec)

conda install -c conda-forge ffmpeg -y

5. Install SAM Audio

pip install git+https://github.com/facebookresearch/sam-audio.git

6. Install Backend Dependencies

cd backend
pip install -r requirements.txt

7. Install Frontend Dependencies

# Run from the repository root (not from backend/)
cd frontend
npm install

8. Start Services

Terminal 1 - Backend API:

conda activate audioghost
cd backend
uvicorn main:app --reload --port 8000

Terminal 2 - Celery Worker:

conda activate audioghost
cd backend
celery -A workers.celery_app worker --loglevel=info --pool=solo

Terminal 3 - Frontend:

cd frontend
npm run dev
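
Once all three terminals are running, a quick reachability check can confirm that each service is listening. The sketch below uses only the Python standard library. Ports 8000 and 3000 come from the commands above; 6379 is the Redis default and is an assumption about this setup, so adjust it if your Redis listens elsewhere.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 8000 and 3000 come from the start commands above.
# 6379 is the Redis default port (an assumption about this setup).
SERVICES = {"Backend API": 8000, "Frontend": 3000, "Redis": 6379}


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

A port that accepts connections only shows that something is listening there; it does not prove the service is healthy.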

9. Open the App

Navigate to http://localhost:3000

10. Connect HuggingFace

1. Click the "Connect HuggingFace" button
2. Request access at https://huggingface.co/facebook/sam-audio-large
3. Create an access token: https://huggingface.co/settings/tokens
4. Paste the token and connect

Usage

1. Upload an audio file (MP3, WAV, FLAC)
2. Describe what you want to extract or remove:
   - "vocals" / "singing voice"
   - "drums" / "percussion"
   - "background music"
   - "a dog barking"
   - "crowd noise"
3. Click Extract or Remove
4. Wait for processing
5. Preview and download the results

Performance Benchmarks

Tested on an RTX 4090 with a 4:26 audio clip (11 chunks of up to 25 s each)

VRAM Usage (Lite Mode)

Model	bfloat16 (Default)	float32 (High Quality)	Recommended GPU
Small	~6 GB	~10 GB	RTX 3060 6GB / RTX 3070 8GB
Base	~7 GB	~13 GB	RTX 3070/4060 8GB / RTX 4070 12GB
Large	~10 GB	~20 GB	RTX 3080/4070 12GB / RTX 4080 16GB

💡 High Quality Mode (float32): Better separation quality, but VRAM use rises by roughly 4-10 GB depending on model size (compare the two precision columns above). Enable via the "High Quality Mode" toggle in the UI.
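
To pick a model size for a given card, the table above can be turned into a small lookup. This is a minimal sketch, not part of AudioGhost: the function name is hypothetical, and the figures are the approximate values from the table, measured on one setup, so leave some headroom in practice.

```python
# Approximate lite-mode VRAM figures (GB), copied from the table above.
VRAM_GB = {
    "small": {"bfloat16": 6, "float32": 10},
    "base": {"bfloat16": 7, "float32": 13},
    "large": {"bfloat16": 10, "float32": 20},
}


def largest_model_that_fits(available_gb: float, precision: str = "bfloat16"):
    """Return the largest model whose approximate VRAM need fits, or None."""
    if precision not in ("bfloat16", "float32"):
        raise ValueError(f"unknown precision: {precision!r}")
    best = None
    for name in ("small", "base", "large"):  # ordered smallest to largest
        if VRAM_GB[name][precision] <= available_gb:
            best = name
    return best


if __name__ == "__main__":
    for gb in (4, 8, 12, 24):
        print(gb, "GB ->", largest_model_that_fits(gb))
```

The result maps directly to the model_size value accepted by the API ("small", "base", or "large").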

Processing Time

Model	First Run (incl. model load)	Subsequent Runs	Speed
Small	~78s	~25s	~10x realtime
Base	~100s	~29s	~9x realtime
Large	~130s	~41s	~6.5x realtime

💡 First run includes model download and loading. Subsequent runs use cached models.
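
The Speed column appears to be the clip duration divided by the subsequent-run time. The short calculation below reproduces it from the numbers in the table; treat it as a consistency check under that assumption, not as the project's own benchmark script.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


AUDIO_SECONDS = 4 * 60 + 26  # the 4:26 test clip, i.e. 266 seconds

# Subsequent-run times (seconds) from the table above.
SUBSEQUENT_RUN_SECONDS = {"small": 25, "base": 29, "large": 41}

for model, seconds in SUBSEQUENT_RUN_SECONDS.items():
    factor = realtime_factor(AUDIO_SECONDS, seconds)
    print(f"{model}: ~{factor:.1f}x realtime")
```

The results (about 10.6x, 9.2x, and 6.5x) agree with the rounded values in the Speed column.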

Memory Optimization Details

AudioGhost uses a "Lite Mode" that removes unused model components:

Component Removed	VRAM Saved
Vision Encoder	~2GB
Visual Ranker	~2GB
Text Ranker	~2GB
Span Predictor	~1-2GB

Total Reduction: Up to 40% less VRAM compared to the original SAM-Audio

This is achieved by:

Disabling video-related features (not needed for audio-only)
Using predict_spans=False and reranking_candidates=1
Using bfloat16 precision by default (optional float32 for quality)
Processing long audio files in 25-second chunks
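
The chunking step can be illustrated with plain arithmetic. The sketch below assumes fixed, non-overlapping 25-second windows; the actual worker may overlap or crossfade chunks, which this README does not specify. The function name is hypothetical.

```python
import math


def chunk_spans(total_seconds: float, chunk_seconds: float = 25.0):
    """Split [0, total_seconds] into consecutive (start, end) spans.

    Assumes non-overlapping windows; the last span may be shorter.
    """
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds > 0")
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min((i + 1) * chunk_seconds, total_seconds))
        for i in range(count)
    ]


if __name__ == "__main__":
    spans = chunk_spans(4 * 60 + 26)  # the 4:26 benchmark clip
    print(len(spans), "chunks, last span:", spans[-1])
```

For the 4:26 benchmark clip this yields 11 spans, matching the chunk count quoted in the benchmarks, with a final span of 16 seconds.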

Project Structure

audioghost-ai/
├── backend/
│   ├── main.py           # FastAPI app
│   ├── api/              # API routes
│   │   ├── auth.py       # HuggingFace auth
│   │   └── separate.py   # Separation endpoints
│   └── workers/
│       ├── celery_app.py # Celery config
│       └── tasks.py      # SAM Audio Lite worker
├── frontend/
│   ├── src/
│   │   ├── app/          # Next.js app
│   │   └── components/   # React components
│   └── package.json
├── sam_audio_lite.py     # Standalone lite version
├── QUICKSTART.md         # Quick setup guide
└── README.md

API Reference

POST /api/separate/

Create a separation task.

Form Data:

file - Audio file
description - Text prompt (e.g., "vocals")
mode - "extract" or "remove"
model_size - "small", "base", or "large" (default: "base")

Response:

{
  "task_id": "uuid",
  "status": "pending",
  "message": "Task submitted successfully"
}
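
A client call for this endpoint might look like the sketch below. It assumes the backend runs at http://localhost:8000 (the port used in Manual Setup) and uses the third-party requests package, which is an assumption about your environment. The helper names are hypothetical; only the form fields and the task_id response key come from this README.

```python
ALLOWED_MODES = ("extract", "remove")
ALLOWED_MODEL_SIZES = ("small", "base", "large")


def build_form(description: str, mode: str, model_size: str = "base") -> dict:
    """Validate and assemble the non-file form fields documented above."""
    if not description.strip():
        raise ValueError("description must not be empty")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}")
    if model_size not in ALLOWED_MODEL_SIZES:
        raise ValueError(f"model_size must be one of {ALLOWED_MODEL_SIZES}")
    return {"description": description, "mode": mode, "model_size": model_size}


def submit(path: str, description: str, mode: str, model_size: str = "base",
           base_url: str = "http://localhost:8000") -> str:
    """Upload a file as multipart form data and return the task_id."""
    import requests  # third-party; imported lazily so build_form works without it

    data = build_form(description, mode, model_size)
    with open(path, "rb") as fh:
        resp = requests.post(
            f"{base_url}/api/separate/", data=data, files={"file": fh}, timeout=60
        )
    resp.raise_for_status()
    return resp.json()["task_id"]
```

For example, submit("song.mp3", "vocals", "extract") would return the task ID to use with the status and download endpoints.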

GET /api/separate/{task_id}/status

Get task status and progress.

GET /api/separate/{task_id}/download/{stem}

Download result audio (ghost, clean, or original).
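
The status and download endpoints can be wrapped in small helpers. This is a sketch under stated assumptions: the base URL matches the Manual Setup port, and because the fields of the status response are not documented here, the polling helper takes a caller-supplied predicate instead of guessing a field name. All function names are hypothetical.

```python
import time
from urllib.parse import quote

STEMS = ("ghost", "clean", "original")  # stem names listed above


def status_url(task_id: str, base_url: str = "http://localhost:8000") -> str:
    """Build the status URL for a task."""
    return f"{base_url}/api/separate/{quote(task_id, safe='')}/status"


def download_url(task_id: str, stem: str,
                 base_url: str = "http://localhost:8000") -> str:
    """Build the download URL for one stem of a finished task."""
    if stem not in STEMS:
        raise ValueError(f"stem must be one of {STEMS}")
    return f"{base_url}/api/separate/{quote(task_id, safe='')}/download/{stem}"


def poll(fetch_status, is_finished, interval: float = 2.0, max_attempts: int = 150):
    """Call fetch_status() until is_finished(result) is true, then return it.

    fetch_status and is_finished are supplied by the caller because the
    status payload format is not specified in this README.
    """
    for _ in range(max_attempts):
        result = fetch_status()
        if is_finished(result):
            return result
        time.sleep(interval)
    raise TimeoutError("task did not finish within the polling budget")
```

A caller would typically pass a fetch_status function that performs a GET on status_url(task_id) and an is_finished predicate that inspects whatever completion field the backend returns.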

Troubleshooting

CUDA Out of Memory

Use model_size: "small" instead of "base" or "large"
Ensure lite mode is enabled (check for "Optimizing model for low VRAM" in logs)
Close other GPU applications

TorchCodec DLL Error

Downgrade to FFmpeg 7.x
Ensure FFmpeg bin directory is in PATH
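
To see which FFmpeg the environment actually resolves, a short check like the one below can help. It is a sketch using the standard library only; the regular expression assumes a release-style banner such as "ffmpeg version 7.1 ..." and returns None for builds that do not report a numeric major.minor version (for example, date-stamped git builds).

```python
import re
import shutil
import subprocess


def parse_ffmpeg_version(banner: str):
    """Extract (major, minor) from `ffmpeg -version` output, or None."""
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", banner)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_ffmpeg_version():
    """Return (major, minor) of the ffmpeg found on PATH, or None."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None  # ffmpeg is not on PATH
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_version(result.stdout)


if __name__ == "__main__":
    print("ffmpeg on PATH:", shutil.which("ffmpeg"))
    print("version:", installed_ffmpeg_version())
```

If the first line prints None, the FFmpeg bin directory is missing from PATH; if the major version is not 7, the downgrade advice above applies.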

HuggingFace 401 Error

Re-authenticate via the UI
Check that .hf_token exists in backend/
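
The token file check can be scripted as below. This sketch assumes .hf_token is a plain-text file containing the token, which this README does not state explicitly; the function name is hypothetical. It deliberately reports only the file's state and never prints the token itself.

```python
from pathlib import Path


def token_file_status(backend_dir: str = "backend") -> str:
    """Return 'missing', 'empty', or 'present' for backend/.hf_token."""
    token_path = Path(backend_dir) / ".hf_token"
    if not token_path.is_file():
        return "missing"
    # Assumption: the token is stored as plain text.
    if not token_path.read_text(encoding="utf-8").strip():
        return "empty"
    return "present"


if __name__ == "__main__":
    print(".hf_token:", token_file_status())
```

A "present" result only means the file exists and is non-empty. A 401 can still occur if the token is expired or access to the model has not been granted.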

License

This project is licensed under the MIT License. SAM-Audio is licensed by Meta under a research license.

Credits

SAM-Audio by Meta AI Research
Core Optimization Logic: Special thanks to NilanEkanayake for providing the initial code modifications in Issue #24 that made the inference VRAM reduction possible.
Built with ❤️ using Next.js, FastAPI, and Celery
