A browser-based implementation of LiteAvatar using WASM. No backend server required.
This project adapts HumanAIGC/lite-avatar for browser deployment. The original requires a Python backend and GPU acceleration; this version runs entirely in the browser.
- Browser-based processing using ONNX Runtime Web (WASM)
- Complete Paraformer feature extraction pipeline in browser (fbank + LFR + CMVN)
- Large models (>100MB) hosted on Hugging Face to avoid Git LFS limitations
- Static hosting support: GitHub Pages, Vercel, Netlify, etc.
- No backend dependencies
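The fbank frames go through LFR (stacking 7 consecutive frames and subsampling by 6, Paraformer's defaults) followed by global CMVN. As a minimal numpy sketch of those two steps (the browser version implements the same math in JavaScript; the function names and the exact padding behavior here are illustrative, modeled on FunASR's frontend):

```python
import numpy as np

def apply_lfr(feats: np.ndarray, m: int = 7, n: int = 6) -> np.ndarray:
    """Low Frame Rate: stack m consecutive fbank frames, advance by n.

    feats: [T, 80] fbank features -> [ceil(T/n), 80*m].
    The input is left-padded by repeating the first frame, and the tail
    window is padded by repeating the last frame (FunASR-style).
    """
    T, dim = feats.shape
    pad = (m - 1) // 2
    feats = np.vstack([np.repeat(feats[:1], pad, axis=0), feats])
    out = []
    for i in range(0, T, n):                 # one stacked frame per n inputs
        window = feats[i:i + m]
        if window.shape[0] < m:              # repeat last frame at the tail
            fill = np.repeat(window[-1:], m - window.shape[0], axis=0)
            window = np.vstack([window, fill])
        out.append(window.reshape(-1))
    return np.stack(out)

def apply_cmvn(feats: np.ndarray, neg_mean: np.ndarray, inv_std: np.ndarray) -> np.ndarray:
    """Global CMVN: (x + neg_mean) * inv_stddev, stats taken from am.mvn."""
    return (feats + neg_mean) * inv_std
```

For 5 seconds of 16kHz audio (~500 fbank frames) this yields roughly 84 stacked 560-dim frames before CMVN.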
Models are large (>100MB) and excluded from Git. Download them manually from the links below.
- `weights/paraformer_hidden.onnx` (603MB FP32) - Paraformer encoder for feature extraction
- `weights/model_1.onnx` - Audio2mouth model
- `data/preload/net_encode.onnx` - Encoder model
- `data/preload/net_decode.onnx` - Decoder model
Models can be downloaded from:
- Hugging Face (Recommended): fushengji/lite-avatar-wasm
  - `paraformer_hidden.onnx` (603MB)
  - `model_1.onnx` (184MB)
You can also download models from ModelScope or use the export scripts (see Development section below).
```bash
# Start a simple HTTP server
python -m http.server 8000
# Open http://localhost:8000 in your browser
```
1. Push code to GitHub:

   ```bash
   git add .
   git commit -m "Deploy to GitHub Pages"
   git push origin main
   ```

2. Enable GitHub Pages:

   - Go to: https://github.com/fusheng-ji/lite-avatar-WASM/settings/pages
   - In the "Source" section, select: GitHub Actions
   - Save settings

3. Access your site:

   - After deployment, visit: https://fusheng-ji.github.io/lite-avatar-WASM
- Vercel: Connect your repository; it will auto-detect static files
- Netlify: Drag and drop your folder or connect via Git
- Cloudflare Pages: Connect your repository and set the build command to `echo "No build needed"`
- Place model files in the correct paths (or update the paths in `js/config.js`)
- Open `index.html` in your browser
- Upload an audio file (or use the default sample audio / microphone recording)
- Upload avatar data (or use the default sample data)
- Click "Generate Video" and wait for rendering
- Sample data: `data/preload/` contains sample avatar data (`bg_video.mp4`, `neutral_pose.npy`, `face_box.txt`, `ref_frames/`, etc.)
- More avatars: LiteAvatarGallery
lite-avatar-WASM/
├── index.html # Main HTML file
├── assets/
│ └── banner.svg # Project banner
├── js/
│ ├── lite-avatar-web.js # Main frontend logic
│ ├── paraformer-frontend.js # Paraformer feature extraction (fbank + LFR + CMVN)
│ ├── config.js # Configuration
│ └── i18n.js # Internationalization
├── weights/
│ ├── paraformer_hidden.onnx # Paraformer encoder (603MB, download from Hugging Face)
│ └── model_1.onnx # Audio2mouth model (download from Hugging Face)
├── data/
│ └── preload/ # Sample avatar data
│ ├── net_encode.onnx # Encoder model
│ ├── net_decode.onnx # Decoder model
│ ├── bg_video.mp4 # Background video
│ ├── neutral_pose.npy # Neutral pose data
│ ├── face_box.txt # Face bounding box
│ └── ref_frames/ # Reference frames (150 jpg files)
├── utils/
│ ├── export_paraformer_hidden_onnx.py # Export Paraformer to ONNX
│ └── test_paraformer_onnx.py # Test exported ONNX model
├── extract_paraformer_feature.py # Paraformer feature extraction utility
├── funasr_local/ # FunASR local dependencies (for model export)
├── requirements.txt # Python dependencies
└── README.md # This file
The utils/ directory contains scripts for exporting the Paraformer model from PyTorch to ONNX format.
```bash
# Install dependencies
pip install torch onnxruntime funasr
```

1. Prepare Model Files

   Ensure you have the Paraformer model files in the correct location:

   ```
   weights/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/
   ├── config.yaml
   ├── model.pb
   └── am.mvn
   ```
2. Export to ONNX

   Run the export script:

   ```bash
   cd utils
   python export_paraformer_hidden_onnx.py
   ```

   The script loads the Paraformer model from PyTorch format, wraps it with `ParaformerHiddenWrapperSingleInput` (from `extract_paraformer_feature.py`), exports the encoder hidden states to ONNX format, and saves the result as `weights/paraformer_hidden.onnx`.
3. Export Details

   - Input: frontend features (after LFR) with shape `[batch, time, 560]`
     - Feature dimension: 80 mels × 7 (LFR) = 560
     - Fixed time dimension: 150 frames (inputs longer than this will be truncated)
   - Output: encoder hidden states
   - Format: FP32 ONNX model (opset version 17)
   - Size: ~603MB
4. Test the Exported Model

   Verify the exported ONNX model:

   ```bash
   cd utils
   python test_paraformer_onnx.py
   ```

   This script loads the ONNX model, displays input/output shapes, runs inference with dummy data, and verifies that the model works correctly.
- The exported model uses a fixed time dimension of 150 frames
- Input sequences longer than 150 frames will cause runtime errors
- The frontend should truncate or pad inputs to exactly 150 frames
- The wrapper bypasses the frontend FFT to avoid `aten::fft_rfft` operations during export
- If the model is larger than 2GB, weights will be stored in a separate `.onnx_data` file
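Because the time dimension is baked into the graph, callers must fit features to exactly 150 frames before inference. A numpy sketch of that truncate-or-pad step (zero-padding short inputs is an assumption; the JS frontend may pad differently):

```python
import numpy as np

FIXED_FRAMES = 150  # time dimension baked into the exported model

def fit_to_fixed_length(feats: np.ndarray, length: int = FIXED_FRAMES) -> np.ndarray:
    """Truncate or zero-pad [T, 560] features to exactly `length` frames."""
    if feats.shape[0] >= length:
        return feats[:length]                    # truncate long inputs
    pad = np.zeros((length - feats.shape[0], feats.shape[1]), dtype=feats.dtype)
    return np.concatenate([feats, pad], axis=0)  # zero-pad short inputs
```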
- WASM: All modern browsers (Chrome, Firefox, Safari, Edge)
- Audio APIs: Modern browsers with Web Audio API support
- Feature Extraction: ~1-2 seconds for 5 seconds of audio (depends on device)
- Video Generation: ~5-10 seconds for 150 frames (depends on device)
- Memory Usage: ~1-2GB RAM (mainly for model loading)
Based on HumanAIGC/lite-avatar, adapted for the browser.
Thanks to these projects:
- LiteAvatar - Original real-time 2D chat avatar project
- Paraformer & FunASR - Audio feature extraction
- HeadTTS - Reference implementation
- ONNX Runtime Web - Browser inference engine
MIT License