VoiceBridge is an offline Android assistant for users with speech impairment (dysarthria and related conditions). It fine-tunes Gemma 4 on a single user's voice and runs on-device for private, low-latency voice control.
This repository is organized so a 3-person team can execute the hackathon plan in parallel:
- ML Lead: dataset prep, Unsloth fine-tuning, WER/CER benchmarking, Hugging Face publish
- Android Lead: data collection app, inference app, action dispatcher, latency benchmarks
- Product/Video Lead: user studies, impact proof, video, writeup, submission packaging
```text
.
├── android/                 # Android Studio project
├── benchmarks/              # WER/CER + latency report tooling
├── data/                    # 195-sentence prompt list and notes
├── docs/                    # execution and submission checklists
├── training/                # Kaggle/Colab training pipeline
├── kaggle_writeup_template.md
├── model_card.md
└── speech_assistant_hackathon_guide.md
```
If you have no Android device or only limited compute, use the fallback path in `docs/no_android_low_compute_path.md`. It gives you:

- browser-based data collection (`web_demo/`)
- a typed command demo for video and testing
- cloud-only training (Kaggle/Colab) with lighter model options
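The typed command demo ultimately maps a command string to an action. As a minimal sketch only, assuming a simple first-word keyword lookup, that mapping could look like this; the intent names, keywords, and `Action` type are hypothetical and are not taken from VoiceBridge's dispatcher.

```python
"""Hypothetical sketch of a typed-command dispatcher.

The intents and keywords are illustrative assumptions, not the
project's actual action set.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    intent: str
    argument: Optional[str] = None


# Hypothetical mapping from a command's first word to an intent.
INTENT_PREFIXES = {
    "call": "CALL_CONTACT",
    "open": "OPEN_APP",
    "text": "SEND_MESSAGE",
}


def dispatch(command: str) -> Optional[Action]:
    """Map a typed (or transcribed) command to an Action, or None if unrecognized."""
    words = command.strip().lower().split()
    if not words:
        return None
    intent = INTENT_PREFIXES.get(words[0])
    if intent is None:
        return None
    # Everything after the keyword becomes the argument (None if nothing follows).
    argument = " ".join(words[1:]) or None
    return Action(intent, argument)


if __name__ == "__main__":
    print(dispatch("Call Mom"))  # Action(intent='CALL_CONTACT', argument='mom')
```

An exact keyword table keeps demo takes deterministic; a dispatcher fed by real transcriptions would probably need fuzzier matching to tolerate recognition errors.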
To train, open `training/train.ipynb` in Kaggle and run all cells in order.

If running locally:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r training/requirements.txt
python3 training/train.py --help
```

To build and run the Android app:

- Open `android/` in Android Studio (Meerkat or newer).
- Sync Gradle.
- Connect a real Android device.
- Run the `app` module.
- Set `MODEL_URL` in `android/app/src/main/java/com/voicebridge/speechassist/model/ModelDownloadConfig.kt`.
- Use Download Model in the app (or preload the model file manually).
- Use `DataCollectionActivity` to capture sentence recordings.
- Use typed demo mode for reliable command walkthroughs during integration.
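Before fine-tuning, collected recordings need to be split into training and validation sets. A minimal sketch, assuming each recording is a `(prompt_id, audio_path)` pair (a hypothetical format, not the app's actual export) and a 10% validation fraction, holds out whole prompts so that validation WER reflects sentences the model never saw during training:

```python
"""Hypothetical sketch: split recordings into train/validation sets by prompt.

The (prompt_id, audio_path) format and 10% validation fraction are assumptions.
"""
import random
from typing import List, Tuple

Recording = Tuple[int, str]  # (prompt_id, audio_path)


def split_by_prompt(
    recordings: List[Recording], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Recording], List[Recording]]:
    """Hold out whole prompts so no validation sentence appears in training."""
    prompt_ids = sorted({pid for pid, _ in recordings})
    rng = random.Random(seed)  # fixed seed -> reproducible split
    rng.shuffle(prompt_ids)
    n_val = max(1, round(len(prompt_ids) * val_fraction))
    val_prompts = set(prompt_ids[:n_val])
    train = [r for r in recordings if r[0] not in val_prompts]
    val = [r for r in recordings if r[0] in val_prompts]
    return train, val
```

Holding out whole prompts is a deliberate choice: a per-recording split is simpler, but it lets the same sentence appear in both sets, which can make validation WER look optimistic.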
To run the web demo locally:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r web_demo/requirements.txt
python3 web_demo/app.py
```

Then open http://localhost:7860.
To generate the benchmark report, pass in your measured metrics, for example:

```bash
python3 benchmarks/benchmark_report.py \
  --baseline-wer 0.72 \
  --baseline-cer 0.55 \
  --finetuned-wer 0.14 \
  --finetuned-cer 0.09 \
  --model-name "gemma-4-e4b personalized" \
  --user-description "Adult with moderate dysarthria" \
  --num-train-samples 720 \
  --num-val-samples 80 \
  --inference-latency-ms 820 \
  --device-name "OnePlus 12"
```

Use the following documents:

- `docs/hackathon_execution_checklist.md`
- `docs/submission_checklist.md`
- `docs/no_android_low_compute_path.md`

These documents are optimized for a 4-week sprint and mirror the guide (`speech_assistant_hackathon_guide.md`).
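The report script takes precomputed WER and CER values as flags. As a sketch of the standard definitions (word- or character-level Levenshtein distance divided by the reference length), not necessarily how the training notebook computes them, and assuming lowercasing and whitespace collapsing as the only normalization:

```python
"""Minimal sketch of WER/CER via Levenshtein edit distance.

Normalization choices and the relative_reduction helper are assumptions,
not taken from the repository's tooling.
"""
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 cost on a match)
            )
        prev = curr
    return prev[-1]


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def wer(reference: str, hypothesis: str) -> float:
    ref_words = _normalize(reference).split()
    hyp_words = _normalize(hypothesis).split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    ref_chars = list(_normalize(reference))
    hyp_chars = list(_normalize(hypothesis))
    return edit_distance(ref_chars, hyp_chars) / max(len(ref_chars), 1)


def relative_reduction(baseline: float, finetuned: float) -> float:
    """Hypothetical helper: fraction of the baseline error removed by fine-tuning."""
    return (baseline - finetuned) / baseline


if __name__ == "__main__":
    print(wer("turn on the lights", "turn of the light"))  # 0.5
    print(round(relative_reduction(0.72, 0.14), 3))  # 0.806
```

For instance, the 0.72 → 0.14 WER values in the example invocation correspond to roughly an 81% relative reduction.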
- Keep all personally identifiable or sensitive user audio private.
- Get explicit consent from the user featured in demo/video/writeup.
- For on-device inference, place or download the quantized model into the app's external files directory.
- The inference engine uses the modern LiteRT-LM `Engine`/`Conversation` runtime with a legacy fallback.
- Always validate on the target device, because audio backend and model support vary by model export and runtime version.
By submission day, ensure these are ready:
- Public GitHub repository
- Public Hugging Face adapter + GGUF
- `benchmarks/benchmark_report.json`
- Unlisted YouTube demo under 3 minutes
- Final Kaggle writeup (based on template)
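Only some of these items live in the repository. A hypothetical pre-submission check (the file list is an assumption drawn from this README, not an official script) can confirm that the local artifacts exist and that the benchmark report parses as JSON; the GitHub, Hugging Face, YouTube, and Kaggle items still need a manual check:

```python
"""Hypothetical pre-submission check for artifacts stored in the repo.

REQUIRED_FILES is an assumption based on this README; remote items
(GitHub, Hugging Face, YouTube, Kaggle) are not checked.
"""
import json
from pathlib import Path
from typing import List

REQUIRED_FILES = [
    "benchmarks/benchmark_report.json",
    "model_card.md",
    "kaggle_writeup_template.md",
]


def missing_artifacts(repo_root: Path) -> List[str]:
    """Return a list of problems; an empty list means all local files look ready."""
    problems = []
    for rel in REQUIRED_FILES:
        path = repo_root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif path.suffix == ".json":
            # The report must at least be well-formed JSON.
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"invalid JSON in {rel}: {exc}")
    return problems
```

Run it from the repository root, for example `print(missing_artifacts(Path(".")))`; an empty list means the local files are in place.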