yagna-1/voicebridge-hackathon

VoiceBridge: Personalized Speech Assistant

VoiceBridge is an offline Android assistant for users with speech impairments such as dysarthria and related conditions. Gemma 4 is fine-tuned on a single user's voice in Kaggle or Colab, and the resulting model runs on-device for private, low-latency voice control.

This repository is organized so a 3-person team can execute the hackathon plan in parallel:

  • ML Lead: dataset prep, Unsloth fine-tuning, WER/CER benchmarking, Hugging Face publish
  • Android Lead: data collection app, inference app, action dispatcher, latency benchmarks
  • Product/Video Lead: user studies, impact proof, video, writeup, submission packaging

Repository Structure

.
├── android/                       # Android Studio project
├── benchmarks/                    # WER/CER + latency report tooling
├── data/                          # 195-sentence prompt list and notes
├── docs/                          # execution and submission checklists
├── training/                      # Kaggle/Colab training pipeline
├── web_demo/                      # browser demo (no Android tooling required)
├── kaggle_writeup_template.md
├── model_card.md
└── speech_assistant_hackathon_guide.md

Quick Start

0) If you do not have Android Studio or a local GPU

Use the fallback path:

  • docs/no_android_low_compute_path.md

This gives you:

  • browser-based data collection (web_demo/),
  • typed command demo for video and testing,
  • cloud-only training (Kaggle/Colab) with lighter model options.

1) ML setup (Kaggle preferred)

Open training/train.ipynb in Kaggle and run all cells in order.

If running locally:

python3 -m venv .venv
source .venv/bin/activate
pip install -r training/requirements.txt
python3 training/train.py --help

2) Android setup

  1. Open android/ in Android Studio (Meerkat or newer).
  2. Sync Gradle.
  3. Set MODEL_URL in android/app/src/main/java/com/voicebridge/speechassist/model/ModelDownloadConfig.kt so the build picks it up.
  4. Connect a real Android device.
  5. Run the app module.
  6. Use Download Model in the app (or preload the model file manually).
  7. Use DataCollectionActivity to capture sentence recordings.
  8. Use typed demo mode for reliable command walkthroughs during integration.

2b) Browser demo setup (no Android tooling required)

python3 -m venv .venv
source .venv/bin/activate
pip install -r web_demo/requirements.txt
python3 web_demo/app.py

Then open http://localhost:7860.

3) Benchmarks

python3 benchmarks/benchmark_report.py \
  --baseline-wer 0.72 \
  --baseline-cer 0.55 \
  --finetuned-wer 0.14 \
  --finetuned-cer 0.09 \
  --model-name "gemma-4-e4b personalized" \
  --user-description "Adult with moderate dysarthria" \
  --num-train-samples 720 \
  --num-val-samples 80 \
  --inference-latency-ms 820 \
  --device-name "OnePlus 12"
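
The WER and CER flags above are word and character error rates: edit distance between the reference sentence and the model's transcript, divided by the reference length. The following is a minimal sketch of that calculation, not the code in benchmarks/; the function names and the lowercase, whitespace-split normalization are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes lowercase + whitespace tokenization."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.lower().split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over the lowercased strings."""
    ref_chars = list(reference.lower())
    if not ref_chars:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_chars, list(hypothesis.lower())) / len(ref_chars)


def relative_reduction(baseline, finetuned):
    """Fraction of the baseline error removed by fine-tuning."""
    return (baseline - finetuned) / baseline
```

With the example flag values shown above, `relative_reduction(0.72, 0.14)` is about 0.81 for WER and `relative_reduction(0.55, 0.09)` is about 0.84 for CER. Real reports should use error rates measured on the held-out validation recordings.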

Week 1-4 Operating Plan

Use:

  • docs/hackathon_execution_checklist.md
  • docs/submission_checklist.md
  • docs/no_android_low_compute_path.md

These documents are written for a 4-week sprint and follow the same plan as speech_assistant_hackathon_guide.md.

Critical Notes

  • Keep all personally identifiable or sensitive user audio private.
  • Get explicit consent from the user featured in demo/video/writeup.
  • For on-device inference, place or download the quantized model into the app's external files directory.
  • The inference engine uses the modern LiteRT-LM Engine/Conversation runtime, with a legacy fallback.
  • Always validate on the target device, because audio backend and model support vary by model export and runtime version.
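
Because behavior varies by device and runtime version, it can help to time several inference runs on each target device and report a summary rather than a single measurement. The sketch below is one way to do that; `summarize_latency` is a hypothetical helper, not part of this repository, and the choice of median plus nearest-rank 95th percentile is an assumption.

```python
import math
import statistics


def summarize_latency(samples_ms):
    """Summarize per-run inference latencies (milliseconds) for one device."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile: smallest value with >= 95% of runs at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "runs": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
    }
```

The median could then be passed as `--inference-latency-ms` alongside the matching `--device-name`, with the p95 noted separately to show worst-case behavior.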

Deliverables

By submission day, ensure these are ready:

  • Public GitHub repository
  • Public Hugging Face adapter + GGUF
  • benchmarks/benchmark_report.json
  • Unlisted YouTube demo under 3 minutes
  • Final Kaggle writeup (based on template)
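
A small pre-submission check can confirm that the file-based deliverables are present in the repository. This is a sketch under assumptions: the paths come from this README, `missing_deliverables` is a hypothetical helper, and the only validation applied to the benchmark report is that it parses as JSON, since its schema is not described here.

```python
import json
from pathlib import Path

# Paths taken from the repository structure and deliverables list in this README.
REQUIRED_FILES = [
    "benchmarks/benchmark_report.json",
    "model_card.md",
]


def missing_deliverables(repo_root="."):
    """Return required files that are absent, plus JSON files that fail to parse."""
    root = Path(repo_root)
    problems = []
    for rel_path in REQUIRED_FILES:
        path = root / rel_path
        if not path.is_file():
            problems.append(rel_path)
        elif path.suffix == ".json":
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                problems.append(rel_path)
    return problems


if __name__ == "__main__":
    problems = missing_deliverables()
    if problems:
        print("Missing or invalid:", ", ".join(problems))
    else:
        print("All file-based deliverables found.")
```

The public repository, Hugging Face upload, video, and Kaggle writeup still need a manual check against docs/submission_checklist.md.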
