Real-time AI voice detection optimized for phone-call-quality audio. Unlike most detectors, which fail on telephony audio, this one is built specifically for the compression, noise, and limited bandwidth of real phone calls.
Most voice cloning scams happen over phone calls, but most AI detectors are trained on clean studio audio. Phone calls have:
- 8kHz sample rate (vs. 44.1kHz for typical consumer audio)
- Narrow frequency band (300Hz - 3.4kHz)
- Heavy compression (GSM, AMR codecs)
- Noise, echo, packet loss
This detector is trained on phone-degraded audio to catch deepfakes where they actually happen.
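To make the telephony constraints above concrete, here is a minimal NumPy/SciPy sketch of a phone channel: resample to 8kHz, bandpass to 300-3.4kHz, add noise at a target SNR, and zero out random 20ms frames as packet loss. The function name and constants are illustrative, not the repo's actual `phone_augment.py` API.

```python
import numpy as np
from scipy import signal

def simulate_phone_channel(audio, sr=16000, snr_db=20.0, loss_rate=0.02, seed=0):
    """Illustrative phone-channel simulation: 8kHz, 300-3400Hz, noise, packet loss."""
    rng = np.random.default_rng(seed)
    # 1. Resample to the 8kHz telephony rate
    audio = signal.resample_poly(audio, 8000, sr)
    # 2. Bandpass to the telephone band (300Hz - 3.4kHz)
    sos = signal.butter(4, [300, 3400], btype="bandpass", fs=8000, output="sos")
    audio = signal.sosfilt(sos, audio)
    # 3. Add white noise scaled to the requested SNR
    sig_power = np.mean(audio**2) + 1e-12
    noise = rng.standard_normal(len(audio))
    noise *= np.sqrt(sig_power / 10**(snr_db / 10) / (np.mean(noise**2) + 1e-12))
    audio = audio + noise
    # 4. Drop random 20ms frames to mimic packet loss
    frame = 160  # 20ms at 8kHz
    for start in range(0, len(audio) - frame, frame):
        if rng.random() < loss_rate:
            audio[start:start + frame] = 0.0
    return audio

one_sec = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 440Hz test tone at 16kHz
phone = simulate_phone_channel(one_sec)
print(len(phone))  # 8000 samples: one second at 8kHz
```

A real GSM/AMR codec pass (step 2 in the training recipe below) is lossier than this bandpass approximation; the sketch only shows the shape of the transformation.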
- Phone-optimized model - Trained on 8kHz, codec-compressed audio
- Real-time detection - Works during live phone calls
- On-device processing - No data leaves your phone
- Low latency - Results in <200ms
- 4 model architectures - From lightweight mobile to full attention-based
```
Phone Call (8kHz)
        |
        v
+------------------+
| Noise Reduction  |
+--------+---------+
         |
         v
+------------------+
| Mel Spectrogram  |   64 bands, phone frequency range
+--------+---------+
         |
         v
+------------------+
| Lightweight CNN  |   Optimized for mobile
+--------+---------+
         |
         v
 REAL / FAKE + confidence %
```
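The 64-band mel front end in the diagram can be sketched without the repo's code. This is a self-contained NumPy version restricted to the telephone band; `n_fft=512` and a 128-sample hop are assumed values, not necessarily what the project uses.

```python
import numpy as np
from scipy import signal

def mel_filterbank(n_mels=64, n_fft=512, sr=8000, fmin=300.0, fmax=3400.0):
    """Triangular mel filters covering only the telephone band."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0**(m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        if mid > lo:  # rising edge of the triangle
            fb[i, lo:mid] = (np.arange(lo, mid) - lo) / (mid - lo)
        if hi > mid:  # falling edge
            fb[i, mid:hi] = (hi - np.arange(mid, hi)) / (hi - mid)
    return fb

# One second of 8kHz audio -> log-mel spectrogram
audio = np.random.default_rng(0).standard_normal(8000)
_, _, stft = signal.stft(audio, fs=8000, nperseg=512, noverlap=384)  # hop = 128
power = np.abs(stft)**2
logmel = np.log(mel_filterbank() @ power + 1e-6)
print(logmel.shape)  # (64, n_frames)
```

Restricting `fmin`/`fmax` to 300-3400Hz means no mel bands are wasted on frequencies the phone channel cannot carry.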
Training data is created by degrading clean audio to simulate phone calls:
```
clean_audio
  -> Resample to 8kHz
  -> Apply GSM codec simulation
  -> Bandpass filter (300Hz - 3.4kHz)
  -> Add background noise (SNR 10-30dB)
  -> Simulate packet loss (0-5%)
  -> Add echo/reverb
```

```bash
# Set up the environment
cd deepfake-phone-detector
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Setup directories and get instructions
python scripts/download_dataset.py

# If using ASVspoof dataset:
python scripts/prepare_asvspoof.py --zip data/downloads/LA.zip

# Apply phone degradation
python scripts/degrade_to_phone.py --input data/raw --output data/phone

# Train (mobile architecture, 50 epochs)
python model/train.py --data data/phone --epochs 50 --model mobile

# Run inference on a single file
python scripts/test_detection.py audio.wav --checkpoint checkpoints/best.pt --model mobile

# Export to TFLite for Android
python model/export_tflite.py --checkpoint checkpoints/best.pt --output android/app/src/main/assets/model.tflite

# Build the Android app
cd android
./gradlew assembleDebug
# APK: android/app/build/outputs/apk/debug/app-debug.apk
```

```
deepfake-phone-detector/
├── model/
│   ├── phone_augment.py           # Phone degradation pipeline (8kHz, GSM, noise)
│   ├── dataset.py                 # PyTorch dataset for phone-quality audio
│   ├── model.py                   # 4 CNN architectures (PhoneCNN, Mobile, Temporal, Attention)
│   ├── train.py                   # Training loop with validation
│   └── export_tflite.py           # ONNX -> TFLite conversion
├── android/
│   ├── app/src/main/
│   │   ├── java/.../              # Kotlin: MainActivity, AudioService, Classifier
│   │   └── res/                   # UI layouts and resources
│   └── build.gradle
├── scripts/
│   ├── download_dataset.py        # Dataset setup instructions
│   ├── prepare_asvspoof.py        # ASVspoof dataset extraction
│   ├── degrade_to_phone.py        # Batch phone degradation
│   ├── generate_bark_samples.py   # Generate fake samples with Bark TTS
│   └── test_detection.py          # Inference script
├── data/                          # Dataset (gitignored)
│   ├── raw/real/                  # Original real voice samples
│   ├── raw/fake/                  # Original AI-generated samples
│   ├── phone/real/                # Phone-degraded real samples
│   └── phone/fake/                # Phone-degraded fake samples
└── requirements.txt
```
| Model | Parameters | Inference | Use Case |
|---|---|---|---|
| `phone_cnn` | 30K | 15ms | Ultra-lightweight baseline |
| `temporal` | 85K | 25ms | Better at temporal artifacts |
| `mobile` | 17K | 10ms | Android deployment |
| `attention` | 443K | 40ms | Highest accuracy |
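Parameter counts this low usually come from depthwise-separable convolutions. The sketch below shows that style of architecture over a (64 mel bands x time) input; it is illustrative of the `mobile` variant's design, not the repo's actual `model.py`, and its layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class MobilePhoneCNN(nn.Module):
    """Illustrative depthwise-separable CNN over 64-band log-mel input."""
    def __init__(self, n_classes=2):
        super().__init__()
        def ds_block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cin, 3, padding=1, groups=cin),  # depthwise: one filter per channel
                nn.Conv2d(cin, cout, 1),                        # pointwise: mix channels
                nn.BatchNorm2d(cout),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            ds_block(16, 32),
            ds_block(32, 64),
            nn.AdaptiveAvgPool2d(1),  # collapse freq/time -> fixed-size vector
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, 1, mel bands, time frames)
        return self.classifier(self.features(x).flatten(1))

model = MobilePhoneCNN()
n_params = sum(p.numel() for p in model.parameters())
logits = model(torch.randn(1, 1, 64, 63))  # REAL/FAKE logits
print(n_params, logits.shape)
```

Splitting each 3x3 convolution into depthwise + pointwise stages cuts parameters by roughly the kernel area versus a standard convolution, which is what makes sub-20K-parameter models feasible on-device.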
Real (bonafide) samples:
- ASVspoof 2019/2021 - bonafide samples
- LibriSpeech - clean speech
- VoxCeleb - speaker recognition corpus

Fake (spoofed) samples:
- ASVspoof 2019/2021 - 17 spoofing systems
- Generated with Bark TTS (included script)
- Generated with Coqui TTS
- ElevenLabs API
- ML: PyTorch 2.x, torchaudio
- Audio Processing: librosa, scipy, audiomentations
- Mobile: TensorFlow Lite, Kotlin
- Phone Simulation: Custom GSM codec simulation, bandpass filters
| Dataset | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| ASVspoof (clean) | ~95% | ~94% | ~96% | ~95% |
| ASVspoof (phone) | ~92% | ~90% | ~93% | ~91% |
| In-the-wild | ~88% | ~85% | ~90% | ~87% |
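The F1 column is the harmonic mean of precision and recall, so the rows can be sanity-checked directly:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# In-the-wild row: ~85% precision, ~90% recall
print(round(f1(0.85, 0.90), 3))  # 0.874 -> consistent with the ~87% reported
# ASVspoof (phone) row: ~90% precision, ~93% recall
print(round(f1(0.90, 0.93), 3))  # 0.915 -> consistent with the ~91% reported
```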
License: MIT