Local transcription, diarization, and summarization for therapy sessions.
- Fully Local Processing – No cloud services; all transcription and summarization run on your machine.
- Accurate Transcription – Powered by faster-whisper with the large-v3-turbo model for efficient, high-quality audio transcription.
- Speaker Diarization – Differentiates therapist and client voices by comparing them against a coach/therapist audio sample.
- Session Summaries – Summarizes conversations using a local LLM.
- Intuitive Interface – PyWebview GUI built on HTML/CSS/JS for simple interaction.
- Voice Recognition – Analyzes and stores a person's voice sample to make automatic labeling of therapist and client voices more accurate.
- Python 3.13
- pip
- ffmpeg (for audio processing)
- See requirements.txt for the full list of libraries
- **Clone the repository:** `git clone https://github.com/yourusername/therapy-transcriber.git`, then `cd Blaat`
- **Install dependencies:** `pip install -r requirements.txt`
- **Download models:** Install the models (not included in the repository). #TODO: Add download links for models
- **Run the app:** `python launcher.py`. The app will launch a local PyWebview window for interacting with the transcription and summaries.
- Launch the app: `python main.py`
- Upload a brief (15-30 second) sample of the coach/therapist speaking. This is used to label the transcript (i.e. who said what). For best results, use a 16 kHz, mono recording.
- Upload an audio file of the therapy session.
- Enter the client or session name.
- Enter the date.
- Repeat the three steps above if multiple session audio files need to be processed.
- Click "Verwerk" (Dutch for "Process").
- Wait for transcription and speaker diarization to complete.
- Once complete, click "Generate Summary" to start the LLM summary of the transcript.
- Review session summaries and notes.

For each audio file, three text files are created: one transcript, one diarized transcript, and one summary.
Tip: Longer sessions may take more time; progress updates are shown in the interface.
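The 16 kHz mono recommendation for the voice sample can be met by converting the recording with ffmpeg before uploading it. The following is a minimal sketch, assuming ffmpeg is available on the PATH; the helper names and file names are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that resamples `src` to 16 kHz mono."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", src,       # input file
        "-ar", "16000",  # audio sample rate: 16 kHz
        "-ac", "1",      # audio channels: mono
        dst,
    ]


def convert_sample(src: str, dst: str) -> Path:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return Path(dst)
```

For example, `convert_sample("therapist_sample.m4a", "therapist_sample_16k.wav")` would write a WAV file suitable for upload.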
Audio upload → faster-whisper → Transcription to raw text → Speaker diarization → Chunking of the labeled transcript → Local LLM summarizes each chunk → Local LLM combines the chunk summaries into one overall summary → Stored in a text file
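The chunk-then-combine summarization at the end of this pipeline can be sketched as follows. This is an illustration only, not the code in this repository: the function names, the character-based chunk limit, and the `summarize` callable (a stand-in for a call to the local LLM) are all assumptions.

```python
from typing import Callable


def chunk_transcript(lines: list[str], max_chars: int = 4000) -> list[str]:
    """Group transcript lines into chunks of at most `max_chars` characters.

    A single line longer than `max_chars` becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in lines:
        # +1 accounts for the newline that joins lines within a chunk.
        added = len(line) + (1 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_session(
    lines: list[str],
    summarize: Callable[[str], str],  # hypothetical wrapper around the local LLM
    max_chars: int = 4000,
) -> str:
    """Summarize each chunk, then combine the partial summaries into one."""
    chunks = chunk_transcript(lines, max_chars)
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        return partials[0]
    # Second pass: the LLM condenses the per-chunk summaries.
    return summarize("\n\n".join(partials))
```

Chunking on line boundaries keeps each speaker turn intact, so a chunk never starts mid-sentence. A token-based limit would track the LLM's context window more precisely than a character count.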
- Backend: FastAPI serves as the local API for audio processing and LLM summarization.
- Transcription: faster-whisper processes audio files efficiently.
- Speaker Diarization: Differentiates voices to label speaker segments.
- Summarization: A local LLM generates concise session summaries.
- Frontend: PyWebview renders an HTML/CSS/JS GUI with session management.
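Speaker labeling compares session audio against the stored coach/therapist sample. One common way to make such a comparison is cosine similarity between speaker embeddings. Whether this project works exactly that way is not stated here, so the following is only an illustrative sketch. The embeddings are assumed to come from a speaker-embedding model (not shown), and the function names, labels, and 0.7 threshold are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_segments(
    reference: Sequence[float],            # embedding of the therapist sample
    segments: Sequence[Sequence[float]],   # one embedding per transcript segment
    threshold: float = 0.7,                # assumed cut-off; would need tuning
) -> list[str]:
    """Label a segment "Therapist" if it is close enough to the reference voice."""
    return [
        "Therapist" if cosine_similarity(reference, seg) >= threshold else "Client"
        for seg in segments
    ]
```

With a reference of `[1.0, 0.0]`, a segment embedding of `[0.9, 0.1]` would be labeled "Therapist" and `[0.0, 1.0]` would be labeled "Client".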
- Model Selection: You can change the LLM, the Whisper model, and the language settings in config.py.
- Model Summary: app.api.services.summarizer.py holds the summarization functions and the exact system prompts sent to the LLM. For other use cases, edit these prompts.
This project is licensed under the MIT License.
This application uses local versions of third-party AI models, which are subject to their own licenses and terms. Users are responsible for complying with the licenses of any downloaded or configured models.
faster-whisper-large-v3-turbo-ct2 – Fast, local speech recognition: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
FastAPI – Modern Python API framework.
PyWebview – Lightweight GUI for Python.
Open-source community for various other libraries and inspiration.