Blaat 2.0: Process therapy/coaching audio files

Local transcription, diarization, and summarization for therapy sessions.

Features

Fully Local Processing – No cloud services; all transcription and summarization run on your machine.
Accurate Transcription – Powered by faster-whisper-v3-turbo for efficient, high-quality audio transcription.
Speaker Diarization – Differentiates therapist and client voices by comparing against coach/therapist audio sample.
Session Summaries – Summarizes conversations using a local LLM.
Intuitive Interface – PyWebview GUI built on HTML/CSS/JS for simple interaction.
Voice recognition – Can analyze and store a person's voice sample to make automatic labelling of therapist and client voices more accurate.

Installation

Prerequisites

Python 3.13
pip
ffmpeg (for audio processing)
see requirements.txt for full library overview

Steps

Clone the repository git clone https://github.com/yourusername/therapy-transcriber.git cd Blaat
Install dependencies pip install -r requirements.txt
Download models Install models (not included in repository) #TODO: Add download links for models
**Run the app ** python launcher.py The app will launch a local PyWebview window for interacting with the transcription and summaries.

Usage

Launch the app: python main.py Upload a brief (15-30 second) sample of the coach/therapist speaking. This is used for transcript labeling (e.g. who said what). For best results: Use 16 KHz sample rate, mono settings.

Upload an audio file of the therapy session.
Enter client or session name.
Enter the date. Repeat the 3 steps above if multiple session audio files need to be processed. Click "Verwerk". Wait for transcription and speaker diarization to complete. Once complete, click "Generate Summary" to initiate LLM summary of transcript. Review session summaries and notes. For each audio-file, 3 text files will be created: 1 transcript, 1 diarized transcript, 1 summary.

Tip: Longer sessions may take more time; progress updates are shown in the interface.

Architecture

Audio Upload → Whisper-Faster → Transcription to raw text → Speaker Diarization → Chunking labelled transcript → Local LLM summarizes each chunk → Local LLM combines summary chunks into total summary → Stored in text file

Backend: FastAPI serves as the local API for audio processing and LLM summarization. Transcription: Whisper-Faster processes audio files efficiently. Speaker Diarization: Differentiates voices to label speaker segments. Summarization: Local LLM generates concise session summaries. Frontend: PyWebview renders an HTML/CSS/JS GUI with session management.

Configuration

Model Selection: You can change the LLM, Whisper model and language settings in config.py Model Summary: app.api.services.summarizer.py holds summarizing functions, as well as exact system prompts to LLM. For other use cases, make changes to these prompts.

License

This project is licensed under the MIT License.

Third-Party Models

This application uses local version of third-party AI models subject to their own licenses and terms. Users are responsible for complying with the licenses of any downloaded or configured models.

Acknowledgements

faster-whisper-large-v3-turbo-ct2 – Fast, local speech recognition: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2

FastAPI – Modern Python API framework.

PyWebview – Lightweight GUI for Python.

Open-source community for various other libraries and inspiration.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
app		app
storage		storage
tests		tests
.gitignore		.gitignore
Blaat.code-workspace		Blaat.code-workspace
LICENSE		LICENSE
launcher.py		launcher.py
readme.md		readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Blaat 2.0: Process therapy/coaching audio files

Table of Contents

Features

Installation

Prerequisites

Steps

Usage

Architecture

Configuration

License

Third-Party Models

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Blaat 2.0: Process therapy/coaching audio files

Table of Contents

Features

Installation

Prerequisites

Steps

Usage

Architecture

Configuration

License

Third-Party Models

Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages