Self-hosted AI transcription and intelligent note-taking platform
Speakr transforms your audio recordings into organized, searchable, and intelligent notes. Built for privacy-conscious groups and individuals, it runs entirely on your own infrastructure, ensuring your sensitive conversations remain completely private.
- Smart Recording & Upload - Record directly in browser or upload existing audio files
- AI Transcription - High-accuracy transcription with speaker identification
- Voice Profiles - AI-powered speaker recognition with voice embeddings (requires WhisperX ASR service)
- REST API v1 - Complete API with Swagger UI for automation tools (n8n, Zapier, Make) and dashboard widgets
- Single Sign-On - Authenticate with any OIDC provider (Keycloak, Azure AD, Google, Auth0, Pocket ID)
- Audio-Transcript Sync - Click transcript to jump to audio, auto-highlight current text, follow mode for hands-free playback
- Interactive Chat - Ask questions about your recordings and get AI-powered answers
- Inquire Mode - Semantic search across all recordings using natural language
- Internationalization - Full support for English, Spanish, French, German, Chinese, and Russian
- Beautiful Themes - Light and dark modes with customizable color schemes
- Internal Sharing - Share recordings with specific users with granular permissions (view/edit/reshare)
- Group Management - Create groups with automatic sharing via group-scoped tags
- Public Sharing - Generate secure links to share recordings externally (admin-controlled)
- Group Tags - Tags that automatically share recordings with all group members
- Smart Tagging - Organize with tags that include custom AI prompts and ASR settings
- Tag Prompt Stacking - Combine multiple tags to layer AI instructions for powerful transformations
- Tag Protection - Prevent specific recordings from being auto-deleted
- Group Retention Policies - Set custom retention periods per group tag
- Auto-Deletion - Automatic cleanup of old recordings with flexible retention policies
Different people use Speakr's collaboration and retention features in different ways:
| Use Case | Setup | What It Does |
|---|---|---|
| Family memories | Create "Family" group with protected tag | Everyone gets access to trips and events automatically, recordings preserved forever |
| Book club discussions | "Book Club" group, tag monthly meetings | All members auto-share discussions, can add personal notes about what resonated |
| Work project group | Share individually with 3 teammates | Temporary collaboration, easy to revoke when project ends |
| Daily group standups | Group tag with 14-day retention | Auto-share with group, auto-cleanup of routine meetings |
| Architecture decisions | Engineering group tag, protected from deletion | Technical discussions automatically shared, preserved permanently as reference |
| Client consultations | Individual share with view-only permission | Controlled external access, clients can't accidentally edit |
| Research interviews | Protected tag + Obsidian export | Preserve recordings indefinitely, transcripts auto-import to note-taking system |
| Legal consultations | Group tag with 7-year retention | Automatic sharing with legal group, compliance-based retention |
| Sales calls | Group tag with 1-year retention | Whole sales group learns from each call, cleanup after sales cycle |
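The retention setups in the table above (protected tags, per-group retention periods, a global cleanup policy) can be read as one decision per recording. The following is a minimal sketch of that decision, not Speakr's actual implementation: the `Tag` and `Recording` classes, their field names, and the rule that the longest tag-level retention overrides the global default are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Tag:
    name: str
    protected: bool = False               # hypothetical flag for tag protection
    retention_days: Optional[int] = None  # hypothetical per-group-tag retention


@dataclass
class Recording:
    created_at: datetime
    tags: List[Tag] = field(default_factory=list)


def should_delete(rec: Recording, now: datetime,
                  global_retention_days: Optional[int]) -> bool:
    """Return True if the recording is past its retention period."""
    # A protected tag blocks auto-deletion outright.
    if any(t.protected for t in rec.tags):
        return False
    # Assumed precedence: the longest tag-level period wins over the global one.
    tag_periods = [t.retention_days for t in rec.tags
                   if t.retention_days is not None]
    days = max(tag_periods) if tag_periods else global_retention_days
    if days is None:
        return False  # no retention configured, so keep the recording
    return now - rec.created_at > timedelta(days=days)
```

Under these assumptions, a 30-day-old standup tagged with a 14-day retention is deleted, while the same recording carrying a protected tag is kept regardless of age.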
Tags with custom prompts transform raw recordings into exactly what you need:
- Recipe recordings: Record yourself cooking while narrating - tag with "Recipe" to convert messy speech into formatted recipes with ingredient lists and numbered steps
- Lecture notes: Students tag lectures with "Study Notes" to get organized outlines with concepts, examples, and definitions instead of raw transcripts
- Code reviews: "Code Review" tag extracts issues, suggested changes, and action items in technical language developers can use directly
- Meeting summaries: "Action Items" tag ignores discussion and returns just decisions, tasks, and deadlines
Stack multiple tags to layer instructions:
- "Recipe" + "Gluten Free" = Formatted recipe with gluten substitution suggestions
- "Lecture" + "Biology 301" = Study notes format focused on biological terminology
- "Client Meeting" + "Legal Review" = Client requirements plus legal implications highlighted
The order can matter - start with format tags, then add focus tags for best results.
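To make the ordering advice concrete, here is a minimal sketch of how stacked tag prompts could be combined into one instruction block. It is illustrative only: the function name, the base prompt, and the line-per-tag layout are hypothetical, and Speakr's own prompt assembly may differ.

```python
from typing import Iterable, Tuple


def stack_prompts(tags: Iterable[Tuple[str, str]],
                  base_prompt: str = "Summarize the transcript.") -> str:
    """Join the custom prompts of the selected tags in the order given."""
    parts = [base_prompt]
    for name, prompt in tags:
        if prompt:  # tags without a custom prompt contribute nothing
            parts.append(f"[{name}] {prompt}")
    return "\n".join(parts)


stacked = stack_prompts([
    ("Recipe", "Format the result as a recipe with ingredients and steps."),
    ("Gluten Free", "Suggest gluten-free substitutions."),
])
```

Because the prompts are appended in tag order, the format instruction ("Recipe") is read before the focus instruction ("Gluten Free"), which matches the format-first recommendation.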
- Obsidian/Logseq: Enable auto-export to write completed transcripts directly to your vault using your custom template - no manual export needed
- Documentation wikis: Map auto-export to your wiki's import folder for seamless transcript publishing
- Content creation: Create SRT subtitle templates from your audio recordings for podcasts or video content
- Project management: Extract action items with custom tag prompts, then auto-export for automated task creation
# Create project directory
mkdir speakr && cd speakr
# Download docker-compose configuration:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/docker-compose.example.yml -O docker-compose.yml
# Download the environment template:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.transcription.example -O .env
# Configure your API keys and launch
nano .env
docker compose up -d
# Access at http://localhost:8899
Lightweight image: Use learnedmachine/speakr:lite for a smaller image (~725MB vs ~4.4GB) that skips PyTorch. All features work normally; only Inquire Mode's semantic search falls back to basic text search.
Required API Keys:
- TRANSCRIPTION_API_KEY - For speech-to-text (OpenAI), or ASR_BASE_URL for self-hosted
- TEXT_MODEL_API_KEY - For summaries, titles, and chat (OpenRouter or OpenAI)
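Before launching the container, it can help to confirm that the .env file covers the keys listed above. The sketch below is a standalone helper, not part of Speakr; the function names are hypothetical, and the parser only handles simple KEY=VALUE lines with optional trailing comments.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comment lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment and surrounding whitespace.
        env[key.strip()] = value.split(" #", 1)[0].strip()
    return env


def missing_keys(env: Dict[str, str]) -> List[str]:
    """List the required settings that are absent or empty."""
    missing = []
    # Either a hosted API key or a self-hosted ASR URL satisfies transcription.
    if not (env.get("TRANSCRIPTION_API_KEY") or env.get("ASR_BASE_URL")):
        missing.append("TRANSCRIPTION_API_KEY or ASR_BASE_URL")
    if not env.get("TEXT_MODEL_API_KEY"):
        missing.append("TEXT_MODEL_API_KEY")
    return missing
```

A typical use is `missing_keys(parse_env(open(".env").read()))`, which returns an empty list when both requirements are met.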
Speakr uses a connector-based architecture that auto-detects your transcription provider:
| Option | Setup | Speaker Diarization | Voice Profiles |
|---|---|---|---|
| OpenAI Transcribe | Just API key | ✅ gpt-4o-transcribe-diarize | ❌ |
| WhisperX ASR | GPU container | ✅ Best quality | ✅ |
| Legacy Whisper | Just API key | ❌ | ❌ |
Simplest setup (OpenAI with diarization):
TRANSCRIPTION_API_KEY=sk-your-openai-key
TRANSCRIPTION_MODEL=gpt-4o-transcribe-diarize
Best quality (Self-hosted WhisperX):
ASR_BASE_URL=http://whisperx-asr:9000
ASR_RETURN_SPEAKER_EMBEDDINGS=true # Enable voice profiles
Requires WhisperX ASR Service container with GPU.
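The two configurations above hint at how provider auto-detection might behave. The sketch below is one plausible reading of the options table, not Speakr's actual detection code: the function name, the precedence (self-hosted URL first), and the model-name prefix check are assumptions.

```python
from typing import Dict, Optional


def detect_connector(env: Dict[str, str]) -> Optional[str]:
    """Guess the transcription option from environment-style settings."""
    # Assumption: a self-hosted ASR URL takes precedence over API keys.
    if env.get("ASR_BASE_URL"):
        return "WhisperX ASR"
    if env.get("TRANSCRIPTION_API_KEY"):
        model = env.get("TRANSCRIPTION_MODEL", "")
        # Assumption: gpt-4o-transcribe* models map to the newer connector.
        if model.startswith("gpt-4o-transcribe"):
            return "OpenAI Transcribe"
        return "Legacy Whisper"
    return None  # nothing configured
```

With the "simplest setup" values this returns "OpenAI Transcribe"; with only ASR_BASE_URL set it returns "WhisperX ASR".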
⚠️ PyTorch 2.6 Users: If you encounter a "Weights only load failed" error with WhisperX, add TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=true to your ASR container. See troubleshooting for details.
View Full Installation Guide →
Complete documentation is available at murtaza-nasir.github.io/speakr
- Getting Started - Quick setup guide
- User Guide - Learn all features
- Admin Guide - Administration and configuration
- Troubleshooting - Common issues and solutions
- FAQ - Frequently asked questions
Lightweight Docker Image
- Lite Image - New learnedmachine/speakr:lite tag (~725MB vs ~4.4GB) skips PyTorch/sentence-transformers for users who don't need semantic search
- Multi-Stage Dockerfile - Optimized build with static ffmpeg binaries and smaller final image for both variants
- Improved Text Search - Better fallback search with stop word filtering, keyword-focused query enrichment, and match ranking
Thanks to sakowicz for the suggestion
Export Templates & Localization
- Customizable Export Templates - Create markdown templates for exports with variables ({{title}}, {{summary}}, {{notes}}) and conditionals for optional sections
- Localized Labels - Use {{label.metadata}}, {{label.summary}}, etc. for automatically translated labels based on user's UI language
- Localized Date Formatting - Export dates formatted per user's language preference (e.g., "15. Januar 2026" for German)
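The variable syntax in the export templates above can be illustrated with a small substitution routine. This is a sketch of the general technique only, not Speakr's template engine: the `render` function and the sample values are hypothetical, and conditionals are left out because their syntax is not described here.

```python
import re
from typing import Dict

# Matches {{name}} or {{label.name}}, allowing optional inner whitespace.
_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, values: Dict[str, str]) -> str:
    """Replace each {{name}} with values[name]; unknown names stay as-is."""
    return _VAR.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


markdown = render(
    "# {{title}}\n\n## {{label.summary}}\n{{summary}}",
    {
        "title": "Standup",
        "label.summary": "Zusammenfassung",  # a label as a German UI might show it
        "summary": "All good.",
    },
)
```

Leaving unknown variables untouched makes a typo in a template visible in the exported file instead of silently producing an empty section.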
Improvements - Opt-in ASR chunking, speaker ID remapping across chunks, simplified About page transcription display
Bug Fixes - ASR empty text validation, cascade delete for recording relationships, missing model imports
Folders & Automation
- Folders Organization - Organize recordings into folders with custom prompts and ASR settings per folder
- Auto Speaker Labeling - Automatic speaker identification using voice embedding matching
- Per-User Auto-Summarization - User-configurable automatic summary generation
- Azure OpenAI Connector - New transcription connector for Azure OpenAI (experimental, community testing welcome)
- HTTPS Validation - Clear error messages when attempting to record on non-HTTPS connections
Improvements - Legacy ASR code removed (fully migrated to connector architecture), audio codec fallback to MP3, share page click-to-seek, new READABLE_PUBLIC_LINKS option for server-rendered transcripts (LLM/scraper accessible)
Bug Fixes - PostgreSQL boolean defaults in migrations, folders feature detection, audio player visibility for incognito recordings
Incognito Mode Enhancements & Compatibility Fixes
- Incognito Mode for In-App Recordings - The incognito toggle now works for microphone recordings, not just uploads
- Default Incognito Mode - New INCOGNITO_MODE_DEFAULT=true option to start with incognito enabled by default
- LLM Streaming Compatibility - New ENABLE_STREAM_OPTIONS=false option for LLM servers that don't support OpenAI's stream_options parameter
Bulk Operations & Privacy Features
- Multi-Select Mode - Select multiple recordings in sidebar for batch operations (delete, tag, reprocess, toggle inbox/highlight)
- Incognito Mode - Session-only transcription processing with no database storage (enable with ENABLE_INCOGNITO_MODE=true)
- Playback Speed Control - Adjustable 0.5x to 3x speed on all audio players with persistent preference
Bug Fixes - Fixed language selection not being passed to ASR service, improved reprocess modal
Naming Templates
- Custom Title Formatting - Create templates with variables like {{ai_title}}, {{filename}}, {{date}} and custom regex patterns
- Tag-Based or User Default - Assign templates to tags or set a user-wide default
- Token Savings - Templates without {{ai_title}} skip the AI call entirely
- API v1 Upload - New /api/v1/upload endpoint for programmatic recording uploads
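The token-savings behavior of naming templates comes down to checking whether the template references {{ai_title}} before calling the model. A minimal sketch of that idea follows; `build_title` and the `generate_ai_title` callback are hypothetical names, and regex patterns are omitted.

```python
from typing import Callable


def build_title(template: str, filename: str, date: str,
                generate_ai_title: Callable[[], str]) -> str:
    """Fill a naming template, calling the AI only when the template needs it."""
    values = {"filename": filename, "date": date}
    if "{{ai_title}}" in template:
        # The (potentially costly) model call happens only on this branch.
        values["ai_title"] = generate_ai_title()
    title = template
    for key, value in values.items():
        title = title.replace("{{" + key + "}}", value)
    return title
```

A template such as "{{date}} {{filename}}" therefore never triggers the callback, while "{{date}} - {{ai_title}}" triggers it exactly once.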
Improvements - Tag drag-and-drop reordering, registration domain restriction, event delete button, WebM seeking fix
Transcription Usage Tracking
- Per-User Budgets - Set monthly transcription limits (in minutes) with 80% warning and 100% blocking
- Usage Dashboard - Track minutes, costs, and per-user breakdowns in Admin panel
- Cost Estimation - Automatic pricing for OpenAI Whisper/Transcribe and self-hosted ASR
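The 80% warning and 100% blocking thresholds for per-user budgets amount to a simple ratio check. The sketch below illustrates that check; the function name, the return labels, and the treatment of a missing limit as "no budget" are assumptions rather than Speakr's actual code.

```python
from typing import Optional


def budget_status(used_minutes: float, monthly_limit: Optional[float]) -> str:
    """Classify usage against a monthly transcription budget in minutes."""
    if monthly_limit is None or monthly_limit <= 0:
        return "ok"  # assumption: no limit configured means unlimited
    ratio = used_minutes / monthly_limit
    if ratio >= 1.0:
        return "blocked"   # 100% of the budget used: stop new transcriptions
    if ratio >= 0.8:
        return "warning"   # 80% of the budget used: warn the user
    return "ok"
```

For a 100-minute budget, 79 minutes is "ok", 80 minutes is "warning", and 100 minutes is "blocked".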
Bug Fixes
- Diarization for Long Files - Fixed speaker diarization for chunked files with OpenAI's gpt-4o-transcribe-diarize
- Empty Segment Filtering - Removed empty transcript segments from diarized output
Cloud Diarization & REST API
- Speaker Diarization Without GPU - Use OpenAI's gpt-4o-transcribe-diarize for speaker identification with just an API key
- REST API v1 - Full-featured API for automation tools (n8n, Zapier, Make) and dashboard widgets
- Connector Architecture - Modular transcription providers with simplified configuration
- Virtual Scrolling - Performance optimization for handling 4500+ transcript segments smoothly
- Audio Player Improvements - Drag-to-seek, independent modal players, improved theme support
- File Date Handling - Uses original recording date from file metadata instead of upload time
- Codec Configuration - Configure unsupported audio codecs with automatic conversion
- PostgreSQL Support - Added psycopg2-binary driver for PostgreSQL database option
- Audio Download Button - Explicit download button next to audio player, works on mobile
- Job Queue Race Condition Fix - Fixed issue where multiple workers could claim the same job
Thanks to sakowicz, JadedBlueEyes, and Daabramov
- SSO Authentication - Sign in with any OIDC provider (Keycloak, Azure AD, Google, Auth0, Pocket ID)
- Account Linking/Unlinking - Link or unlink SSO from Account settings
- Enforce SSO-only - Disable password login for regular users
Contributed by Dmitry Abramov | SSO Setup Guide
⚠️ IMPORTANT: v0.5.9 introduced significant architectural changes. If upgrading from an earlier version, back up your data first and review the configuration guide.
- Complete Internal Sharing System - Share recordings with users with granular permissions (view/edit/reshare)
- Group Management & Collaboration - Create groups with auto-sharing via group tags and custom retention policies
- Speaker Voice Profiles - AI-powered speaker identification with 256-dimensional voice embeddings
- Audio-Transcript Synchronization - Click-to-jump, auto-highlight, and follow mode for interactive navigation
- Auto-Deletion & Retention System - Flexible retention policies with global and group-level controls
- Automated Export - Auto-export transcriptions to markdown for Obsidian, Logseq, and other note-taking apps
- Permission System - Fine-grained access control throughout the application
- Modular Architecture - Backend refactored into blueprints, frontend composables for maintainability
- UI/UX Enhancements - Compact controls, inline editing, unified toast notifications, improved badges
- Enhanced Internationalization - 29 new tooltip translations across all supported languages
- Main Screen with Chat
- AI-Powered Semantic Search
- Interactive Transcription & Chat
- Full Internationalization
View Full Screenshot Gallery →
- Backend: Python/Flask with SQLAlchemy
- Frontend: Vue.js 3 with Tailwind CSS
- AI/ML: OpenAI Whisper, OpenRouter, Ollama support
- Database: SQLite (default) or PostgreSQL
- Deployment: Docker, Docker Compose
- ✅ Speaker voice profiles with AI-powered identification (v0.5.9)
- ✅ Group workspaces with shared recordings (v0.5.9)
- ✅ PWA enhancements with offline support and background sync (v0.5.10)
- ✅ Multi-user job queue with fair scheduling (v0.6.0)
- ✅ SSO integration with OIDC providers (v0.7.0)
- ✅ Token usage tracking and per-user budgets (v0.7.2)
- ✅ Connector-based transcription architecture with auto-detection (v0.8.0)
- ✅ Comprehensive REST API with Swagger UI documentation (v0.8.0)
- Quick language switching for transcription
- Automated workflow triggers
- Plugin system for custom integrations
- End-to-end encryption option
This project is dual-licensed:
- GNU Affero General Public License v3.0 (AGPLv3)
  Speakr is offered under the AGPLv3 as its open-source license. You are free to use, modify, and distribute this software under the terms of the AGPLv3. A key condition of the AGPLv3 is that if you run a modified version on a network server and provide access to it for others, you must also make the source code of your modified version available to those users under the AGPLv3.
  - You must create a file named LICENSE (or COPYING) in the root of your repository and paste the full text of the GNU AGPLv3 license into it.
  - Read the full license text carefully to understand your rights and obligations.
- Commercial License
  For users or organizations who cannot or do not wish to comply with the terms of the AGPLv3 (for example, if you want to integrate Speakr into a proprietary commercial product or service without being obligated to share your modifications under AGPLv3), a separate commercial license is available.
  Please contact the Speakr maintainers for details on obtaining a commercial license.
You must choose one of these licenses under which to use, modify, or distribute this software. If you are using or distributing the software without a commercial license agreement, you must adhere to the terms of the AGPLv3.
We welcome contributions to Speakr! There are many ways to help:
- Bug Reports & Feature Requests: Open an issue
- Discussions: Share ideas and ask questions
- Documentation: Help improve our docs
- Translations: Contribute translations for internationalization
All code contributions require signing a Contributor License Agreement (CLA). This one-time process ensures we can maintain our dual-license model (AGPLv3 and Commercial).
See our Contributing Guide for complete details on:
- How the CLA works and why we need it
- Step-by-step contribution process
- Development setup instructions
- Coding standards and best practices
The CLA is automatically enforced via GitHub Actions. When you submit your first PR, our bot will guide you through signing.



