A real-time audio transcription, diarization, and summarization desktop app for Linux. Built with Tauri v2, SvelteKit, and Python.
Mnemosyne is the personification of memory in Greek mythology. She was a Titaness, the daughter of Uranus and Gaia, and the mother of the nine Muses by Zeus. The name derives from the Greek word "mneme," meaning "memory" or "remembrance."
- Real-time transcription using WhisperX (faster-whisper + word-level alignment)
- Speaker diarization using pyannote.audio 3.1
- Pluggable summarization via Ollama (LAN default), vLLM, OpenAI, or Anthropic
- Multiple audio sources — capture system audio and microphone simultaneously via PipeWire
- Session management — create, rename, delete sessions with full persistence to disk
- Obsidian integration — export sessions as markdown with YAML frontmatter directly to your vault
- Desktop app — native Linux window via Tauri v2 (not a browser tab)
- Storage-friendly — audio saved as OGG/Opus (~12x smaller than WAV)
Tauri v2 (Rust) — Window management, native dialogs
SvelteKit (Svelte 5) — Frontend UI, reactive state via runes
Python FastAPI (sidecar) — Audio capture, ML inference, API server on :8008
The Python backend runs as a sidecar process managed by Tauri. The frontend communicates with it over HTTP (REST) and WebSocket (real-time transcription streaming).
Summarization is LAN-first by default:
- Ollama (configurable URL, default
localhost:11434) - vLLM (configurable URL, OpenAI-compatible API)
- OpenAI and Anthropic available when API keys are set
- Linux with PipeWire audio (Fedora 43+, Ubuntu 22.04+, etc.)
- NVIDIA GPU with CUDA support (for WhisperX inference)
- Rust (1.77+)
- Node.js 22.x (pinned via
.nvmrc) - pnpm (10+)
- uv (Python project manager)
- ffmpeg (for audio format conversion)
- Tauri system dependencies:
# Fedora sudo dnf install webkit2gtk4.1-devel openssl-devel curl wget file \ libappindicator-gtk3-devel librsvg2-devel pango-devel - HuggingFace account with accepted model licenses:
-
Clone and install frontend dependencies:
git clone <repo-url> && cd mnemosyne nvm use # uses Node 22 from .nvmrc pnpm install
-
Set up the Python backend:
cd backend uv sync # creates venv with Python 3.13, installs deps
For GPU transcription, also install WhisperX and PyTorch (these are heavy and may already be in your venv):
uv pip install torch torchaudio whisperx
-
Configure environment:
cp backend/.env.example backend/.env # Edit backend/.env and set: # HF_TOKEN=your_huggingface_token # OLLAMA_URL=http://your-ollama-server:11434 # OBSIDIAN_VAULT_PATH=/path/to/your/vault (optional)
-
Run in development mode (single terminal):
pnpm tauri dev
Tauri automatically spawns the Python backend (via
uv run uvicorn) and manages its lifecycle. The backend starts on127.0.0.1:8008with hot-reload enabled. When you close the window, the backend is automatically shut down.
- Create a session using the sidebar
- Select audio devices — check the microphone and/or system audio monitors you want to capture
- Record — click Record or press
Ctrl+R; pressCtrl+Sto stop and transcribe - View transcript — diarized segments appear in the Transcript tab with speaker labels and timestamps
- Summarize — switch to the Summary tab, pick a provider/model, and click Summarize
- Take notes — use the Notes tab for freeform markdown notes per session
- Export to Obsidian — configure your vault path in the Export tab and click Export
| Shortcut | Action |
|---|---|
Ctrl+R |
Start recording |
Ctrl+S |
Stop recording & transcribe |
Ctrl+E |
Export to Obsidian |
Ctrl+B |
Toggle sidebar |
mnemosyne/
├── src/ # SvelteKit frontend
│ ├── lib/
│ │ ├── components/ # Svelte 5 components
│ │ ├── stores/ # Rune-based reactive stores
│ │ ├── api/ # Backend HTTP/WS client
│ │ └── types/ # TypeScript types
│ └── routes/ # SPA page
├── src-tauri/ # Tauri v2 Rust shell
│ ├── tauri.conf.json
│ └── src/lib.rs
├── backend/ # Python FastAPI backend
│ ├── pyproject.toml
│ ├── main.py
│ └── src/mnemosyne/
│ ├── api/ # FastAPI routes + WebSocket
│ ├── audio/ # PipeWire capture + Opus encoding
│ ├── transcription/ # WhisperX engine
│ ├── summarization/ # Ollama, vLLM, OpenAI, Anthropic providers
│ ├── export/ # Obsidian markdown exporter
│ ├── models/ # Pydantic data models
│ ├── services/ # Session + model lifecycle
│ └── config.py # Environment configuration
├── scripts/ # Build and dev scripts
│ ├── build-backend.sh # PyInstaller backend build
│ ├── build-all.sh # Frontend + backend build orchestrator
│ └── package.sh # Full packaging (AppImage/deb)
└── data/ # Runtime data (gitignored)
├── sessions/ # JSON session files
└── recordings/ # Audio files (OGG/Opus)
All configuration is via environment variables in backend/.env:
| Variable | Default | Description |
|---|---|---|
HF_TOKEN |
(required) | HuggingFace token for pyannote diarization models |
WHISPER_MODEL_SIZE |
medium.en |
WhisperX model size (base, small, medium, large-v2, large-v3) |
WHISPER_COMPUTE_TYPE |
float16 |
Compute type (float16, int8) |
WHISPER_BATCH_SIZE |
8 |
Batch size for transcription (lower = less VRAM) |
OLLAMA_URL |
http://localhost:11434 |
Ollama API endpoint |
VLLM_URL |
http://localhost:8000 |
vLLM API endpoint |
OPENAI_API_KEY |
(optional) | Enables OpenAI provider |
ANTHROPIC_API_KEY |
(optional) | Enables Anthropic provider |
OBSIDIAN_VAULT_PATH |
(optional) | Path to Obsidian vault for export |
OBSIDIAN_SUBFOLDER |
meetings/mnemosyne |
Subfolder within vault |
pnpm tauri dev # Starts Tauri + Vite + Python backend (all in one)The Python backend is bundled as a self-contained binary using PyInstaller:
bash scripts/build-backend.shThis produces src-tauri/binaries/mnemosyne-backend-dir/ (~7 GB, includes PyTorch CUDA libraries).
bash scripts/package.sh # Builds backend + frontend + Tauri → AppImage/debArtifacts are placed in src-tauri/target/release/bundle/.
The packaged app bundles the Python backend and all ML libraries, but requires:
- PipeWire (
pw-record,pw-dump) — system audio stack - ffmpeg — audio format conversion
- NVIDIA drivers + CUDA runtime — GPU transcription
- ML models (~3-5 GB) are downloaded from HuggingFace on first transcription
Exported sessions produce markdown files with YAML frontmatter:
---
title: "Session Name"
date: 2026-02-18
type: meeting-note
source: mnemosyne
participants: ["SPEAKER_00", "SPEAKER_01"]
tags: [meeting, mnemosyne]
---Followed by sections for Summary, Participants, Notes, and the full Transcript with timestamps.
- Desktop shell: Tauri v2 (Rust)
- Frontend: SvelteKit (Svelte 5 with runes), Tailwind CSS 4
- Backend: FastAPI (Python 3.13 via uv)
- Transcription: WhisperX (faster-whisper + pyannote 3.1)
- Summarization: Ollama, vLLM, OpenAI, Anthropic
- Audio: PipeWire (
pw-record,pw-dump), ffmpeg (Opus encoding)
- Architecture — System overview, data flows, state management design
- API Reference — Complete REST and WebSocket API documentation
- Development Guide — Setup, project structure, adding providers, building
- Troubleshooting — Common issues and solutions
The app is feature-complete with packaging support (Phases 0-8). The Tauri shell manages the Python backend lifecycle automatically — single-command dev workflow and distributable AppImage/deb builds.
MIT