LogNotes is a lightweight, local speech-to-text application that transcribes your recorded notes and pastes the result wherever your cursor is placed. It's primarily designed to "log" short notes. I use it quite often when instructing coding agents (e.g., when providing feedback, describing bugs, or outlining requirements).
The app uses Whisper for transcription and Ollama for optional grammar cleanup. NVIDIA Parakeet (via ONNX Runtime) is supported as an opt-in alternative. The current version is still very much a work in progress, but I'll definitely be working on further improvements.
I built LogNotes because I wanted a free, fully local alternative to Wispr Flow: something I could run on-device without needing a subscription. The goal was to create the same core experience with voice recordings processed locally, even if it was a bit slower.
When looking into open-source solutions, I came across Handy by @cjpais. I used that app's code as a reference for several optimizations that make LogNotes faster and more extensible. I would definitely recommend checking it out as well.
- Python 3.10+
- Ollama (optional)
cd LogNotes
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
pip install -r requirements.txt
# Install Ollama from https://ollama.ai
# Then pull the model:
ollama pull llama3.2:1b
The repo includes a pre-built Windows app at dist/LogNotes/LogNotes.exe. If you've downloaded the code, you can run it directly without a Python install.
For build instructions, platform-specific notes, and Mac setup, see documentation/desktopAppConfiguration.md.
- Transcription Speed: Larger speech-to-text models typically take longer to transcribe but are more accurate. I have optimized the speed as much as I can, and it should be reasonably fast even on CPUs.
- First Transcription: The very first transcription may be slow because the speech detection tool has to start up. Expect the same delay on the first transcription after each restart of the app.
- Hotkeys: If you're using LogNotes primarily in a specific app (e.g., Cursor, Obsidian, Jira), make sure your hotkey doesn't conflict with any existing shortcuts in that app.
- Windows App Builds: If you change the code and rebuild the Windows app, expect the build to take several minutes.
- Antivirus Scans: The first time you run the Windows app, your antivirus software may need to scan it before you can use it. Once the scan finishes, reopen the app and it should work as expected.
- Local Processing - All transcription happens on your machine. Silence is automatically filtered out.
- Flexible Recording Modes - Choose between Hold mode (press and hold) or Toggle mode (click to start/stop).
- Grammar Cleanup - Optional post-processing with a local LLM (Ollama). When this is enabled, transcription may be noticeably slower.
- Whisper Transcription - Choose between Whisper tiny / base / small. CUDA is auto-detected and used when available.
- Session Activity Tab - Every transcription this session is retained in RAM so you can retry it with a different model, copy the text, or delete it.
- Always-Visible Recording Overlay - Small borderless status indicator pinned to a screen corner; drag to reposition, right-click to cycle corners.
- Checkpoint Pasting - Sentences are pasted as soon as Whisper finishes each one, so partial text is preserved if processing fails mid-stream.
- No Audio Storage - Recordings are held in memory only during the app session and never written to disk. The Activity tab keeps recent clips in RAM so you can retry a transcription with a different model. Everything is cleared when the app closes.
- Model Name Validation - Ollama model names are validated against an allowed-characters pattern at both config load and runtime model switches.
- Config Validation - All configuration values are validated against whitelists on load; the Ollama host URL is verified to have a valid scheme and non-empty hostname.
- Atomic Config Permissions - The config file is created with 0o600 permissions in a single os.open() call, with no readable window between creation and chmod.
- Bounded Activity Memory - The session audio cap is enforced before adding each new entry, preventing a single long recording from temporarily spiking RAM past the limit.
- Pinned Model Versions - External model downloads use pinned versions.
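As a rough illustration of the validation and permission bullets above, the sketch below shows one way such checks could look. The regex, the function names (`validate_model_name`, `validate_ollama_host`, `create_config_atomically`), and the accepted URL schemes are assumptions made for this example, not taken from the LogNotes code.

```python
import json
import os
import re
from urllib.parse import urlparse

# Hypothetical allowed-characters pattern; the real pattern in LogNotes may differ.
MODEL_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._:/-]{0,127}")


def validate_model_name(name: str) -> str:
    """Return the name unchanged if it matches the allowed pattern, else raise."""
    if not MODEL_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid Ollama model name: {name!r}")
    return name


def validate_ollama_host(url: str) -> str:
    """Require an http(s) scheme (assumed here) and a non-empty hostname."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"invalid Ollama host URL: {url!r}")
    return url


def create_config_atomically(path: str, config: dict) -> None:
    """Create the config file with mode 0o600 in one os.open() call.

    O_EXCL makes the call fail if the file already exists, so the
    permissions are applied at creation time rather than by a later chmod.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(config, f)
```

Passing the mode to `os.open()` is what removes the window in which the file exists with broader permissions. This applies to POSIX systems; on Windows the mode argument has much more limited effect.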
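The bounded-memory bullet describes enforcing the cap before a new clip is stored. A minimal sketch of that ordering follows, assuming a hypothetical `ActivityStore` class, a byte-based cap, and oldest-first eviction. The actual eviction policy is not described in this README.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ActivityStore:
    """Hypothetical in-memory store for session audio clips."""

    max_bytes: int
    _clips: deque = field(default_factory=deque)
    _used: int = 0

    def add(self, clip: bytes) -> bool:
        """Make room first, then append. Returns False if the clip alone exceeds the cap."""
        if len(clip) > self.max_bytes:
            return False  # a clip larger than the cap is never admitted
        # Evict the oldest clips BEFORE storing, so usage never exceeds max_bytes.
        while self._clips and self._used + len(clip) > self.max_bytes:
            self._used -= len(self._clips.popleft())
        self._clips.append(clip)
        self._used += len(clip)
        return True

    @property
    def used_bytes(self) -> int:
        return self._used
```

Checking the cap after appending would let a long recording briefly push usage over the limit. Evicting first keeps the total at or below `max_bytes` at every step.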
python main.py
Hold Mode (default):
- Open any text editor or input field where you want to paste text
- Press and hold the hotkey (default: Ctrl+Shift+D)
- Speak your text
- Release the hotkey
- Wait for processing - the transcribed text will be pasted at your cursor
Toggle Mode:
- Open any text editor or input field where you want to paste text
- Press the hotkey once to start recording
- Speak your text
- Press the hotkey again to stop recording
- Wait for processing - the transcribed text will be pasted at your cursor
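The difference between the two modes comes down to how hotkey press and release events map to starting and stopping a recording. Here is a minimal sketch of that mapping, assuming a hypothetical `RecordingController` with `on_press` and `on_release` callbacks. The real listener in LogNotes is built on pynput and may be structured differently.

```python
class RecordingController:
    """Hypothetical state machine for the Hold and Toggle recording modes."""

    def __init__(self, mode: str = "hold") -> None:
        if mode not in ("hold", "toggle"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.recording = False
        self._key_down = False  # used to ignore OS key-repeat events

    def on_press(self) -> None:
        if self._key_down:
            return  # key repeat while held; not a new press
        self._key_down = True
        if self.mode == "hold":
            self.recording = True
        else:
            # Toggle: each distinct press flips the recording state.
            self.recording = not self.recording

    def on_release(self) -> None:
        self._key_down = False
        if self.mode == "hold":
            self.recording = False  # releasing the hotkey ends the recording
```

Tracking `_key_down` matters in Toggle mode: without it, key-repeat events from a held key could flip the state several times for a single press.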
LogNotes/
├── main.py                          # Entry point and controller
├── requirements.txt                 # Python dependencies
├── src/
│   ├── paths.py                     # User data / cache dir + bundled-asset resolution
│   ├── audio/
│   │   ├── recorder.py              # Microphone recording (sounddevice)
│   │   └── vad.py                   # Voice activity detection (Silero)
│   ├── transcription/
│   │   ├── registry.py              # Model registry (id → display → backend)
│   │   ├── base.py                  # Transcriber protocol
│   │   ├── device.py                # CUDA detection (ctranslate2 + onnxruntime)
│   │   ├── whisper.py               # Whisper backend (faster-whisper)
│   │   └── parakeet.py              # Parakeet backend (onnx-asr / ONNX Runtime)
│   ├── processing/
│   │   └── grammar.py               # Grammar cleanup (Ollama)
│   ├── input/
│   │   ├── hotkey.py                # Global hotkey listener (pynput)
│   │   └── paster.py                # Text pasting utility
│   └── ui/
│       ├── app.py                   # ttkbootstrap GUI + log viewer + overlay
│       └── activity.py              # Session activity store and Activity tab
├── build/                           # PyInstaller specs + Inno Setup + build.ps1
└── documentation/
    ├── configuration.md             # Config file, schema, settings, validation
    ├── troubleshooting.md           # Common issues and fixes
    ├── desktopAppConfiguration.md   # Desktop packaging details
    └── mvpImplementation.md         # Architecture and implementation details
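To illustrate how a model registry (id → display → backend) and a transcriber protocol like the ones listed under src/transcription/ might fit together, here is a small sketch. All names below (`Transcriber`, `ModelInfo`, `REGISTRY`, `get_model`, and the model ids) are hypothetical and not taken from the LogNotes source.

```python
from dataclasses import dataclass
from typing import Protocol


class Transcriber(Protocol):
    """Hypothetical interface that each backend would implement."""

    def transcribe(self, audio: bytes) -> str: ...


@dataclass(frozen=True)
class ModelInfo:
    model_id: str
    display_name: str
    backend: str  # e.g. "whisper" or "parakeet"


# Illustrative entries only; the real ids and labels may differ.
REGISTRY = {
    m.model_id: m
    for m in (
        ModelInfo("whisper-tiny", "Whisper tiny", "whisper"),
        ModelInfo("whisper-base", "Whisper base", "whisper"),
        ModelInfo("whisper-small", "Whisper small", "whisper"),
        ModelInfo("parakeet", "NVIDIA Parakeet", "parakeet"),
    )
}


def get_model(model_id: str) -> ModelInfo:
    """Look up a model, rejecting ids that are not registered."""
    try:
        return REGISTRY[model_id]
    except KeyError:
        raise ValueError(f"unknown model id: {model_id!r}") from None
```

With a layout like this, the UI only needs the display name, and the controller only needs the backend string to choose which `Transcriber` implementation to load.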
| Component | Library |
|---|---|
| Transcription (default) | faster-whisper |
| Transcription (optional, opt-in) | onnx-asr + ONNX Runtime (Parakeet) |
| Voice Activity Detection | Silero VAD (via torch) |
| Audio Recording | sounddevice |
| Global Hotkeys | pynput |
| Text Pasting | pynput + pyperclip |
| Grammar Cleanup | Ollama Python client |
| UI | ttkbootstrap (modern themed Tkinter) |
- Configuration - Covers the config file location, full settings schema, valid values for each option, and the validation rules applied on load.
- Troubleshooting - Step-by-step fixes for common issues including hotkeys not firing, audio problems, transcription quality, Ollama connectivity, and packaged build failures.
- Desktop Packaging - Instructions for building the Windows .exe and Mac .app, PyInstaller spec details, Inno Setup installer configuration, and runtime path layout.
- Implementation - Deep dive into the architecture, component responsibilities, the checkpoint-pasting pipeline, security model, and known design decisions.
This project was built entirely with Claude Code and OpenAI Codex. While the code has been reviewed, AI-generated code can contain bugs or issues that are easy to miss. If you spot anything significant, please open an issue. I'd genuinely appreciate it.
MIT