SpeakFlow Logo

SpeakFlow for Human Vibe Coding

Hold to talk. Release to transcribe. Human vibe coding, faster.

Offline voice input for writing, coding, prompting, and human vibe coding on Windows.

Part of the Yongan Toolkit for coding and academic research: Everything to MD for Agent · Gemini Workflows for Agents


Quick Start · Features · Local API · How It Works · Yongan Toolkit · 中文 · 日本語 · 한국어


Why SpeakFlow

Most voice input tools are built for chat, not for real desktop work. SpeakFlow is optimized for direct text entry into whatever you are already doing:

  • writing papers, notes, prompts, and documentation
  • coding with hands mostly on mouse and keyboard
  • dictating in Chinese, English, Cantonese, Japanese, or Korean
  • working offline without sending audio to cloud services

The workflow is simple:

Hold trigger -> speak -> release -> text appears at the cursor

Features

  • Hold-to-record voice input with mouse or keyboard triggers
  • Auto-paste transcribed text into the active app
  • Offline multilingual ASR with SenseVoice
  • GPU acceleration with CUDA, plus CPU fallback
  • Browser-based setup wizard for first-time configuration
  • Tray app with status icons and quick controls
  • Startup integration for always-available dictation
  • Built-in doctor command for environment diagnostics
  • Optional emotion and event text output such as (情感:高兴)
  • Local HTTP API for other local projects
  • Runtime logs written to logs/speakflow.log

Quick Start

git clone https://github.com/YonganZhang/SpeakFlow.git
cd SpeakFlow
pip install -r speakflow/requirements.txt
python -m speakflow

On first launch:

  1. The setup wizard opens in your browser.
  2. Choose microphone, trigger key, and language settings.
  3. Save the configuration.
  4. The speech model downloads on first use.
  5. SpeakFlow stays in the system tray and is ready to use.

Common Commands

# Start the app
python -m speakflow

# Run environment diagnostics
python -m speakflow doctor

# Re-open the setup wizard
python -m speakflow setup

Requirements

  • Windows 10 or 11
  • Python 3.10+
  • NVIDIA GPU with CUDA recommended
  • CPU mode also works, but is slower

Local API

When enabled, SpeakFlow exposes a local-only API for other apps on the same machine:

  • GET http://127.0.0.1:18360/health
  • POST http://127.0.0.1:18360/api/v1/transcribe
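Before sending audio, a client can probe the health endpoint to see whether SpeakFlow is running. A minimal sketch using `requests` (assumed installed; the function name is illustrative):

```python
import requests


def speakflow_ready(base_url="http://127.0.0.1:18360", timeout=2.0):
    """Return True if the local SpeakFlow API answers GET /health."""
    try:
        return requests.get(f"{base_url}/health", timeout=timeout).ok
    except requests.RequestException:
        # Connection refused or timed out: SpeakFlow is not up yet.
        return False
```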

Form parameters:

Parameter      Required  Description
file           Yes       Audio file such as wav or mp3
language       No        auto, zh, en, yue, ja, ko
emotion_mode   No        text or off
event_mode     No        text or off

Response JSON:

Field       Description
text        Final formatted text
plain_text  Plain transcription without appended labels
emotion     Recognized emotion label
event       Recognized event label
language    Detected language
raw_text    Raw SenseVoice output

Python example:

import requests

with open("test.wav", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:18360/api/v1/transcribe",
        files={"file": ("test.wav", f, "audio/wav")},
        data={
            "language": "auto",
            "emotion_mode": "text",
            "event_mode": "off",
        },
        timeout=120,
    )

print(resp.json()["text"])

Recommended reuse pattern:

  1. Start SpeakFlow once in the background.
  2. Let your other project send audio files to http://127.0.0.1:18360/api/v1/transcribe.
  3. Use text directly, or split into plain_text, emotion, and event.
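For step 3, a small helper can split a response into its parts. The field names follow the response table above; the function itself is an illustrative sketch, not part of SpeakFlow:

```python
def split_transcription(payload):
    """Split a /api/v1/transcribe response dict into (plain_text, emotion, event).

    Missing or empty labels are normalized to None so callers can test truthiness.
    """
    return (
        payload.get("plain_text", ""),
        payload.get("emotion") or None,
        payload.get("event") or None,
    )
```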

Configuration

The default config is generated under ~/.speakflow/config.yaml.

trigger:
  type: mouse
  mouse_button: middle
  keyboard_hotkey: f9
  mode: hold

audio:
  sample_rate: 16000
  device: null

asr:
  model: "iic/SenseVoiceSmall"
  device: "cuda:0"
  language: auto

output:
  auto_paste: true
  notification: true
  emotion_mode: text
  event_mode: off

api:
  enabled: true
  host: 127.0.0.1
  port: 18360

How It Works

Trigger listener -> audio recorder -> SenseVoice ASR -> clipboard / paste output

Key components in this repo:

  • speakflow/input_trigger.py listens for mouse or keyboard triggers
  • speakflow/audio.py records microphone audio
  • speakflow/transcriber.py runs ASR via FunASR / SenseVoice
  • speakflow/local_api.py exposes a local HTTP API
  • speakflow/output.py handles clipboard and auto-paste
  • speakflow/setup_server.py powers the browser-based setup wizard
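Wired together, the components above form one press-to-paste cycle. The sketch below is purely illustrative: the function and the four collaborators are hypothetical stand-ins for the modules listed, not the repo's actual API:

```python
def dictate_once(trigger, recorder, transcriber, output):
    """One dictation cycle: hold -> record -> transcribe -> paste.

    All four collaborators are hypothetical stand-ins for the
    corresponding speakflow/ modules.
    """
    trigger.wait_for_press()                                 # input_trigger.py
    audio = recorder.record_until(trigger.wait_for_release)  # audio.py
    text = transcriber.transcribe(audio)                     # transcriber.py
    output.paste(text)                                       # output.py
    return text
```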

Use Cases

  • dictating research notes into Obsidian, Notion, or Word
  • speaking prompts into Claude, ChatGPT, Gemini, or coding agents
  • writing code comments, commit messages, and docs faster
  • low-friction multilingual text entry on Windows
  • reusing the ASR service from other local tools

Yongan Toolkit

This repo is one part of the Yongan Toolkit: a small collection of coding and research tools that work well together.

Project                          What It Helps With
speakflow-for-human-vibe-coding  speak ideas, notes, and prompts directly into any Windows app
gemini-workflows-for-agents      run Gemini-powered workflows for agents
everything-to-md-for-agent       turn documents and equations into Markdown for agents

Recommended flow: draft with speakflow-for-human-vibe-coding, research with gemini-workflows-for-agents, then convert papers with everything-to-md-for-agent.

Project Structure

speakflow-for-human-vibe-coding/
├── speakflow/
│   ├── __main__.py
│   ├── app.py
│   ├── audio.py
│   ├── doctor.py
│   ├── input_trigger.py
│   ├── local_api.py
│   ├── output.py
│   ├── setup_server.py
│   ├── transcriber.py
│   └── resources/
├── build.ps1
├── install_startup.ps1
├── SpeakFlow.bat
├── SpeakFlow.vbs
└── README.md

中文说明

SpeakFlow is an offline voice input tool for Windows aimed at real productivity scenarios, not just a speech-to-text demo.

The core experience:

  • hold the trigger key to start recording
  • release it to transcribe automatically
  • the text is pasted directly at the current cursor position

It now also supports:

  • a local HTTP API that other projects can call directly
  • emotion/event text output
  • logs written to logs/speakflow.log

Most common commands:

python -m speakflow
python -m speakflow doctor
python -m speakflow setup

日本語

SpeakFlow is an offline voice input tool for Windows. Press and hold to speak; on release, your speech is transcribed and auto-pasted at the cursor position.

  • offline voice input
  • supports Chinese, English, Cantonese, Japanese, and Korean
  • fast inference with CUDA
  • reusable from other apps via the local API

Main commands:

python -m speakflow
python -m speakflow doctor
python -m speakflow setup

한국어

SpeakFlow is an offline voice input tool for Windows. Hold the button while speaking; release it and the text is transcribed and automatically pasted at the current cursor position.

  • offline voice input
  • supports Chinese, English, Cantonese, Japanese, and Korean
  • fast CUDA-based inference
  • reusable from other projects via the local API

Main commands:

python -m speakflow
python -m speakflow doctor
python -m speakflow setup

License

MIT. See LICENSE.

SenseVoice is developed by the FunAudioLLM team. Please also follow the upstream model license when using that model.
