Skip to content

Latest commit

 

History

History
210 lines (143 loc) · 8.46 KB

File metadata and controls

210 lines (143 loc) · 8.46 KB

Screenshots

QwenVoice screenshot

Overview

QwenVoice is the public repository for an offline Apple-platform Qwen3-TTS app with Custom Voice, Voice Design, and Voice Cloning.

The currently shipped public release is QwenVoice v1.2.3 for macOS. The next macOS release ships under the Vocello app name while this repository, shared core, and many internal modules keep the QwenVoice identity for continuity.

Version Matrix

Surface Public name Artifact Minimum OS / tools Runtime Status
Shipped release QwenVoice v1.2.3 Assets attached to the v1.2.3 GitHub Release See the v1.2.3 release notes SwiftUI app with the shipped legacy runtime Current public download
Current main QwenVoice repo / Vocello app Local builds produce Vocello.app macOS 26.0+, iOS 26.0+, Xcode 26.0 Native Swift/MLX shared core with macOS XPC isolation and iPhone extension isolation Active development
Next macOS release Vocello Vocello-macos26.dmg macOS 26.0+ Native Swift/MLX macOS runtime hosted out of process Current release target
iPhone track Vocello for iPhone App Store / TestFlight only iOS 26.0+, iPhone 15 Pro minimum target 4-bit Speed variants in an engine extension Maintained but deferred

The public landing page remains QwenVoice-led until the Vocello-branded macOS release ships. iPhone is in active development, but it is not a public release surface for the current macOS-first milestone.

Shipped Modes

Custom Voice

Generate speech with the app's built-in English speakers:

  • Ryan
  • Aiden
  • Serena
  • Vivian

Voice Design

Voice Design is a standalone destination. Describe the voice you want, then shape tone before generating.

Voice Cloning

Clone a voice from a short reference clip. The app accepts WAV, MP3, AIFF, M4A, FLAC, and OGG input and can also use an optional transcript for better cloning accuracy. Only clone voices you own or have permission to use.

What the App Does Not Expose

  • no temperature or max-token controls
  • no streaming batch UI

Single-generation flows use live streaming preview and sidebar playback. Batch generation remains sequential and final-file-based.

Features

  • Native model downloads from Hugging Face
  • Live streaming preview for single generations
  • Local generation history stored in SQLite via GRDB
  • Batch generation for multi-line jobs
  • Sidebar waveform playback UI
  • Configurable output directory and autoplay preference
  • macOS XPC process isolation for native generation on current main
  • iPhone engine-extension isolation for the deferred iPhone track

Requirements

Current main

Requirement Detail
macOS 26.0+
iOS 26.0+ for the maintained iPhone targets
Chip Apple Silicon
RAM 8 GB+ on macOS; iPhone 15 Pro is the stated iPhone floor
Tools Xcode 26.0 and XcodeGen

Shipped QwenVoice v1.2.3

The shipped v1.2.3 build predates the current native main release track. Use the v1.2.3 GitHub Release notes and attached assets as the source of truth for that historical build.

Install from GitHub Releases

Download the current public release from Releases.

For the next macOS release, the public artifact is expected to be:

  • Vocello-macos26.dmg

Then:

  1. Open the DMG.
  2. Drag Vocello.app to /Applications.
  3. Open the app, go to Models, download a model, and generate speech.

Models

Static model metadata comes from Sources/Resources/qwenvoice_contract.json.

Mode 8-bit Quality folder 4-bit Speed folder Hugging Face repos
Custom Voice Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit Qwen3-TTS-12Hz-1.7B-CustomVoice-4bit mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-*
Voice Design Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit Qwen3-TTS-12Hz-1.7B-VoiceDesign-4bit mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-*
Voice Cloning Qwen3-TTS-12Hz-1.7B-Base-8bit Qwen3-TTS-12Hz-1.7B-Base-4bit mlx-community/Qwen3-TTS-12Hz-1.7B-Base-*

On macOS, the Models screen exposes both Speed / 4-bit and Quality / 8-bit rows for each generation mode, for six downloadable rows total. The app marks the hardware-recommended row and the currently active row separately: 8 GB/floor Macs recommend and default to Speed, while mid/high-memory Macs recommend and default to Quality. Users can install both folders side by side and switch the active installed variant per mode with Use.

Downloads come directly from the Hugging Face repos listed above through the contract metadata. iPhone remains Speed-only and exposes the 4-bit rows for its supported model catalog.

Building from Source

Source-build prerequisites for current main:

  • macOS 26.0+
  • Apple Silicon
  • Xcode 26.0
  • XcodeGen
git clone https://github.com/PowerBeef/QwenVoice.git
cd QwenVoice
./scripts/regenerate_project.sh
open QwenVoice.xcodeproj

Build the QwenVoice scheme from Xcode, or use:

xcodebuild -project QwenVoice.xcodeproj -scheme QwenVoice build

Useful local checks:

./scripts/check_project_inputs.sh
./scripts/qa.sh validate
./scripts/qa.sh test --layer contract
./scripts/qa.sh test --layer swift
./scripts/qa.sh test --layer native
./scripts/build_foundation_targets.sh macos
./scripts/build_foundation_targets.sh ios

qa.sh stores isolated build products and .xcresult bundles under build/harness/. The macOS UI smoke lane is available with ./scripts/qa.sh test --layer e2e; for release signoff on a controlled Mac, run it strictly with QWENVOICE_E2E_STRICT=1.

Performance and audio-QC validation runs through the opt-in perf lane (requires installed models, not part of --layer all):

./scripts/qa.sh test --layer perf

Local Release Packaging

For a local unsigned macOS release build and DMG:

./scripts/release.sh --preflight full
./scripts/verify_release_bundle.sh build/Vocello.app
./scripts/verify_packaged_dmg.sh build/Vocello-macos26.dmg build/release-metadata.txt

Tone and Emotion Control

Custom Voice and Voice Design are guided by natural-language instructions rather than SSML-style sliders or markup.

See docs/qwen_tone.md for app-oriented guidance on tone and prompt writing.

Architecture

Current main uses a native Apple-platform architecture:

  • Sources/ contains the macOS app shell, shared app models/services/views, and the shipping Mac target.
  • Sources/QwenVoiceCore/ contains shared Apple-platform runtime semantics, contract types, model variants, and iOS extension transport.
  • Sources/QwenVoiceNative/ contains the macOS app-facing engine proxy/store/client layer.
  • Sources/QwenVoiceEngineSupport/ contains shared macOS engine IPC and transport types.
  • Sources/QwenVoiceEngineService/ contains the bundled macOS XPC helper.
  • Sources/iOS/, Sources/iOSSupport/, and Sources/iOSEngineExtension/ contain the deferred iPhone app, support layer, and isolated engine extension.
  • Sources/SharedSupport/ contains shared playback and generation-persistence surfaces.

The current codebase does not maintain a repo-owned Python backend, Python setup path, or standalone CLI surface.

Default macOS runtime data layout:

~/Library/Application Support/QwenVoice/
  models/
  outputs/
    CustomVoice/
    VoiceDesign/
    Clones/
  voices/
  history.sqlite

See docs/reference/privacy-storage.md for local storage, privacy, and deletion details.

More Docs

License

QwenVoice is available under the MIT License.

Credits

QwenVoice builds on: