Skip to content

A professional, high-fidelity audio recording utility for macOS, specifically designed to capture real-time WeChat voice conversations using the native ScreenCaptureKit and AVFoundation frameworks.

License

Notifications You must be signed in to change notification settings

mistbit/VoiceMemo

Repository files navigation

VoiceMemo

A professional, high-fidelity audio recording utility for macOS, designed to capture real-time voice conversations from any application using the native ScreenCaptureKit and AVFoundation frameworks.

Read this in Chinese: README_CN.md

Features

  • Dual-Track Recording: Simultaneously captures system audio (remote voice) and microphone input (local voice).
  • Automatic Merging: Intelligently mixes both tracks into a single high-quality audio file post-recording.
  • Flexible App Selection: Supports recording from any application, with intelligent filtering for seamless capture.
  • Native Performance: Built with SwiftUI and ScreenCaptureKit for optimal performance and low CPU overhead.
  • Theme Mode: System (Auto) / Light / Dark appearance selection in Settings.
  • Privacy-First: Operates locally on your machine with clear permission handling.
  • Meeting Minutes (Multi-Provider ASR): Support for Alibaba Cloud Tingwu and Volcengine (ByteDance) to transcribe audio and generate structured minutes (summary, key points, action items), with Markdown export.
  • Storage Backends (SQLite/MySQL): Store history locally or in MySQL, with optional local-to-MySQL sync.
  • Email Notifications: Send meeting summaries directly to your email via a configured gateway.

Changelog

See CHANGELOG.md for full release history.

Security and Audit

See Security and Audit Notes.

Requirements

  • OS: macOS 13.0 (Ventura) or later.
  • Hardware: Any Mac supporting macOS 13.0+.
  • Development: Xcode 14.1+ for building and signing.

Architecture

flowchart LR
  subgraph Local["Local (macOS app)"]
    A["AudioRecorder<br/>ScreenCaptureKit + AVFoundation"] --> B["Merge Tracks<br/>remote + mic"]
    B --> C["MeetingTask"]
    C --> D["MeetingPipelineManager<br/>State machine"]
    D -->|Upload Raw| F1["OSSService<br/>Upload Original"]
    D -->|Transcode| E["AVAssetExportSession<br/>mixed_48k.m4a"]
    D -->|Upload| F2["OSSService<br/>Upload Mixed"]
    D -->|Persist| SM["StorageManager<br/>StorageProvider"]
    D -->|Config| S["SettingsStore"]
    S --> K["KeychainHelper<br/>Credentials"]
    SM -->|Local| J1["SQLiteStorage<br/>SQLite"]
    SM -->|Remote| J2["MySQLStorage<br/>mysql-kit"]
    J1 --> V["SwiftUI Views<br/>SettingsView / PipelineView / ResultView"]
    J2 --> V
  end

  subgraph Cloud["Cloud Services"]
    F1 --> G["OSS Bucket<br/>Public URL"]
    F2 --> G
    TS["TranscriptionService<br/>(Protocol)"]
    T1["Alibaba Tingwu"]
    T2["Volcengine"]
    TS -.-> T1
    TS -.-> T2
  end

  D -->|Upload| F1
  D -->|Upload| F2
  G -->|FileUrl| TS
  TS -->|Result| D
Loading

Project Structure

  • Sources/: Core Swift implementation.
  • Package.swift: Swift Package Manager configuration.
  • package_app.sh: Automated build and ad-hoc signing script.
  • Info.plist: Application configuration and permission strings.

Getting Started

1. Build and Run (recommended)

Due to macOS security requirements (ScreenCaptureKit needs specific entitlements and signing), we provide a convenience script for local execution:

chmod +x package_app.sh
./package_app.sh
open VoiceMemo.app

2. Permissions

When you first start recording, macOS will request the following permissions:

  • Screen Recording: Required by ScreenCaptureKit to capture system/app audio.
  • Microphone: Required to capture your own voice.

Please grant these permissions in System Settings > Privacy & Security.

3. Configure ASR Provider + OSS (optional)

Open Settings in the app and configure:

General:

  • Theme: System (Auto) / Light / Dark
  • OSS Configuration (Required for file hosting): Alibaba Cloud AccessKeyId / AccessKeySecret, bucket, region, prefix

ASR Provider (Choose one):

  • Alibaba Tingwu: AppKey
  • Volcengine: AppId, AccessToken, ResourceId (supports auto format inference)

4. Configure Email (optional)

In Settings > Email, you can enable email notifications to receive meeting summaries:

  • Gateway URL: The endpoint of your email sending service (e.g., FastMail gateway).
  • API Token: The authentication token for the gateway.
  • Recipient: The email address to receive the summaries.

5. Audio Outputs

Recorded audio files are saved to:

  • Default: ~/Downloads/VoiceMemoRecordings/
  • Configurable: You can change the save path in Settings > General.

Filenames:

  • recording-<timestamp>-remote.m4a: Remote/system audio.
  • recording-<timestamp>-local.m4a: Local microphone audio.
  • recording-<timestamp>-mixed.m4a: Merged conversation (Mixed mode).

Imported audio files are copied into the app sandbox:

  • ~/Library/Application Support/VoiceMemo/recordings/ (filename: <uuid>.<ext>)

6. Generate minutes

After a recording completes, the latest task appears in the pipeline UI. Trigger the steps manually:

  • Transcode → Upload → Create Task → Refresh Status
  • View Result → Export Markdown

7. Import audio (optional)

Use the sidebar action Import Audio to create a meeting task from an existing audio file, then run the same pipeline steps as above.

For detailed import instructions, please refer to the Audio Import Guide.

Development

To open the project in Xcode for debugging:

python3 generate_project.py
xed VoiceMemo.xcodeproj

Ensure you configure Signing & Capabilities with your Development Team to run the app with full permissions.

Roadmap

Completed

  • Dual-track recording (Remote + Local)
  • Automatic audio merging
  • Audio Import (Support for external files)
  • Multi-provider ASR architecture (Tingwu + Volcengine)
  • Alibaba Cloud Tingwu offline transcription + minutes generation
  • Volcengine ASR integration (V3 API)
  • OSS upload integration
  • Manual pipeline UI (transcode/upload/create/poll)
  • MySQL storage backend with local-to-remote sync
  • Unified error handling for ASR services
  • Email notifications for meeting summaries

Planned

  • Speaker diarization (cloud-based)
  • Real-time transcription UI
  • Auto-pipeline execution (fully automated workflow)
  • Multi-language support for minutes generation
  • Advanced audio editing capabilities

Contact

sherlock.yin1994@gmail.com

License

MIT License

About

A professional, high-fidelity audio recording utility for macOS, specifically designed to capture real-time WeChat voice conversations using the native ScreenCaptureKit and AVFoundation frameworks.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors