A professional, high-fidelity audio recording utility for macOS, designed to capture real-time voice conversations from any application using the native ScreenCaptureKit and AVFoundation frameworks.
Read this in Chinese: README_CN.md
- Dual-Track Recording: Simultaneously captures system audio (remote voice) and microphone input (local voice).
- Automatic Merging: Intelligently mixes both tracks into a single high-quality audio file post-recording.
- Flexible App Selection: Supports recording from any application, with intelligent filtering for seamless capture.
- Native Performance: Built with SwiftUI and ScreenCaptureKit for optimal performance and low CPU overhead.
- Theme Mode: System (Auto) / Light / Dark appearance selection in Settings.
- Privacy-First: Operates locally on your machine with clear permission handling.
- Meeting Minutes (Multi-Provider ASR): Support for Alibaba Cloud Tingwu and Volcengine (ByteDance) to transcribe audio and generate structured minutes (summary, key points, action items), with Markdown export.
- Storage Backends (SQLite/MySQL): Store history locally or in MySQL, with optional local-to-MySQL sync.
- Email Notifications: Send meeting summaries directly to your email via a configured gateway.
See CHANGELOG.md for full release history.
- OS: macOS 13.0 (Ventura) or later.
- Hardware: Any Mac supporting macOS 13.0+.
- Development: Xcode 14.1+ for building and signing.
flowchart LR
subgraph Local["Local (macOS app)"]
A["AudioRecorder<br/>ScreenCaptureKit + AVFoundation"] --> B["Merge Tracks<br/>remote + mic"]
B --> C["MeetingTask"]
C --> D["MeetingPipelineManager<br/>State machine"]
D -->|Upload Raw| F1["OSSService<br/>Upload Original"]
D -->|Transcode| E["AVAssetExportSession<br/>mixed_48k.m4a"]
D -->|Upload| F2["OSSService<br/>Upload Mixed"]
D -->|Persist| SM["StorageManager<br/>StorageProvider"]
D -->|Config| S["SettingsStore"]
S --> K["KeychainHelper<br/>Credentials"]
SM -->|Local| J1["SQLiteStorage<br/>SQLite"]
SM -->|Remote| J2["MySQLStorage<br/>mysql-kit"]
J1 --> V["SwiftUI Views<br/>SettingsView / PipelineView / ResultView"]
J2 --> V
end
subgraph Cloud["Cloud Services"]
F1 --> G["OSS Bucket<br/>Public URL"]
F2 --> G
TS["TranscriptionService<br/>(Protocol)"]
T1["Alibaba Tingwu"]
T2["Volcengine"]
TS -.-> T1
TS -.-> T2
end
D -->|Upload| F1
D -->|Upload| F2
G -->|FileUrl| TS
TS -->|Result| D
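The dotted edges from `TranscriptionService` to the two providers suggest a protocol with interchangeable implementations. The following is a minimal sketch of that idea, not the app's actual code: `TranscriptionResult`, `TingwuService`, `VolcengineService`, `ASRProvider`, and `makeService(for:)` are hypothetical names, and the method bodies are placeholders where real provider API calls would go.

```swift
import Foundation

/// Hypothetical result type; the app's real model is not shown in this README.
struct TranscriptionResult: Equatable {
    let provider: String
    let fileURL: URL
}

/// Hypothetical protocol mirroring the "TranscriptionService (Protocol)" node.
protocol TranscriptionService {
    var name: String { get }
    /// Takes the public audio URL (the "FileUrl" edge) and returns a result.
    func transcribe(fileURL: URL) throws -> TranscriptionResult
}

struct TingwuService: TranscriptionService {
    let name = "Alibaba Tingwu"
    func transcribe(fileURL: URL) throws -> TranscriptionResult {
        // Placeholder: a real implementation would call the Tingwu API here.
        TranscriptionResult(provider: name, fileURL: fileURL)
    }
}

struct VolcengineService: TranscriptionService {
    let name = "Volcengine"
    func transcribe(fileURL: URL) throws -> TranscriptionResult {
        // Placeholder: a real implementation would call the Volcengine API here.
        TranscriptionResult(provider: name, fileURL: fileURL)
    }
}

/// Hypothetical setting that selects one provider, as in "ASR Provider (Choose one)".
enum ASRProvider {
    case tingwu
    case volcengine
}

/// Returns the service for the selected provider; callers only see the protocol.
func makeService(for provider: ASRProvider) -> TranscriptionService {
    switch provider {
    case .tingwu: return TingwuService()
    case .volcengine: return VolcengineService()
    }
}
```

With this shape, the pipeline manager can depend on `TranscriptionService` alone, so adding a third provider would not touch the pipeline code.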
- Sources/: Core Swift implementation.
- Package.swift: Swift Package Manager configuration.
- package_app.sh: Automated build and ad-hoc signing script.
- Info.plist: Application configuration and permission strings.
Due to macOS security requirements (ScreenCaptureKit needs specific entitlements and signing), we provide a convenience script for local execution:
chmod +x package_app.sh
./package_app.sh
open VoiceMemo.app
When you first start recording, macOS will request the following permissions:
- Screen Recording: Required by ScreenCaptureKit to capture system/app audio.
- Microphone: Required to capture your own voice.
Please grant these permissions in System Settings > Privacy & Security.
Open Settings in the app and configure:
General:
- Theme: System (Auto) / Light / Dark
- OSS Configuration (Required for file hosting): Alibaba Cloud AccessKeyId / AccessKeySecret, bucket, region, prefix
ASR Provider (Choose one):
- Alibaba Tingwu: AppKey
- Volcengine: AppId, AccessToken, ResourceId (supports auto format inference)
In Settings > Email, you can enable email notifications to receive meeting summaries:
- Gateway URL: The endpoint of your email sending service (e.g., FastMail gateway).
- API Token: The authentication token for the gateway.
- Recipient: The email address to receive the summaries.
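To illustrate how these three settings could fit together, here is a hedged sketch that builds an HTTP request for a summary email. The gateway's real API is not documented in this README, so the JSON payload (`SummaryEmail`), the `Bearer` authorization scheme, and the names `EmailSettings` and `makeEmailRequest` are all assumptions; adapt them to whatever your gateway expects.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Mirrors the three fields in Settings > Email (type name is hypothetical).
struct EmailSettings {
    let gatewayURL: URL
    let apiToken: String
    let recipient: String
}

/// Hypothetical payload; the actual gateway schema is an assumption.
struct SummaryEmail: Codable, Equatable {
    let to: String
    let subject: String
    let body: String
}

/// Builds (but does not send) a POST request carrying the meeting summary.
func makeEmailRequest(settings: EmailSettings,
                      subject: String,
                      markdown: String) throws -> URLRequest {
    var request = URLRequest(url: settings.gatewayURL)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Assumption: the gateway accepts the API token as a Bearer token.
    request.setValue("Bearer \(settings.apiToken)", forHTTPHeaderField: "Authorization")
    let payload = SummaryEmail(to: settings.recipient, subject: subject, body: markdown)
    request.httpBody = try JSONEncoder().encode(payload)
    return request
}
```

The resulting request could then be sent with `URLSession`; sending is left out so the sketch stays free of network side effects.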
Recorded audio files are saved to:
- Default: ~/Downloads/VoiceMemoRecordings/
- Configurable: You can change the save path in Settings > General.
Filenames:
- recording-<timestamp>-remote.m4a: Remote/system audio.
- recording-<timestamp>-local.m4a: Local microphone audio.
- recording-<timestamp>-mixed.m4a: Merged conversation (Mixed mode).
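The naming scheme above can be expressed as a small helper. This is only an illustrative sketch: `Track`, `recordingFileName`, and `recordingURLs` are hypothetical names, and because the README does not specify the timestamp format, the timestamp is passed in as an opaque string.

```swift
import Foundation

/// The three files produced per recording (type name is hypothetical).
enum Track: String, CaseIterable {
    case remote  // system/app audio
    case local   // microphone audio
    case mixed   // merged conversation
}

/// Builds "recording-<timestamp>-<track>.m4a".
/// The timestamp format is not documented here, so it is taken as given.
func recordingFileName(timestamp: String, track: Track) -> String {
    "recording-\(timestamp)-\(track.rawValue).m4a"
}

/// Returns the full URL of each track file inside the chosen save directory.
func recordingURLs(in directory: URL, timestamp: String) -> [Track: URL] {
    var urls: [Track: URL] = [:]
    for track in Track.allCases {
        let name = recordingFileName(timestamp: timestamp, track: track)
        urls[track] = directory.appendingPathComponent(name)
    }
    return urls
}
```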
Imported audio files are copied into the app's data directory:
~/Library/Application Support/VoiceMemo/recordings/ (filename: <uuid>.<ext>)
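As a rough sketch of the copy step described above, the helper below derives a `<uuid>.<ext>` name from the source file and copies it into a recordings directory. The function names `importDestination` and `importAudio` are hypothetical, and the app's real import logic may differ (for example in how it handles duplicates or missing extensions).

```swift
import Foundation

/// Computes "<uuid>.<ext>" inside the recordings directory, keeping the source extension.
func importDestination(for source: URL, recordingsDirectory: URL, id: UUID = UUID()) -> URL {
    recordingsDirectory
        .appendingPathComponent(id.uuidString)
        .appendingPathExtension(source.pathExtension)
}

/// Copies the source file into the recordings directory and returns the new location.
/// The original file is left untouched.
func importAudio(from source: URL, into recordingsDirectory: URL, id: UUID = UUID()) throws -> URL {
    let fileManager = FileManager.default
    // Create the target directory on first use.
    try fileManager.createDirectory(at: recordingsDirectory, withIntermediateDirectories: true)
    let destination = importDestination(for: source, recordingsDirectory: recordingsDirectory, id: id)
    try fileManager.copyItem(at: source, to: destination)
    return destination
}
```

Naming the copy by UUID avoids collisions when two imported files share the same original filename.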
After a recording completes, the latest task appears in the pipeline UI. Trigger the steps manually:
- Transcode → Upload → Create Task → Refresh Status
- View Result → Export Markdown
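The ordered steps above behave like a simple linear state machine: each step is only valid once the previous one has finished. The sketch below models that ordering under stated assumptions; `PipelineStep`, `PipelineError`, and `Pipeline` are hypothetical names and do not reflect the actual `MeetingPipelineManager` implementation, which may include failure and retry states.

```swift
/// States reached after each manual step (names are hypothetical).
enum PipelineStep: Int, CaseIterable {
    case recorded      // recording or import finished
    case transcoded    // after "Transcode"
    case uploaded      // after "Upload"
    case taskCreated   // after "Create Task"
    case completed     // "Refresh Status" reported a finished task
}

enum PipelineError: Error {
    case outOfOrder
}

struct Pipeline {
    private(set) var state: PipelineStep = .recorded

    /// The only step that may run next, or nil once the pipeline is complete.
    var nextStep: PipelineStep? {
        PipelineStep(rawValue: state.rawValue + 1)
    }

    /// Moves to `step` only if it directly follows the current state.
    mutating func advance(to step: PipelineStep) throws {
        guard step == nextStep else { throw PipelineError.outOfOrder }
        state = step
    }

    /// Results can be viewed and exported only after completion.
    var canExportMarkdown: Bool {
        state == .completed
    }
}
```

A UI built on this model could enable only the button matching `nextStep`, which keeps the manual workflow from running steps out of order.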
Use the sidebar action Import Audio to create a meeting task from an existing audio file, then run the same pipeline steps as above.
For detailed import instructions, please refer to the Audio Import Guide.
To open the project in Xcode for debugging:
python3 generate_project.py
xed VoiceMemo.xcodeproj
Ensure you configure Signing & Capabilities with your Development Team to run the app with full permissions.
- Dual-track recording (Remote + Local)
- Automatic audio merging
- Audio Import (Support for external files)
- Multi-provider ASR architecture (Tingwu + Volcengine)
- Alibaba Cloud Tingwu offline transcription + minutes generation
- Volcengine ASR integration (V3 API)
- OSS upload integration
- Manual pipeline UI (transcode/upload/create/poll)
- MySQL storage backend with local-to-remote sync
- Unified error handling for ASR services
- Email notifications for meeting summaries
- Speaker diarization (cloud-based)
- Real-time transcription UI
- Auto-pipeline execution (fully automated workflow)
- Multi-language support for minutes generation
- Advanced audio editing capabilities
