101 changes: 81 additions & 20 deletions AGENTS.md
@@ -5,37 +5,55 @@

## Overview

macOS menu bar companion app. Lives entirely in the macOS status bar (no dock icon, no main window). Clicking the menu bar icon opens a custom floating panel with companion voice controls. Uses push-to-talk (ctrl+option) to capture voice input, transcribes it via AssemblyAI streaming, and sends the transcript + a screenshot of the user's screen to Claude. Claude responds with text (streamed via SSE) and voice (ElevenLabs TTS). A blue cursor overlay can fly to and point at UI elements Claude references on any connected monitor.
macOS menu bar companion app. Lives entirely in the macOS status bar (no dock icon, no main window). Clicking the menu bar icon opens a custom floating panel with companion voice controls. Uses push-to-talk (ctrl+option) to capture voice input, transcribes it via AssemblyAI streaming, and sends the transcript + a screenshot of the user's screen to Claude. Claude responds with text (streamed) and voice (ElevenLabs TTS). A blue cursor overlay can fly to and point at UI elements Claude references on any connected monitor.

All API keys live on a Cloudflare Worker proxy — nothing sensitive ships in the app.
Claude runs **locally** via the user's installed Claude Code CLI, authenticated against their Claude Max subscription. AssemblyAI and ElevenLabs API keys live on a Cloudflare Worker proxy. No Anthropic API key is required anywhere.

## Architecture

- **App Type**: Menu bar-only (`LSUIElement=true`), no dock icon or main window
- **Framework**: SwiftUI (macOS native) with AppKit bridging for menu bar panel and cursor overlay
- **Pattern**: MVVM with `@StateObject` / `@Published` state management
- **AI Chat**: Claude (Sonnet 4.6 default, Opus 4.6 optional) via Cloudflare Worker proxy with SSE streaming
- **Speech-to-Text**: AssemblyAI real-time streaming (`u3-rt-pro` model) via websocket, with OpenAI and Apple Speech as fallbacks
- **AI Chat**: Claude (Sonnet 4.6 default, Opus 4.6 optional) via the local `claude` CLI subprocess (Claude Max subscription quota)
- **Speech-to-Text**: AssemblyAI real-time streaming (`u3-rt-pro` model) via websocket, with Apple Speech as the local fallback
- **Text-to-Speech**: ElevenLabs (`eleven_flash_v2_5` model) via Cloudflare Worker proxy
- **Screen Capture**: ScreenCaptureKit (macOS 14.2+), multi-monitor support
- **Voice Input**: Push-to-talk via `AVAudioEngine` + pluggable transcription-provider layer. System-wide keyboard shortcut via listen-only CGEvent tap.
- **Element Pointing**: Claude embeds `[POINT:x,y:label:screenN]` tags in responses. The overlay parses these, maps coordinates to the correct monitor, and animates the blue cursor along a bezier arc to the target.
- **Concurrency**: `@MainActor` isolation, async/await throughout
- **Analytics**: PostHog via `ClickyAnalytics.swift`
- **Analytics**: None — the PostHog integration was removed in this fork. `ClickyAnalytics.swift` remains as a no-op shim so call sites compile unchanged.
- **Auto-update**: None — Sparkle was removed in this fork. Update by re-building from source.
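
The `[POINT:x,y:label:screenN]` tag format from the Element Pointing bullet can be illustrated with a small parser. This is only a sketch, written in TypeScript (the Worker's language) rather than the app's Swift, and it is not the overlay's actual code. It assumes integer coordinates, a label containing neither `:` nor `]`, and that all four fields are always present. The names `PointTag`, `parsePointTags`, and `stripPointTags` are hypothetical.

```typescript
// Hypothetical shape for one parsed pointing tag (field names are illustrative).
interface PointTag {
  x: number;
  y: number;
  label: string;
  screen: number;
}

// Assumption: integer coordinates, and a label without ":" or "]".
const POINT_TAG_SOURCE = "\\[POINT:(\\d+),(\\d+):([^:\\]]+):screen(\\d+)\\]";

// Returns every pointing tag found in a response, in order of appearance.
function parsePointTags(responseText: string): PointTag[] {
  const pattern = new RegExp(POINT_TAG_SOURCE, "g"); // fresh regex, so no shared lastIndex
  const tags: PointTag[] = [];
  let match: RegExpExecArray | null = pattern.exec(responseText);
  while (match !== null) {
    tags.push({
      x: Number(match[1]),
      y: Number(match[2]),
      label: (match[3] ?? "").trim(),
      screen: Number(match[4]),
    });
    match = pattern.exec(responseText);
  }
  return tags;
}

// Removes the tags and collapses leftover whitespace, e.g. before text is sent to TTS.
function stripPointTags(responseText: string): string {
  return responseText
    .replace(new RegExp(POINT_TAG_SOURCE, "g"), "")
    .replace(/\s{2,}/g, " ")
    .trim();
}
```

Splitting parsing from stripping keeps the spoken text free of tag syntax while the overlay still receives structured coordinates. Mapping `screen` to a physical monitor is left out because the source does not specify the coordinate space.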

### API Proxy (Cloudflare Worker)

The app never calls external APIs directly. All requests go through a Cloudflare Worker (`worker/src/index.ts`) that holds the real API keys as secrets.
AssemblyAI and ElevenLabs both go through a Cloudflare Worker (`worker/src/index.ts`) that holds their API keys as secrets. Claude does **not** — it runs as a local subprocess (see "Claude via local CLI" below).

| Route | Upstream | Purpose |
|-------|----------|---------|
| `POST /chat` | `api.anthropic.com/v1/messages` | Claude vision + streaming chat |
| `POST /tts` | `api.elevenlabs.io/v1/text-to-speech/{voiceId}` | ElevenLabs TTS audio |
| `POST /transcribe-token` | `streaming.assemblyai.com/v3/token` | Fetches a short-lived (480s) AssemblyAI websocket token |

Worker secrets: `ANTHROPIC_API_KEY`, `ASSEMBLYAI_API_KEY`, `ELEVENLABS_API_KEY`
Worker secrets: `ASSEMBLYAI_API_KEY`, `ELEVENLABS_API_KEY`
Worker vars: `ELEVENLABS_VOICE_ID`
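
The two remaining routes can be pictured as a pure mapping from an incoming request to an upstream call. The sketch below is not the contents of `worker/src/index.ts`. The upstream hosts and the 480-second lifetime come from the table above, while the `xi-api-key` and `authorization` header names, the `expires_in_seconds` query parameter, and the GET method for the token call are assumptions to check against each provider's documentation. `WorkerEnv`, `UpstreamRequest`, and `resolveUpstream` are hypothetical names.

```typescript
// Hypothetical env shape; the keys mirror the Worker secrets and vars listed above.
interface WorkerEnv {
  ASSEMBLYAI_API_KEY: string;
  ELEVENLABS_API_KEY: string;
  ELEVENLABS_VOICE_ID: string;
}

// Description of the upstream call a route should make.
interface UpstreamRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
}

// Maps a Worker route to its upstream call, or null for anything unrecognised.
// There is deliberately no "/chat" case: Claude no longer goes through the Worker.
function resolveUpstream(method: string, pathname: string, env: WorkerEnv): UpstreamRequest | null {
  if (method !== "POST") return null;
  switch (pathname) {
    case "/tts":
      return {
        url: `https://api.elevenlabs.io/v1/text-to-speech/${env.ELEVENLABS_VOICE_ID}`,
        method: "POST",
        // Assumption: ElevenLabs reads the key from the "xi-api-key" header.
        headers: { "xi-api-key": env.ELEVENLABS_API_KEY, "content-type": "application/json" },
      };
    case "/transcribe-token":
      return {
        // Assumption: token lifetime is passed as a query parameter on a GET request.
        url: "https://streaming.assemblyai.com/v3/token?expires_in_seconds=480",
        method: "GET",
        headers: { authorization: env.ASSEMBLYAI_API_KEY },
      };
    default:
      return null;
  }
}
```

A real handler would pass the result to `fetch` and stream the response back to the app. Keeping the mapping pure makes it easy to check that no route exposes a key to the client.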

### Claude via local CLI

`ClaudeAgentRunner.swift` spawns the locally installed `claude` binary as
a subprocess in stream-json mode for each push-to-talk request. The user
message — including the screenshot(s) as base64 image content blocks —
is written to the subprocess's stdin, and the streaming JSON output is
parsed for `text_delta` chunks the same way the SSE stream was parsed
before.
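
To make the `text_delta` parsing concrete, here is a minimal sketch in TypeScript (not the Swift in `ClaudeAgentRunner.swift`). It assumes the CLI emits one JSON object per line and that each streamed chunk is an object with `"type": "text_delta"` and a string `text` field. Because the exact envelope around that object is not documented here, the sketch searches each line recursively instead of hard-coding a path. `collectTextDeltas` and `extractStreamedText` are hypothetical names.

```typescript
// Recursively collects the `text` of every object whose `type` is "text_delta".
function collectTextDeltas(value: unknown, out: string[]): void {
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) collectTextDeltas(value[i], out);
    return;
  }
  if (value === null || typeof value !== "object") return;
  const record = value as Record<string, unknown>;
  const text = record["text"];
  if (record["type"] === "text_delta" && typeof text === "string") {
    out.push(text);
    return;
  }
  for (const key of Object.keys(record)) collectTextDeltas(record[key], out);
}

// Joins all text_delta chunks found in newline-delimited JSON output.
// Lines that are blank or not valid JSON are skipped rather than treated as errors.
function extractStreamedText(ndjson: string): string {
  const chunks: string[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue;
    }
    collectTextDeltas(parsed, chunks);
  }
  return chunks.join("");
}
```

A streaming consumer would apply the same per-line logic as bytes arrive from the subprocess's stdout, buffering until each newline, instead of waiting for the full output.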

Binary discovery checks `ClaudeBinaryPath` in `Info.plist` first, then
common install locations (`~/.claude/local/claude`, `/opt/homebrew/bin/claude`,
`/usr/local/bin/claude`, `~/.local/bin/claude`, `~/.npm-global/bin/claude`,
`/usr/bin/claude`), and finally falls back to `command -v claude` in a
login shell so non-standard installs (nvm/asdf/mise) still work.
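
The lookup order described above can be sketched as an ordered candidate list followed by a shell fallback. This TypeScript version only illustrates the ordering and is not the app's Swift implementation. The file-system check and the `command -v claude` lookup are injected as callbacks, so the sketch makes no claim about how the app performs them. `claudeBinaryCandidates` and `resolveClaudeBinary` are hypothetical names.

```typescript
// Builds the ordered candidate list: the Info.plist override first (when set),
// then the common install locations in the order listed above.
function claudeBinaryCandidates(homeDirectory: string, infoPlistOverride?: string): string[] {
  const candidates: string[] = [];
  if (infoPlistOverride !== undefined && infoPlistOverride !== "") {
    candidates.push(infoPlistOverride);
  }
  candidates.push(
    `${homeDirectory}/.claude/local/claude`,
    "/opt/homebrew/bin/claude",
    "/usr/local/bin/claude",
    `${homeDirectory}/.local/bin/claude`,
    `${homeDirectory}/.npm-global/bin/claude`,
    "/usr/bin/claude"
  );
  return candidates;
}

// Returns the first executable candidate; otherwise defers to the shell lookup
// (standing in for `command -v claude` in a login shell), which may return null.
function resolveClaudeBinary(
  candidates: string[],
  isExecutable: (path: string) => boolean,
  shellLookup: () => string | null
): string | null {
  for (const candidate of candidates) {
    if (isExecutable(candidate)) return candidate;
  }
  return shellLookup();
}
```

Checking fixed paths before the shell fallback avoids spawning a login shell in the common case. The fallback matters for version-manager installs, whose paths are not predictable.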

The subprocess runs with `--permission-mode plan` so Claude cannot
invoke tools that modify the user's filesystem.

### Key Architecture Decisions

**Menu Bar Panel Pattern**: The companion panel uses `NSStatusItem` for the menu bar icon and a custom borderless `NSPanel` for the floating control panel. This gives full control over appearance (dark, rounded corners, custom shadow) and avoids the standard macOS menu/popover chrome. The panel is non-activating so it doesn't steal focus. A global event monitor auto-dismisses it on outside clicks.
@@ -52,29 +70,26 @@ Worker vars: `ELEVENLABS_VOICE_ID`

| File | Lines | Purpose |
|------|-------|---------|
| `leanring_buddyApp.swift` | ~89 | Menu bar app entry point. Uses `@NSApplicationDelegateAdaptor` with `CompanionAppDelegate` which creates `MenuBarPanelManager` and starts `CompanionManager`. No main window — the app lives entirely in the status bar. |
| `CompanionManager.swift` | ~1026 | Central state machine. Owns dictation, shortcut monitoring, screen capture, Claude API, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates the full push-to-talk → screenshot → Claude → TTS → pointing pipeline. |
| `leanring_buddyApp.swift` | ~50 | Menu bar app entry point. Uses `@NSApplicationDelegateAdaptor` with `CompanionAppDelegate` which creates `MenuBarPanelManager` and starts `CompanionManager`. No main window — the app lives entirely in the status bar. |
| `CompanionManager.swift` | ~990 | Central state machine. Owns dictation, shortcut monitoring, screen capture, Claude API, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates the full push-to-talk → screenshot → Claude → TTS → pointing pipeline. |
| `MenuBarPanelManager.swift` | ~243 | NSStatusItem + custom NSPanel lifecycle. Creates the menu bar icon, manages the floating companion panel (show/hide/position), installs click-outside-to-dismiss monitor. |
| `CompanionPanelView.swift` | ~761 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. |
| `CompanionPanelView.swift` | ~705 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. |
| `OverlayWindow.swift` | ~881 | Full-screen transparent overlay hosting the blue cursor, response text, waveform, and spinner. Handles cursor animation, element pointing with bezier arcs, multi-monitor coordinate mapping, and fade-out transitions. |
| `CompanionResponseOverlay.swift` | ~217 | SwiftUI view for the response text bubble and waveform displayed next to the cursor in the overlay. |
| `CompanionScreenCaptureUtility.swift` | ~132 | Multi-monitor screenshot capture using ScreenCaptureKit. Returns labeled image data for each connected display. |
| `BuddyDictationManager.swift` | ~866 | Push-to-talk voice pipeline. Handles microphone capture via `AVAudioEngine`, provider-aware permission checks, keyboard/button dictation sessions, transcript finalization, shortcut parsing, contextual keyterms, and live audio-level reporting for waveform feedback. |
| `BuddyTranscriptionProvider.swift` | ~100 | Protocol surface and provider factory for voice transcription backends. Resolves provider based on `VoiceTranscriptionProvider` in Info.plist — AssemblyAI, OpenAI, or Apple Speech. |
| `BuddyTranscriptionProvider.swift` | ~75 | Protocol surface and provider factory for voice transcription backends. Resolves provider based on `VoiceTranscriptionProvider` in Info.plist — AssemblyAI (primary) or Apple Speech (local fallback). |
| `AssemblyAIStreamingTranscriptionProvider.swift` | ~478 | Streaming transcription provider. Fetches temp tokens from the Cloudflare Worker, opens an AssemblyAI v3 websocket, streams PCM16 audio, tracks turn-based transcripts, and delivers finalized text on key-up. Shares a single URLSession across all sessions. |
| `OpenAIAudioTranscriptionProvider.swift` | ~317 | Upload-based transcription provider. Buffers push-to-talk audio locally, uploads as WAV on release, returns finalized transcript. |
| `AppleSpeechTranscriptionProvider.swift` | ~147 | Local fallback transcription provider backed by Apple's Speech framework. |
| `BuddyAudioConversionSupport.swift` | ~108 | Audio conversion helpers. Converts live mic buffers to PCM16 mono audio and builds WAV payloads for upload-based providers. |
| `BuddyAudioConversionSupport.swift` | ~108 | Audio conversion helpers. Converts live mic buffers to PCM16 mono audio. |
| `GlobalPushToTalkShortcutMonitor.swift` | ~132 | System-wide push-to-talk monitor. Owns the listen-only `CGEvent` tap and publishes press/release transitions. |
| `ClaudeAPI.swift` | ~291 | Claude vision API client with streaming (SSE) and non-streaming modes. TLS warmup optimization, image MIME detection, conversation history support. |
| `OpenAIAPI.swift` | ~142 | OpenAI GPT vision API client. |
| `ClaudeAgentRunner.swift` | ~295 | Local `claude` CLI subprocess driver. Same public surface as the previous `ClaudeAPI` (so call sites are unchanged), but spawns the locally installed Claude Code binary in `--input-format stream-json --output-format stream-json --include-partial-messages` mode and parses the streamed `text_delta` events. Authenticates via the user's Claude Max subscription. |
| `ElevenLabsTTSClient.swift` | ~81 | ElevenLabs TTS client. Sends text to the Worker proxy, plays back audio via `AVAudioPlayer`. Exposes `isPlaying` for transient cursor scheduling. |
| `ElementLocationDetector.swift` | ~335 | Detects UI element locations in screenshots for cursor pointing. |
| `DesignSystem.swift` | ~880 | Design system tokens — colors, corner radii, shared styles. All UI references `DS.Colors`, `DS.CornerRadius`, etc. |
| `ClickyAnalytics.swift` | ~121 | PostHog analytics integration for usage tracking. |
| `ClickyAnalytics.swift` | ~55 | No-op analytics shim. The PostHog integration was removed in this fork; the functions remain so call sites compile unchanged. |
| `WindowPositionManager.swift` | ~262 | Window placement logic, Screen Recording permission flow, and accessibility permission helpers. |
| `AppBundleConfiguration.swift` | ~28 | Runtime configuration reader for keys stored in the app bundle Info.plist. |
| `worker/src/index.ts` | ~142 | Cloudflare Worker proxy. Three routes: `/chat` (Claude), `/tts` (ElevenLabs), `/transcribe-token` (AssemblyAI temp token). |
| `worker/src/index.ts` | ~110 | Cloudflare Worker proxy. Two routes: `/tts` (ElevenLabs) and `/transcribe-token` (AssemblyAI temp token). |

## Build & Run

@@ -97,7 +112,6 @@ cd worker
npm install

# Add secrets
npx wrangler secret put ANTHROPIC_API_KEY
npx wrangler secret put ASSEMBLYAI_API_KEY
npx wrangler secret put ELEVENLABS_API_KEY

@@ -108,6 +122,14 @@ npx wrangler deploy
npx wrangler dev
```

## Claude Code (local)

Clicky drives the locally installed `claude` CLI for AI responses.
Install it once from <https://claude.com/claude-code>, then run `claude`
to authenticate against the user's Claude Max subscription. After that,
Clicky finds the binary automatically. To override the binary path, set
the `ClaudeBinaryPath` key in `Info.plist`.

## Code Style & Conventions

### Variable and Method Naming
@@ -165,3 +187,42 @@ When you make changes to this project that affect the information in this file,
6. **Line count drift**: If a file's line count changes significantly (>50 lines), update the approximate count in the Key Files table

Do NOT update this file for minor edits, bug fixes, or changes that don't affect the documented architecture or conventions.

## Fork-specific changes (data-safety cleanup)

This fork intentionally drops several pieces of the upstream project to keep
the app's data flows narrow and obvious. If you re-add any of these you are
changing what the app sends to third parties — be deliberate about it.

- **PostHog analytics removed.** Upstream sent full push-to-talk transcripts
and full Claude responses (plus the user's email after onboarding) to
PostHog. The SDK, Swift Package dependency, and all `capture()` /
`identify()` calls were stripped. `ClickyAnalytics.swift` is kept as a
no-op shim so call sites compile.
- **Sparkle auto-update removed.** Upstream's `SUFeedURL` pointed at an
unrelated GitHub account (`julianjear/makesomething-mac-app`) that could
have shipped arbitrary signed updates. The Sparkle dependency, the
appcast feed keys in `Info.plist`, and the release pipeline that wrote
to that repo (`scripts/release.sh`) were all removed. Update by
re-building from source.
- **FormSpark email submission removed.** The onboarding email field used
to POST to `submit-form.com`. The submission, the field, and the
`hasSubmittedEmail` gate were removed; the Start button is shown
directly once permissions are granted.
- **Auto-login-item registration removed.** Upstream silently registered
itself with `SMAppService` on every launch. Now the app only runs when
you start it; add it to Login Items via System Settings if you want
it to auto-start.
- **Direct-API code paths removed.** `OpenAIAPI.swift`,
`ElementLocationDetector.swift`, and `OpenAIAudioTranscriptionProvider.swift`
could have been configured to call OpenAI / Anthropic directly with
in-bundle API keys, bypassing the Cloudflare Worker. All three were
deleted to enforce the invariant that no third-party API keys ship in
the app binary.
- **Claude moved off the Anthropic API onto the local CLI.** Instead of
proxying requests through the Worker to `api.anthropic.com`, Clicky
now spawns the user's locally installed `claude` binary
(`ClaudeAgentRunner.swift`) and pipes a stream-json user message into
it. This lets the Claude Max subscription cover the cost — no
pay-per-token API access is needed. The Worker's `/chat` route and the
`ANTHROPIC_API_KEY` secret were removed.