24 changes: 12 additions & 12 deletions AGENTS.md
@@ -5,35 +5,36 @@

## Overview

macOS menu bar companion app. Lives entirely in the macOS status bar (no dock icon, no main window). Clicking the menu bar icon opens a custom floating panel with companion voice controls. Uses push-to-talk (ctrl+option) to capture voice input, transcribes it via AssemblyAI streaming, and sends the transcript + a screenshot of the user's screen to Claude. Claude responds with text (streamed via SSE) and voice (ElevenLabs TTS). A blue cursor overlay can fly to and point at UI elements Claude references on any connected monitor.
macOS menu bar companion app. Lives entirely in the macOS status bar (no dock icon, no main window). Clicking the menu bar icon opens a custom floating panel with companion voice controls. Uses push-to-talk (ctrl+option) to capture voice input, transcribes it via AssemblyAI streaming, and sends the transcript + a screenshot of the user's screen to the selected AI provider. The app can use Codex through the local Codex CLI session or Claude through Anthropic Messages. The selected model responds with text and voice (ElevenLabs TTS). A blue cursor overlay can fly to and point at UI elements the model references on any connected monitor.

All API keys live on a Cloudflare Worker proxy — nothing sensitive ships in the app.
Claude, ElevenLabs, and AssemblyAI API keys live on a Cloudflare Worker proxy — nothing sensitive ships in the app. Codex uses the user's installed `codex` CLI and existing `codex login` session instead of a Worker API key.
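The split described above (Codex stays on the Mac, Claude goes through the Worker) can be modeled as a small lookup. This is only an illustrative sketch in TypeScript, the Worker's language; the type and function names are hypothetical and do not come from the app, which implements provider selection in Swift.

```typescript
// Hypothetical names for illustration; the app's real types live in Swift.
type ChatProvider = "codex" | "claude";

type ChatTransport =
  | { kind: "local-cli"; command: string; needsWorkerSecret: false }
  | { kind: "worker-proxy"; route: string; needsWorkerSecret: true };

// Maps the selected provider to the transport the document describes:
// Codex runs `codex exec` locally, Claude is proxied through the Worker.
function transportFor(provider: ChatProvider): ChatTransport {
  switch (provider) {
    case "codex":
      return { kind: "local-cli", command: "codex exec", needsWorkerSecret: false };
    case "claude":
      return { kind: "worker-proxy", route: "/chat/claude", needsWorkerSecret: true };
  }
}
```

Modeling the transport as a discriminated union means any code that handles a `ChatTransport` has to account for both the local and the proxied case.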

## Architecture

- **App Type**: Menu bar-only (`LSUIElement=true`), no dock icon or main window
- **Framework**: SwiftUI (macOS native) with AppKit bridging for menu bar panel and cursor overlay
- **Pattern**: MVVM with `@StateObject` / `@Published` state management
- **AI Chat**: Claude (Sonnet 4.6 default, Opus 4.6 optional) via Cloudflare Worker proxy with SSE streaming
- **AI Chat**: User-selectable in the panel. Codex (`gpt-5.5` default, plus the other Codex CLI model options in `CompanionManager.swift`) runs through the local Codex CLI. Claude (`claude-sonnet-4-6` default, `claude-opus-4-6` optional) uses Anthropic Messages through the Cloudflare Worker proxy.
- **Speech-to-Text**: AssemblyAI real-time streaming (`u3-rt-pro` model) via websocket, with OpenAI and Apple Speech as fallbacks
- **Text-to-Speech**: ElevenLabs (`eleven_flash_v2_5` model) via Cloudflare Worker proxy
- **Screen Capture**: ScreenCaptureKit (macOS 14.2+), multi-monitor support
- **Voice Input**: Push-to-talk via `AVAudioEngine` + pluggable transcription-provider layer. System-wide keyboard shortcut via listen-only CGEvent tap.
- **Element Pointing**: Claude embeds `[POINT:x,y:label:screenN]` tags in responses. The overlay parses these, maps coordinates to the correct monitor, and animates the blue cursor along a bezier arc to the target.
- **Element Pointing**: The selected AI model embeds `[POINT:x,y:label:screenN]` tags in responses. The overlay parses these, maps coordinates to the correct monitor, and animates the blue cursor along a bezier arc to the target.
- **Concurrency**: `@MainActor` isolation, async/await throughout
- **Analytics**: PostHog via `ClickyAnalytics.swift`
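To make the `[POINT:x,y:label:screenN]` format concrete, here is a minimal parsing sketch. It is written in TypeScript for consistency with the Worker code; the app's real parser is in Swift and may behave differently. The sketch assumes numeric coordinates, a label without `:` or `]`, and that tags are stripped from the displayed text. None of these details are confirmed by the document, and the names are hypothetical.

```typescript
// Hypothetical shape for one parsed tag; field names are illustrative.
interface PointTag {
  x: number;
  y: number;
  label: string;
  screen: number; // the N in "screenN"
}

// Assumption: integer or decimal coordinates, label without ':' or ']'.
const POINT_TAG_PATTERN =
  /\[POINT:(\d+(?:\.\d+)?),(\d+(?:\.\d+)?):([^:\]]+):screen(\d+)\]/g;

// Collects every tag and returns the response text with the tags removed.
function extractPointTags(response: string): { text: string; points: PointTag[] } {
  const points: PointTag[] = [];
  const text = response
    .replace(
      POINT_TAG_PATTERN,
      (_match: string, x: string, y: string, label: string, screen: string) => {
        points.push({
          x: Number(x),
          y: Number(y),
          label: label.trim(),
          screen: Number(screen),
        });
        return ""; // assumption: tags are not shown to the user
      },
    )
    .replace(/\s{2,}/g, " ") // collapse the gap left by a removed tag
    .trim();
  return { text, points };
}
```

Each parsed `screen` value would then select the monitor whose coordinate space `x` and `y` are mapped into before the cursor animation starts.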

### API Proxy (Cloudflare Worker)

The app never calls external APIs directly. All requests go through a Cloudflare Worker (`worker/src/index.ts`) that holds the real API keys as secrets.
Claude, ElevenLabs, and AssemblyAI requests go through a Cloudflare Worker (`worker/src/index.ts`) that holds the real API keys as secrets. Codex chat does not use the Worker; it runs `codex exec` locally.

| Route | Upstream | Purpose |
|-------|----------|---------|
| `POST /chat` | `api.anthropic.com/v1/messages` | Claude vision + streaming chat |
| `POST /chat/claude` | `api.anthropic.com/v1/messages` | Claude vision + streaming chat |
| `POST /tts` | `api.elevenlabs.io/v1/text-to-speech/{voiceId}` | ElevenLabs TTS audio |
| `POST /transcribe-token` | `streaming.assemblyai.com/v3/token` | Fetches a short-lived (480s) AssemblyAI websocket token |

Worker secrets: `ANTHROPIC_API_KEY`, `ASSEMBLYAI_API_KEY`, `ELEVENLABS_API_KEY`
Worker chat secrets: `ANTHROPIC_API_KEY` for Claude. Voice secrets: `ASSEMBLYAI_API_KEY`, `ELEVENLABS_API_KEY`
Worker vars: `ELEVENLABS_VOICE_ID`
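The route table can be read as a lookup from request path to upstream URL and secret. The sketch below shows one way such a lookup might look; it is not the contents of `worker/src/index.ts`. The `https://` scheme, the `{voiceId}` substitution, and all type and function names are assumptions made for illustration.

```typescript
// Hypothetical types; upstream hosts and secret names come from the table above.
interface UpstreamRoute {
  upstream: string;
  secretName: string;
}

const ROUTES = new Map<string, UpstreamRoute>([
  ["/chat", { upstream: "https://api.anthropic.com/v1/messages", secretName: "ANTHROPIC_API_KEY" }],
  ["/chat/claude", { upstream: "https://api.anthropic.com/v1/messages", secretName: "ANTHROPIC_API_KEY" }],
  ["/tts", { upstream: "https://api.elevenlabs.io/v1/text-to-speech/{voiceId}", secretName: "ELEVENLABS_API_KEY" }],
  ["/transcribe-token", { upstream: "https://streaming.assemblyai.com/v3/token", secretName: "ASSEMBLYAI_API_KEY" }],
]);

// Returns the upstream for a request, or null when the Worker should reject it.
// Every route in the table is POST-only, so other methods are refused.
function resolveRoute(method: string, pathname: string, voiceId: string): UpstreamRoute | null {
  if (method !== "POST") return null;
  const route = ROUTES.get(pathname);
  if (!route) return null;
  return {
    ...route,
    // Only the TTS upstream contains the placeholder; other URLs pass through unchanged.
    upstream: route.upstream.replace("{voiceId}", encodeURIComponent(voiceId)),
  };
}
```

Codex has no entry here because, as noted above, Codex chat never reaches the Worker.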

### Key Architecture Decisions
@@ -53,9 +54,9 @@ Worker vars: `ELEVENLABS_VOICE_ID`
| File | Lines | Purpose |
|------|-------|---------|
| `leanring_buddyApp.swift` | ~89 | Menu bar app entry point. Uses `@NSApplicationDelegateAdaptor` with `CompanionAppDelegate` which creates `MenuBarPanelManager` and starts `CompanionManager`. No main window — the app lives entirely in the status bar. |
| `CompanionManager.swift` | ~1026 | Central state machine. Owns dictation, shortcut monitoring, screen capture, Claude API, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates the full push-to-talk → screenshot → Claude → TTS → pointing pipeline. |
| `CompanionManager.swift` | ~1175 | Central state machine. Owns dictation, shortcut monitoring, screen capture, provider/model selection, Codex/Claude API clients, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates the full push-to-talk → screenshot → selected AI provider → TTS → pointing pipeline. |
| `MenuBarPanelManager.swift` | ~243 | NSStatusItem + custom NSPanel lifecycle. Creates the menu bar icon, manages the floating companion panel (show/hide/position), installs click-outside-to-dismiss monitor. |
| `CompanionPanelView.swift` | ~761 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. |
| `CompanionPanelView.swift` | ~809 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, AI provider picker, provider-specific model picker, permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. |
| `OverlayWindow.swift` | ~881 | Full-screen transparent overlay hosting the blue cursor, response text, waveform, and spinner. Handles cursor animation, element pointing with bezier arcs, multi-monitor coordinate mapping, and fade-out transitions. |
| `CompanionResponseOverlay.swift` | ~217 | SwiftUI view for the response text bubble and waveform displayed next to the cursor in the overlay. |
| `CompanionScreenCaptureUtility.swift` | ~132 | Multi-monitor screenshot capture using ScreenCaptureKit. Returns labeled image data for each connected display. |
@@ -66,15 +67,14 @@ Worker vars: `ELEVENLABS_VOICE_ID`
| `AppleSpeechTranscriptionProvider.swift` | ~147 | Local fallback transcription provider backed by Apple's Speech framework. |
| `BuddyAudioConversionSupport.swift` | ~108 | Audio conversion helpers. Converts live mic buffers to PCM16 mono audio and builds WAV payloads for upload-based providers. |
| `GlobalPushToTalkShortcutMonitor.swift` | ~132 | System-wide push-to-talk monitor. Owns the listen-only `CGEvent` tap and publishes press/release transitions. |
| `ClaudeAPI.swift` | ~291 | Claude vision API client with streaming (SSE) and non-streaming modes. TLS warmup optimization, image MIME detection, conversation history support. |
| `OpenAIAPI.swift` | ~142 | OpenAI GPT vision API client. |
| `CodexCLIAPI.swift` | ~225 | Local Codex CLI client. Writes screenshots to a temp directory, runs `codex exec` with the selected Codex model, and reads the final response from `--output-last-message`. |
| `ClaudeAPI.swift` | ~291 | Claude Messages API client with streaming (SSE) and non-streaming modes. TLS warmup optimization, image MIME detection, conversation history support. |
| `ElevenLabsTTSClient.swift` | ~81 | ElevenLabs TTS client. Sends text to the Worker proxy, plays back audio via `AVAudioPlayer`. Exposes `isPlaying` for transient cursor scheduling. |
| `ElementLocationDetector.swift` | ~335 | Detects UI element locations in screenshots for cursor pointing. |
| `DesignSystem.swift` | ~880 | Design system tokens — colors, corner radii, shared styles. All UI references `DS.Colors`, `DS.CornerRadius`, etc. |
| `ClickyAnalytics.swift` | ~121 | PostHog analytics integration for usage tracking. |
| `WindowPositionManager.swift` | ~262 | Window placement logic, Screen Recording permission flow, and accessibility permission helpers. |
| `AppBundleConfiguration.swift` | ~28 | Runtime configuration reader for keys stored in the app bundle Info.plist. |
| `worker/src/index.ts` | ~142 | Cloudflare Worker proxy. Three routes: `/chat` (Claude), `/tts` (ElevenLabs), `/transcribe-token` (AssemblyAI temp token). |
| `worker/src/index.ts` | ~135 | Cloudflare Worker proxy. Routes: `/chat` and `/chat/claude` (Anthropic Messages), `/tts` (ElevenLabs), `/transcribe-token` (AssemblyAI temp token). |
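The `CodexCLIAPI.swift` row describes running `codex exec` with a model, screenshots on disk, and `--output-last-message`. As a rough illustration, the argument list could be assembled as below. This is a TypeScript sketch, not the Swift implementation. Only `codex exec` and `--output-last-message` appear in the document; the `--model` and `--image` flags and the request shape are assumptions that should be checked against `codex exec --help`.

```typescript
// Hypothetical request shape; not taken from CodexCLIAPI.swift.
interface CodexExecRequest {
  prompt: string;
  model: string;
  screenshotPaths: string[];
  outputFilePath: string;
}

// Builds the argument list to pass after the `codex` binary name.
function buildCodexExecArguments(request: CodexExecRequest): string[] {
  const args = [
    "exec",
    "--model", request.model, // assumed flag name
    "--output-last-message", request.outputFilePath,
  ];
  for (const screenshotPath of request.screenshotPaths) {
    args.push("--image", screenshotPath); // assumed flag name
  }
  // "--" ends option parsing so the prompt is never read as a flag value.
  args.push("--", request.prompt);
  return args;
}
```

Passing arguments as an array rather than a single shell string avoids quoting problems when the transcript contains spaces or quotes.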

## Build & Run

48 changes: 35 additions & 13 deletions README.md
@@ -19,23 +19,37 @@ Here's the [original tweet](https://x.com/FarzaTV/status/2041314633978659092) th

This is the open-source version of Clicky for those that want to hack on it, build their own features, or just see how it works under the hood.

## Get started with Claude Code
## Get started with an agent

The fastest way to get this running is with [Claude Code](https://docs.anthropic.com/en/docs/claude-code).
Clicky includes repo instructions for coding agents. Codex reads `AGENTS.md`; Claude Code reads `CLAUDE.md`, which points to the same instructions.

Once you get Claude running, paste this:
### Codex

Once you have [Codex CLI](https://github.com/openai/codex) running, paste this:

```
Clone https://github.com/farzaa/clicky.git into my current directory.

Then read AGENTS.md. I want to get Clicky running locally on my Mac.

Help me set up everything: the Cloudflare Worker with my own API keys, the proxy URLs, and getting it building in Xcode. Walk me through it.
```
Hi Claude.

Codex will use the root `AGENTS.md` for architecture, build constraints, and coding conventions. The nested `leanring-buddy/AGENTS.md` adds guidance for the native macOS app target.

### Claude Code

Once you have [Claude Code](https://docs.anthropic.com/en/docs/claude-code) running, paste this:

```
Clone https://github.com/farzaa/clicky.git into my current directory.

Then read the CLAUDE.md. I want to get Clicky running locally on my Mac.
Then read CLAUDE.md. I want to get Clicky running locally on my Mac.

Help me set up everything the Cloudflare Worker with my own API keys, the proxy URLs, and getting it building in Xcode. Walk me through it.
Help me set up everything: the Cloudflare Worker with my own API keys, the proxy URLs, and getting it building in Xcode. Walk me through it.
```

That's it. It'll clone the repo, read the docs, and walk you through the whole setup. Once you're running you can just keep talking to it build features, fix bugs, whatever. Go crazy.
That's it. It'll clone the repo, read the docs, and walk you through the whole setup. Once you're running you can just keep talking to it: build features, fix bugs, whatever. Go crazy.

## Manual setup

@@ -47,7 +61,9 @@ If you want to do it yourself, here's the deal.
- Xcode 15+
- Node.js 18+ (for the Cloudflare Worker)
- A [Cloudflare](https://cloudflare.com) account (free tier works)
- API keys for: [Anthropic](https://console.anthropic.com), [AssemblyAI](https://www.assemblyai.com), [ElevenLabs](https://elevenlabs.io)
- [Codex CLI](https://github.com/openai/codex) installed and logged in with `codex login` if you want to use Codex
- An [Anthropic](https://console.anthropic.com) API key if you want to use Claude
- API keys for: [AssemblyAI](https://www.assemblyai.com) and [ElevenLabs](https://elevenlabs.io)

### 1. Set up the Cloudflare Worker

@@ -66,6 +82,8 @@ npx wrangler secret put ASSEMBLYAI_API_KEY
npx wrangler secret put ELEVENLABS_API_KEY
```

You only need `ANTHROPIC_API_KEY` if you plan to use Claude. Codex does not use a Worker secret; the macOS app runs the local Codex CLI with the user's existing `codex login` session.

For the ElevenLabs voice ID, open `wrangler.toml` and set it there (it's not sensitive):

```toml
@@ -113,6 +131,8 @@ You'll find it in:
- `CompanionManager.swift` — Claude chat + ElevenLabs TTS
- `AssemblyAIStreamingTranscriptionProvider.swift` — AssemblyAI token endpoint

Codex chat is local to the Mac. Make sure `codex` is installed and that `codex login` has completed before selecting Codex in the Clicky panel.

### 4. Open in Xcode and run

```bash
@@ -135,28 +155,30 @@ The app will appear in your menu bar (not the dock). Click the icon to open the

## Architecture

If you want the full technical breakdown, read `CLAUDE.md`. But here's the short version:
If you want the full technical breakdown, read `AGENTS.md` or `CLAUDE.md`. But here's the short version:

**Menu bar app** (no dock icon) with two `NSPanel` windows — one for the control panel dropdown, one for the full-screen transparent cursor overlay. Push-to-talk streams audio over a websocket to AssemblyAI, sends the transcript + screenshot to Claude via streaming SSE, and plays the response through ElevenLabs TTS. Claude can embed `[POINT:x,y:label:screenN]` tags in its responses to make the cursor fly to specific UI elements across multiple monitors. All three APIs are proxied through a Cloudflare Worker.
**Menu bar app** (no dock icon) with two `NSPanel` windows — one for the control panel dropdown, one for the full-screen transparent cursor overlay. Push-to-talk streams audio over a websocket to AssemblyAI, sends the transcript + screenshot to the selected AI provider, and plays the response through ElevenLabs TTS. The panel lets users choose Codex or Claude, plus a provider-specific model. Codex runs locally through `codex exec`; Claude, ElevenLabs, and AssemblyAI go through the Cloudflare Worker. The selected model can embed `[POINT:x,y:label:screenN]` tags in its responses to make the cursor fly to specific UI elements across multiple monitors.

## Project structure

```
leanring-buddy/ # Swift source (yes, the typo stays)
CompanionManager.swift # Central state machine
CompanionPanelView.swift # Menu bar panel UI
CodexCLIAPI.swift # Local Codex CLI client
ClaudeAPI.swift # Claude streaming client
ElevenLabsTTSClient.swift # Text-to-speech playback
OverlayWindow.swift # Blue cursor overlay
AssemblyAI*.swift # Real-time transcription
BuddyDictation*.swift # Push-to-talk pipeline
worker/ # Cloudflare Worker proxy
src/index.ts # Three routes: /chat, /tts, /transcribe-token
CLAUDE.md # Full architecture doc (agents read this)
src/index.ts # /chat/claude, /tts, /transcribe-token
AGENTS.md # Full architecture doc for Codex and other agents
CLAUDE.md # Symlink to AGENTS.md for Claude Code
```

## Contributing

PRs welcome. If you're using Claude Code, it already knows the codebase — just tell it what you want to build and point it at `CLAUDE.md`.
PRs welcome. If you're using Codex, point it at `AGENTS.md`. If you're using Claude Code, point it at `CLAUDE.md`; both files describe the same architecture and conventions.

Got feedback? DM me on X [@farzatv](https://x.com/farzatv).
39 changes: 16 additions & 23 deletions leanring-buddy/AGENTS.md
@@ -1,28 +1,21 @@
# AGENTS.md - leanring-buddy (Main App Target)
# AGENTS.md - leanring-buddy

## Source Files
This directory contains the native macOS app target. Start with the root `AGENTS.md` for the full architecture, build constraints, and coding conventions; this file only adds target-specific guidance for edits under `leanring-buddy/`.

### FloatingSessionButton.swift
- `FloatingSessionButtonManager` — `@MainActor` class managing the `NSPanel` lifecycle
- `showFloatingButton()` — Creates/shows the panel in top-right of primary screen
- `hideFloatingButton()` — Hides panel (keeps it alive for quick re-show)
- `destroyFloatingButton()` — Removes panel permanently (session ended)
- `onFloatingButtonClicked` — Callback closure, set by ContentView to bring main window to front
- `floatingButtonPanel` — Exposed `NSPanel` reference for screenshot exclusion
- `FloatingButtonView` — Private SwiftUI view with gradient circle, scale+glow hover animation, pointer cursor
## Target Shape

### ContentView.swift
- Receives `FloatingSessionButtonManager` via `@EnvironmentObject`
- `isMainWindowCurrentlyFocused` — Tracks main window focus state
- `configureFloatingButtonManager()` — Wires up the click callback
- `startObservingMainWindowFocusChanges()` — Sets up `NSWindow` notification observers
- `updateFloatingButtonVisibility()` — Core logic: show if running + not focused, hide otherwise
- `bringMainWindowToFront()` — Activates app and orders main window front
- `leanring_buddyApp.swift` is the menu-bar app entry point and wires `CompanionAppDelegate`, `MenuBarPanelManager`, and `CompanionManager` together.
- `CompanionManager.swift` owns the core interaction state machine: push-to-talk, screenshot capture, AI provider/model selection, Codex/Claude streaming, TTS playback, cursor visibility, and pointing coordination.
- `CompanionPanelView.swift`, `CompanionResponseOverlay.swift`, `OverlayWindow.swift`, and `DesignSystem.swift` own the visible SwiftUI/AppKit UI surfaces.
- `BuddyDictationManager.swift` plus the `*TranscriptionProvider.swift` files own microphone capture and transcription-provider behavior.
- `CodexCLIAPI.swift` runs the local Codex CLI for Codex chat. `ClaudeAPI.swift`, `ElevenLabsTTSClient.swift`, and `AssemblyAIStreamingTranscriptionProvider.swift` talk to the Worker proxy for Claude chat, TTS, and AssemblyAI tokens.
- `AppBundleConfiguration.swift` is the runtime reader for app-bundle configuration values stored in `Info.plist`.

### ScreenshotManager.swift
- `floatingButtonWindowToExcludeFromCaptures` — `NSWindow?` reference set by ContentView
- `captureScreen()` — Matches the floating window to an `SCWindow` and excludes it from capture filter
## Editing Rules

### leanring_buddyApp.swift
- Owns `FloatingSessionButtonManager` as `@StateObject`
- Injects it into ContentView via `.environmentObject()`
- Keep changes local to the file that owns the behavior. Do not route new app state through `CompanionManager` unless it needs to coordinate the main interaction pipeline.
- Preserve the menu-bar-only app model. Do not introduce a dock window, document scene, or ordinary app lifecycle unless the root architecture changes first.
- Keep all UI mutations on the main actor. Prefer explicit `@MainActor` isolation over detached main-thread hops.
- Use the existing `DS` design tokens for panel and overlay UI. Do not add one-off colors, spacing scales, or button styles.
- Do not put API keys, bearer tokens, or provider secrets in Swift source, `Info.plist`, or project build settings. Claude, ElevenLabs, and AssemblyAI secrets belong in the Worker environment. Codex uses the user's local `codex login` session.
- Do not run `xcodebuild` from the terminal. Open the Xcode project and build there so macOS permissions do not get reset.
2 changes: 0 additions & 2 deletions leanring-buddy/BuddyDictationManager.swift
@@ -654,8 +654,6 @@ final class BuddyDictationManager: NSObject, ObservableObject {
"makesomething",
"Learning Buddy",
"Codex",
"Claude",
"Anthropic",
"OpenAI",
"SwiftUI",
"Xcode",