Real-time "translation" for your cats โ shown right on your Meta Ray-Ban Display glasses.
Teach the app a few examples of what your cat's behaviors mean ("by the food bowl = wants snacks"), and whenever your kitty looks that way again, a caption pops up on the glasses HUD in real time โ so friends and cat sitters always know how they feel and what they want.
Built on the Meta Wearables Device Access Toolkit (DAT 0.7) for iOS.
Coming soon โ a glasses-POV clip of captions popping up over Chocobo & Rufus.
Sample caption cards (styled like the HUD, transparent PNGs for compositing over footage) live in overlays/.
- Live mood captions on the lens. The glasses camera watches your cat; matched captions render on the in-lens display.
- Teach by example. Snap a frame, label what it means. Add a few angles per caption for reliability.
- Fully on-device. No cloud, no per-frame API calls, no training step.
- Per-pet captions. e.g.
Chocobo: I want cuddles โค๏ธ,Rufus: Snacks! ๐.
Captions uses few-shot, example-based image matching with Apple's Vision framework, entirely on-device:
- Each saved reference photo is run through
VNGenerateImageFeaturePrintRequest, producing a feature print โ a compact perceptual embedding of what the image looks like. - Every sampled video frame from the glasses gets its own feature print.
- A nearest-neighbor search (
VNFeaturePrintObservation.computeDistance) finds the closest saved example. If that distance is under a tunable threshold, its label becomes the caption. - To avoid flicker, a caption only appears after it's the top match for several consecutive frames, then lingers briefly before clearing.
No training, no cloud โ just embeddings + a distance metric, computed locally and streamed to the heads-up display.
A couple of things that were non-obvious to get working with the DAT SDK:
- Camera + HUD share one session. A device is 1:1 with a
DeviceSession; the cameraStreamand theDisplaycapability are both added to that same session so live video and HUD captions run concurrently. DAMEnabledis required. The Display capability silently never reaches.startedunlessMWDAT.DAMEnabled = trueis set inInfo.plist. (This one cost a few hours.)- Matching runs ~3ร/sec on a background queue; captions are pushed to the HUD only when they actually change.
- Meta Ray-Ban Display glasses (display-capable) paired in the Meta AI app
- iOS 17+, Xcode 26+
- A device โ the DAT SDK has no simulator support
- Developer Mode enabled in the Meta AI app
- Open
CameraAccess.xcodeprojin Xcode. - Select the CameraAccess target โ Signing & Capabilities โ set your Team (automatic signing). The bundle ID is
com.fasai.captionsโ change it if needed. - In the Meta AI app, turn on Developer Mode (so no Wearables Developer Center registration is needed โ
Info.plistusesMetaAppID = 0). - Make sure
Info.plist โ MWDAT โ DAMEnabledistrue(it is, by default here). - Build & run on your iPhone, tap Connect, then Start Session.
- Start Session โ begins watching through the glasses.
- ๏ผ Teach โ freezes the current frame, then you name the mood (reuse-existing chips keep multi-angle rules tidy).
- โฅ Captions โ manage saved rules (rename / delete / add angles).
- โ๏ธ Settings โ match sensitivity (default
0.75), caption linger, and a Session Debugger (live feed + distance + matched-frame count) for tuning.
CameraAccess/
โโโ ViewModels/
โ โโโ CaptionEngine.swift # Vision feature-print matching, debounce, persistence
โ โโโ CaptionDisplayPresenter.swift # renders captions to the glasses HUD
โ โโโ StreamSessionViewModel.swift # camera stream + session lifecycle
โ โโโ DeviceSessionManager.swift
โโโ Views/
โโโ Theme.swift # pastel "kitty diary" design system
โโโ NonStreamView.swift # home
โโโ StreamView.swift # active session (caption-as-hero)
โโโ SettingsView.swift # settings
โโโ SessionDebuggerView.swift # live feed + calibration
โโโ ReferencesView.swift # manage caption rules
โโโ CaptureCaptionSheet.swift # capture-then-label flow
overlays/ # transparent caption PNGs for demos
overlays/ contains transparent PNG caption cards (styled like the HUD) for compositing over glasses footage, plus preview.html to view them.
Built on top of Meta's Wearables Device Access Toolkit sample code. Use of the toolkit is governed by the Meta Wearables Developer Terms โ see LICENSE. This project is a personal experiment and is not affiliated with or endorsed by Meta.
Made with ๐ฉท for Chocobo & Rufus.
