Skip to content

fasaiph/kitty-captions

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

2 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿฑ Kitty Captions

Kitty Captions

Real-time "translation" for your cats โ€” shown right on your Meta Ray-Ban Display glasses.

Teach the app a few examples of what your cat's behaviors mean ("by the food bowl = wants snacks"), and whenever your kitty looks that way again, a caption pops up on the glasses HUD in real time โ€” so friends and cat sitters always know how they feel and what they want.

Built on the Meta Wearables Device Access Toolkit (DAT 0.7) for iOS.


๐ŸŽฅ Demo

Coming soon โ€” a glasses-POV clip of captions popping up over Chocobo & Rufus. Sample caption cards (styled like the HUD, transparent PNGs for compositing over footage) live in overlays/.


โœจ What it does

  • Live mood captions on the lens. The glasses camera watches your cat; matched captions render on the in-lens display.
  • Teach by example. Snap a frame, label what it means. Add a few angles per caption for reliability.
  • Fully on-device. No cloud, no per-frame API calls, no training step.
  • Per-pet captions. e.g. Chocobo: I want cuddles โค๏ธ, Rufus: Snacks! ๐ŸŸ.

๐Ÿง  How the matching works

Captions uses few-shot, example-based image matching with Apple's Vision framework, entirely on-device:

  1. Each saved reference photo is run through VNGenerateImageFeaturePrintRequest, producing a feature print โ€” a compact perceptual embedding of what the image looks like.
  2. Every sampled video frame from the glasses gets its own feature print.
  3. A nearest-neighbor search (VNFeaturePrintObservation.computeDistance) finds the closest saved example. If that distance is under a tunable threshold, its label becomes the caption.
  4. To avoid flicker, a caption only appears after it's the top match for several consecutive frames, then lingers briefly before clearing.

No training, no cloud โ€” just embeddings + a distance metric, computed locally and streamed to the heads-up display.

๐Ÿ—๏ธ Architecture notes

A couple of things that were non-obvious to get working with the DAT SDK:

  • Camera + HUD share one session. A device is 1:1 with a DeviceSession; the camera Stream and the Display capability are both added to that same session so live video and HUD captions run concurrently.
  • DAMEnabled is required. The Display capability silently never reaches .started unless MWDAT.DAMEnabled = true is set in Info.plist. (This one cost a few hours.)
  • Matching runs ~3ร—/sec on a background queue; captions are pushed to the HUD only when they actually change.

๐Ÿ“‹ Requirements

  • Meta Ray-Ban Display glasses (display-capable) paired in the Meta AI app
  • iOS 17+, Xcode 26+
  • A device โ€” the DAT SDK has no simulator support
  • Developer Mode enabled in the Meta AI app

๐Ÿš€ Setup

  1. Open CameraAccess.xcodeproj in Xcode.
  2. Select the CameraAccess target โ†’ Signing & Capabilities โ†’ set your Team (automatic signing). The bundle ID is com.fasai.captions โ€” change it if needed.
  3. In the Meta AI app, turn on Developer Mode (so no Wearables Developer Center registration is needed โ€” Info.plist uses MetaAppID = 0).
  4. Make sure Info.plist โ†’ MWDAT โ†’ DAMEnabled is true (it is, by default here).
  5. Build & run on your iPhone, tap Connect, then Start Session.

๐ŸŽฎ Using it

  • Start Session โ€” begins watching through the glasses.
  • ๏ผ‹ Teach โ€” freezes the current frame, then you name the mood (reuse-existing chips keep multi-angle rules tidy).
  • โ™ฅ Captions โ€” manage saved rules (rename / delete / add angles).
  • โš™๏ธ Settings โ€” match sensitivity (default 0.75), caption linger, and a Session Debugger (live feed + distance + matched-frame count) for tuning.

๐Ÿ“ Project structure

CameraAccess/
โ”œโ”€โ”€ ViewModels/
โ”‚   โ”œโ”€โ”€ CaptionEngine.swift            # Vision feature-print matching, debounce, persistence
โ”‚   โ”œโ”€โ”€ CaptionDisplayPresenter.swift  # renders captions to the glasses HUD
โ”‚   โ”œโ”€โ”€ StreamSessionViewModel.swift   # camera stream + session lifecycle
โ”‚   โ””โ”€โ”€ DeviceSessionManager.swift
โ””โ”€โ”€ Views/
    โ”œโ”€โ”€ Theme.swift                    # pastel "kitty diary" design system
    โ”œโ”€โ”€ NonStreamView.swift            # home
    โ”œโ”€โ”€ StreamView.swift               # active session (caption-as-hero)
    โ”œโ”€โ”€ SettingsView.swift             # settings
    โ”œโ”€โ”€ SessionDebuggerView.swift      # live feed + calibration
    โ”œโ”€โ”€ ReferencesView.swift           # manage caption rules
    โ””โ”€โ”€ CaptureCaptionSheet.swift      # capture-then-label flow
overlays/                              # transparent caption PNGs for demos

๐ŸŽฌ Demo overlays

overlays/ contains transparent PNG caption cards (styled like the HUD) for compositing over glasses footage, plus preview.html to view them.

๐Ÿ“œ License & attribution

Built on top of Meta's Wearables Device Access Toolkit sample code. Use of the toolkit is governed by the Meta Wearables Developer Terms โ€” see LICENSE. This project is a personal experiment and is not affiliated with or endorsed by Meta.


Made with ๐Ÿฉท for Chocobo & Rufus.

About

๐Ÿฑ Real-time cat-mood captions on Meta Ray-Ban Display glasses (on-device Vision feature-print matching)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors