The screenshot is the simplest, most spontaneous note-taker there is. When we are inspired or need to remember something, a screenshot is often our go-to. Over time, though, screenshots pile up unorganized, and it is easy to lose track of what we were thinking when we took them.
Cache AI is a Chrome extension and web application that captures, analyzes, and organizes your browsing context using AI-powered screenshot analysis and a knowledge-graph-style memory. Each time you take a screenshot, the image is added as a node and placed near other nodes based on semantic similarity. AI analyzes the image and produces a short label and a written analysis. At capture time you can add an optional audio note that is transcribed, attached to the screenshot, and stored with the node.
Cache AI is powered by the Gemini API (Google) for vision, embeddings, transcription, and chat, and by the Mem0 API for optional long-term memory in chat.
A future iOS app is planned for more realistic use cases and easy, on-the-fly screenshotting.
- Overview
- Architecture
- Features
- Prerequisites
- Setup
- Configuration
- Development
- Deployment
- API Reference
- Security
- Troubleshooting
Chrome extension (CacheExtension)
Runs on any tab. A floating overlay lets you capture a screenshot of the current tab and, optionally, record an audio note. Screenshots and audio notes are sent to the Cache AI backend; the extension does not hold any API keys. When you are signed in, the extension shares your web app session, so captured nodes appear in the web app.
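Both payloads typically start out as data URLs in the browser: `chrome.tabs.captureVisibleTab`, the usual Manifest V3 capture call, returns the screenshot as a `data:image/...;base64,` string, and a recorded audio `Blob` read with `FileReader.readAsDataURL` has the same form. This document does not say whether the backend expects the full data URL or only the base64 part, so the helper below is a hypothetical sketch (`splitDataUrl` and `Base64Payload` are illustrative names) showing how to separate the two if needed, for example to fill the `audioData` and `mimeType` fields of `/api/transcribe-audio`.

```typescript
// Hypothetical result type: a MIME type plus the raw base64 payload.
interface Base64Payload {
  mimeType: string;
  base64: string;
}

// Splits "data:<mediatype>;base64,<payload>" into its parts.
// Parameters such as ";codecs=opus" are dropped from the MIME type.
function splitDataUrl(dataUrl: string): Base64Payload {
  const prefix = "data:";
  const marker = ";base64,";
  if (!dataUrl.startsWith(prefix)) {
    throw new Error("Not a data URL");
  }
  const markerIndex = dataUrl.indexOf(marker);
  if (markerIndex === -1) {
    throw new Error("Data URL is not base64-encoded");
  }
  // Everything between "data:" and ";base64," is the media type, e.g. "audio/webm;codecs=opus".
  const mediaType = dataUrl.slice(prefix.length, markerIndex);
  // RFC 2397: an omitted media type defaults to text/plain.
  const mimeType = mediaType.split(";")[0] || "text/plain";
  return { mimeType, base64: dataUrl.slice(markerIndex + marker.length) };
}
```

Stripping the codec parameter keeps the MIME type to the plain `type/subtype` form; keep the full media type instead if the transcription call needs it.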
Web app (CacheWeb)
React SPA hosted at cacheai.app (or your own domain). Users sign in with Supabase Auth (e.g. Google). The main view is a canvas (the Cache Plane) on which each screenshot is a node. Nodes are laid out by semantic clustering (UMAP over Gemini embeddings) or, when embeddings are unavailable, randomly. You can open a node to see its image, label, analysis, intent, and any audio note; search across nodes; and use an AI chat that can draw on Mem0-backed memory and take attached nodes or files as context.
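The choice between semantic and random placement can be sketched as a small partition step that runs before UMAP. This is a minimal sketch: `LayoutNode`, `LayoutPlan`, and `planLayout` are hypothetical names, not taken from the codebase, and the threshold of two embedded nodes comes from the Troubleshooting notes. The UMAP projection itself is left out.

```typescript
// Hypothetical node shape for illustration; the web app's real types may differ.
interface LayoutNode {
  id: string;
  embedding?: number[] | null;
}

interface LayoutPlan {
  semantic: LayoutNode[]; // to be projected to 2D with UMAP
  random: LayoutNode[]; // to be scattered randomly on the canvas
}

// From the Troubleshooting notes: semantic layout needs at least two embedded nodes.
const MIN_EMBEDDED_NODES = 2;

function hasEmbedding(node: LayoutNode): boolean {
  return Array.isArray(node.embedding) && node.embedding.length > 0;
}

function planLayout(nodes: LayoutNode[]): LayoutPlan {
  const embedded = nodes.filter(hasEmbedding);
  if (embedded.length < MIN_EMBEDDED_NODES) {
    // Too few embeddings to cluster: fall back to random placement for everything.
    return { semantic: [], random: [...nodes] };
  }
  // Nodes without embeddings are still shown, just placed randomly.
  return { semantic: embedded, random: nodes.filter((n) => !hasEmbedding(n)) };
}
```

With umap-js, the `semantic` vectors would then be projected with something like `new UMAP({ nComponents: 2 }).fit(vectors)`, and the resulting coordinates scaled to canvas positions.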
Backend
Serverless API on Vercel: screenshot analysis (Gemini vision + optional embeddings), audio transcription (Gemini), storage and retrieval of cache nodes (Supabase), chat (Gemini with optional Mem0 context), and a Mem0 proxy for memory. Auth is handled by Supabase; the extension reuses the web app's session through a bridge when the user visits the web app.
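Authenticated API routes receive the Supabase access token in an `Authorization: Bearer <token>` header (see API Reference). The sketch below shows the first step a handler might take, assuming a hypothetical `getBearerToken` helper; the actual handlers under `CacheWeb/api/` may parse the header differently.

```typescript
// Hypothetical helper: extracts <token> from an "Authorization: Bearer <token>" header value.
// Returns null when the header is missing, empty, or uses another scheme.
function getBearerToken(authorization: string | null | undefined): string | null {
  if (!authorization) return null;
  // The scheme name is matched case-insensitively; the token is one run of non-space characters.
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  return match?.[1] ?? null;
}
```

The extracted token would then be verified against Supabase before any data access; with supabase-js v2 this is typically `supabase.auth.getUser(token)` on a client created with the service role key.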
- Extension: Manifest V3. A content script injects the overlay; the background script handles capture, API calls, and sync. It communicates with the web app via `webapp-bridge.js` when the user is on cacheai.app (or localhost).
- Web app: React 19, TypeScript, React Flow for the graph, and UMAP-js for the 2D layout computed from embeddings. The Supabase client is used for auth and, via the API, for `cache_nodes`.
- API (Vercel): Node handlers under `CacheWeb/api/`. They use the Supabase service role for database access and auth checks, and env vars for Gemini and Mem0.
- Data: Supabase table `cache_nodes` (id, user_id, image_data as base64, label, analysis, intent, audio_note, embedding array, timestamp). Mem0 is optionally used for chat memory (user_id = email).
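The `cache_nodes` columns map naturally onto a TypeScript row type. In this sketch the column types, the ISO-string `timestamp`, and the `toCacheNodeRow` helper are assumptions for illustration; the API Reference only fixes the field names of the `POST /api/cache-nodes` body. The key design point is that `user_id` comes from the verified token rather than from the request body.

```typescript
// Row shape mirroring the cache_nodes columns (types are assumptions).
interface CacheNodeRow {
  id: string;
  user_id: string;
  image_data: string | null; // base64
  label: string;
  analysis: string;
  intent: string;
  audio_note: string | null;
  embedding: number[] | null;
  timestamp: string; // assumed ISO-8601
}

// Body accepted by POST /api/cache-nodes (field names from the API Reference).
interface CreateCacheNodeBody {
  id: string;
  imageData?: string;
  label: string;
  analysis: string;
  intent: string;
  audioNote?: string;
  embedding?: number[];
  timestamp?: string;
}

// Hypothetical mapping from the camelCase request body to a snake_case row.
// userId must come from the verified access token, never from the body.
function toCacheNodeRow(body: CreateCacheNodeBody, userId: string, now: Date = new Date()): CacheNodeRow {
  return {
    id: body.id,
    user_id: userId,
    image_data: body.imageData ?? null,
    label: body.label,
    analysis: body.analysis,
    intent: body.intent,
    audio_note: body.audioNote ?? null,
    embedding: body.embedding ?? null,
    timestamp: body.timestamp ?? now.toISOString(),
  };
}
```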
- Screenshot capture: One-click capture of the active tab from the extension overlay.
- Optional audio note: Record a short note; it is transcribed (Gemini) and stored with the screenshot node.
- AI analysis: Gemini analyzes each screenshot to produce a concise label, a longer analysis, and an inferred intent. An embedding can also be computed for the semantic layout.
- Semantic clustering: In the web app, nodes with embeddings are laid out with UMAP so that related content appears close together. The layout falls back to random placement when embeddings are missing or there are too few of them.
- Cache Plane: Graph view of all your nodes; click a node to see image, label, analysis, intent, and audio note.
- Search: Text search over node labels, analysis, and intent on the current canvas.
- Chat: AI chat (Gemini) with optional Mem0 memory. You can attach cache nodes or files so the model has context.
- Account: Sign in with Supabase (e.g. Google). The delete-account option removes your Supabase-backed data and your Mem0 memories.
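Search matches text in node labels, analysis, and intent. The exact matching rules are not specified here, so the following is a minimal sketch assuming case-insensitive substring matching in which every query term must appear somewhere in those three fields; `searchNodes` and `SearchableNode` are hypothetical names.

```typescript
// Minimal node fields used by search (hypothetical shape).
interface SearchableNode {
  id: string;
  label: string;
  analysis: string;
  intent: string;
}

// Case-insensitive AND search: each whitespace-separated term must occur
// in the label, analysis, or intent of a node for it to match.
function searchNodes<T extends SearchableNode>(nodes: T[], query: string): T[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return nodes; // empty query shows everything
  return nodes.filter((node) => {
    const haystack = `${node.label}\n${node.analysis}\n${node.intent}`.toLowerCase();
    return terms.every((term) => haystack.includes(term));
  });
}
```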
- Node.js 16+
- Chrome (for the extension)
- Accounts and keys:
  - Supabase: A project for auth and the database (see `CacheWeb/database/schema.sql`).
  - Google Cloud / AI Studio: A Gemini API key (vision, embeddings, transcription, chat).
  - Mem0 (optional): An API key for chat memory. Without it, chat still works but has no long-term memory.
  - Google OAuth (optional): Needed for “Sign in with Google” on the web app; it must be configured in Supabase and in the app’s env.
Clone the repository:
git clone <repo-url>
cd CacheAIExtension

Extension (from the repository root)
cd CacheExtension
npm install
npm run build

Web app (from the repository root)
cd CacheWeb
npm install

Supabase
- Create a Supabase project.
- In the SQL Editor, run the contents of `CacheWeb/database/schema.sql`.
- In Dashboard > Authentication > Providers, enable Google (or other providers) if desired.
- Note the project URL and anon key (used by the web app) and the service_role key (used by the backend).
Load the extension in Chrome
- Open `chrome://extensions` and enable “Developer mode”.
- Click “Load unpacked” and select the `CacheExtension` folder. Do this after `npm run build`, so that the built JS files referenced by `manifest.json` exist.
- The extension calls the backend at the `API_URL` set in its background script (e.g. `https://cacheai.app/api` for production). For local development you can point the web app at a local API through `REACT_APP_API_URL` (see Configuration).
Web app (e.g. `.env` or `.env.local` for local development)
- `REACT_APP_API_URL`: Backend base URL (e.g. `https://cacheai.app/api`, or `http://localhost:3000/api` if you run the API locally).
- `REACT_APP_GOOGLE_CLIENT_ID`: Optional; for Google OAuth.
- The Supabase URL and anon key are usually set in the app’s Supabase client config (e.g. `CacheWeb/src/config/supabase.ts`).
Backend (Vercel or local serverless)
- `SUPABASE_URL`, `SUPABASE_SERVICE_KEY`: Supabase project URL and service_role key.
- `GEMINI_API_KEY`: Google AI Studio / Gemini API key.
- `MEM0_API_KEY`: Optional; enables Mem0 memory in chat.

See `CacheWeb/env.production.example` for the full list.
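A backend handler can fail fast when required variables are missing, instead of surfacing opaque Gemini or Supabase errors later. This hypothetical `loadBackendConfig` sketch takes the environment as a plain string map (for example `process.env`); the variable names are the ones listed above, and treating `MEM0_API_KEY` as optional mirrors the note that chat works without it.

```typescript
// Variables the backend cannot run without (names from the list above).
const REQUIRED_ENV = ["SUPABASE_URL", "SUPABASE_SERVICE_KEY", "GEMINI_API_KEY"] as const;

interface BackendConfig {
  supabaseUrl: string;
  supabaseServiceKey: string;
  geminiApiKey: string;
  mem0ApiKey: string | null; // null: chat runs without long-term memory
}

// Hypothetical loader: pass process.env (or any string map).
// Throws a single error that lists every missing required variable.
function loadBackendConfig(env: Record<string, string | undefined>): BackendConfig {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    supabaseUrl: env["SUPABASE_URL"]!,
    supabaseServiceKey: env["SUPABASE_SERVICE_KEY"]!,
    geminiApiKey: env["GEMINI_API_KEY"]!,
    mem0ApiKey: env["MEM0_API_KEY"] || null,
  };
}
```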
- Extension API base: In `CacheExtension/background.js`, `API_URL` is set to the Cache AI backend (e.g. `https://cacheai.app/api`). Change it to use a custom backend.
- Web app API base: Set via `REACT_APP_API_URL`. It must point to the backend that serves the API routes and has the env vars above.
- Auth sync: When the user is logged in on the web app (cacheai.app or localhost), the app posts a message containing the Supabase access token to the extension, so the extension can call the same API as the logged-in user.
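The auth sync message format is not documented here. Assuming the web app uses `window.postMessage` and `webapp-bridge.js` relays the token to the extension, the bridge should accept only messages from the web app's own origins and with the expected shape. The message type `CACHE_AI_AUTH_SYNC`, the origin list (including the localhost port), and `parseAuthSyncMessage` are all hypothetical.

```typescript
// Hypothetical message format; the real webapp-bridge.js protocol may differ.
interface AuthSyncMessage {
  type: "CACHE_AI_AUTH_SYNC";
  accessToken: string;
}

// Origins named in this README; the localhost port is an assumption.
const ALLOWED_ORIGINS = new Set<string>(["https://cacheai.app", "http://localhost:3000"]);

// Returns the message if it comes from an allowed origin and has the expected shape, else null.
function parseAuthSyncMessage(origin: string, data: unknown): AuthSyncMessage | null {
  if (!ALLOWED_ORIGINS.has(origin)) return null; // ignore other sites
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  if (record["type"] !== "CACHE_AI_AUTH_SYNC") return null;
  const token = record["accessToken"];
  if (typeof token !== "string" || token.length === 0) return null;
  return { type: "CACHE_AI_AUTH_SYNC", accessToken: token };
}
```

In a content script this would be called from a `message` listener as `parseAuthSyncMessage(event.origin, event.data)`, after also checking `event.source === window`, so that other pages and frames cannot inject a token.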
Web app
cd CacheWeb
npm start

This runs the React app (e.g. at http://localhost:3000). To use a local API, you also need to run the Vercel dev server (or an equivalent) so that the `CacheWeb/api/*` handlers and env vars are available.
Extension
After changing TypeScript/React code in `CacheExtension`, run `npm run build` and reload the extension in `chrome://extensions`.
API locally
Use the Vercel CLI from the project root (or from `CacheWeb`) so that `api/` is served and the env vars are loaded:
vercel dev

Point the web app’s `REACT_APP_API_URL` to the URL that `vercel dev` prints, followed by `/api` (e.g. `http://localhost:3000/api`).
- Web app + API: Typically deployed together on Vercel. Build the React app and configure routes so that `/*` serves the SPA and `/api/*` goes to the serverless functions. Set all environment variables in Vercel.
- Supabase: Already hosted; make sure the schema and RLS policies from `schema.sql` are applied.
- Extension: There is no separate deploy step; users load the unpacked extension, or you distribute it via the Chrome Web Store. Make sure the extension’s `API_URL` points to your deployed API.
See the Setup and Configuration sections above and CacheWeb/env.production.example for Supabase and Vercel setup.
All API routes live under CacheWeb/api/. Authenticated endpoints expect Authorization: Bearer <supabase_access_token>.
| Method | Path | Description |
|---|---|---|
| GET | `/api/health` | Health check; no auth. |
| POST | `/api/analyze-screenshot` | Body: `{ imageData (base64), audioNote? }`. Returns `{ success, analysis: { label, analysis, intent, audioNote?, embedding? } }`. Uses Gemini for vision and, optionally, embeddings. |
| POST | `/api/transcribe-audio` | Body: `{ audioData, mimeType? }`. Returns `{ success, transcription }`. Uses Gemini. |
| GET | `/api/cache-nodes` | Returns `{ success, nodes }` for the authenticated user. |
| POST | `/api/cache-nodes` | Body: `{ id, imageData?, label, analysis, intent, audioNote?, embedding?, timestamp? }`. Creates a cache node. |
| DELETE | `/api/cache-nodes?id=<id>` | Deletes the given cache node for the user. |
| POST | `/api/chat` | Body: `{ messages, useMemory? }`. Returns a stream or JSON with the Gemini reply; optionally uses Mem0 for context. |
| GET/POST/DELETE | `/api/memories` | Proxy to Mem0 for listing, adding, and deleting memories (optional). |
| GET/POST | `/api/user/account` | User account info and delete-account (deletion also clears Mem0 for that user). |
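A typed client for the authenticated endpoints can keep the bearer header and JSON handling in one place. This is a minimal sketch under stated assumptions: `CacheApiClient` and `FetchLike` are hypothetical names, `apiBase` is a base URL that already ends in `/api` (as in the `REACT_APP_API_URL` examples), and the fetch function is injected so the sketch does not depend on browser or Node typings. Only the request and response fields come from the table above.

```typescript
// Minimal fetch-like signature so the sketch has no DOM or Node typing dependency.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Shape of `analysis` returned by POST /api/analyze-screenshot.
interface AnalyzeResult {
  label: string;
  analysis: string;
  intent: string;
  audioNote?: string;
  embedding?: number[];
}

// Hypothetical client for the authenticated endpoints.
class CacheApiClient {
  apiBase: string;
  accessToken: string;
  fetchFn: FetchLike;

  constructor(apiBase: string, accessToken: string, fetchFn: FetchLike) {
    this.apiBase = apiBase.replace(/\/+$/, ""); // tolerate a trailing slash
    this.accessToken = accessToken;
    this.fetchFn = fetchFn;
  }

  // POSTs JSON with the Supabase access token and returns the parsed response body.
  async postJson(path: string, body: unknown): Promise<unknown> {
    const res = await this.fetchFn(this.apiBase + path, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${this.accessToken}`,
      },
      body: JSON.stringify(body),
    });
    if (!res.ok) {
      throw new Error(`POST ${path} failed with HTTP ${res.status}`);
    }
    return res.json();
  }

  // Body { imageData, audioNote? } -> { success, analysis }.
  async analyzeScreenshot(imageData: string, audioNote?: string): Promise<AnalyzeResult> {
    const json = (await this.postJson("/analyze-screenshot", { imageData, audioNote })) as {
      success: boolean;
      analysis?: AnalyzeResult;
    };
    if (!json.success || !json.analysis) {
      throw new Error("analyze-screenshot did not return an analysis");
    }
    return json.analysis;
  }

  // Creates a cache node; timestamp is omitted because it is optional in the request body.
  async createNode(id: string, imageData: string, result: AnalyzeResult): Promise<void> {
    await this.postJson("/cache-nodes", { id, imageData, ...result });
  }
}
```

In the browser, `fetchFn` would be a thin wrapper around the global `fetch`, and `accessToken` the current Supabase session's access token.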
Before pushing this repo to a public host (e.g. GitHub):
- Do not commit any `.env`, `.env.local`, `.env.production`, or other env files. They are listed in `.gitignore`; make sure none were ever added with `git add -f`.
- Do not put API keys, secrets, or tokens in source code. Use environment variables only (e.g. the Vercel dashboard for the backend, and a local `.env` for development, which must stay untracked).
- Backend secrets (e.g. `GEMINI_API_KEY`, `MEM0_API_KEY`, `SUPABASE_SERVICE_KEY`) must exist only on the server (Vercel env). The extension and web app call your backend; they do not need those keys.
- Frontend env vars (e.g. `REACT_APP_API_URL`, `REACT_APP_SUPABASE_URL`) are baked into the client build. Do not put secret keys in `REACT_APP_*`; use these variables only for non-secret config such as the API base URL (the Supabase anon key is designed to be public).
- If you ever committed a secret, rotate it immediately (issue a new key with the provider and update the env), then remove the secret from history (e.g. with `git filter-branch` or BFG) or make the repo private.
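Because every `REACT_APP_*` variable ends up in the client bundle, a small prebuild check can catch obviously misplaced secrets. The name patterns below are a heuristic chosen for illustration, not an exhaustive list, and `findSuspiciousClientEnv` is a hypothetical helper you could run against `process.env` before `npm run build`, failing the build if it returns any names.

```typescript
// Heuristic name patterns that usually indicate a secret (illustrative, not exhaustive).
const SECRET_NAME_PATTERNS: RegExp[] = [
  /SECRET/i,
  /SERVICE_KEY/i,
  /SERVICE_ROLE/i,
  /PRIVATE/i,
  /PASSWORD/i,
  /GEMINI_API_KEY/i,
  /MEM0_API_KEY/i,
];

// Returns REACT_APP_* names that look like secrets; these would be baked into the client bundle.
function findSuspiciousClientEnv(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((name) => name.startsWith("REACT_APP_"))
    .filter((name) => SECRET_NAME_PATTERNS.some((pattern) => pattern.test(name)))
    .sort();
}
```

A name-based check cannot see what a variable actually contains, so it complements rather than replaces reviewing what goes into `REACT_APP_*`.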
- Screenshot not appearing in web app: Make sure you are signed in on the web app and that the extension has received the auth sync (visit the web app with the extension enabled). Check that the backend has valid Supabase and Gemini env vars and that the extension’s `API_URL` is correct.
- Analysis or transcription fails: Verify that `GEMINI_API_KEY` is set and has access to the models used (e.g. gemini-2.5-flash-lite, embedding-001). Check the payload size (e.g. the image base64 should be under 10MB).
- Chat has no memory: Mem0 is optional. Set `MEM0_API_KEY` to enable memory; make sure the chat API is calling Mem0 and that the user_id (e.g. email) is consistent.
- Nodes not clustering semantically: Semantic layout requires at least two nodes with embeddings. If the analyze-screenshot step does not return an embedding, or the node is created without one, that node is placed randomly.
- Extension cannot reach API: Check `host_permissions` in `manifest.json` and that `API_URL` in the background script points to a reachable backend (and that CORS allows the extension origin, if required).
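For the payload-size check, the size of a base64 string can be estimated without decoding it: every 4 characters encode 3 bytes, minus one byte per trailing `=`. The sketch below assumes the 10MB limit mentioned above means 10 MiB of base64 text; the exact limit, and whether it applies to the encoded or decoded size, are not stated here, and `checkImagePayload` and `decodedBase64Size` are hypothetical names.

```typescript
// Assumption: "10MB" read as 10 MiB of base64 text; adjust if the backend limit differs.
const MAX_IMAGE_BASE64_LENGTH = 10 * 1024 * 1024;

// Decoded byte count of a base64 string: 3 bytes per 4 characters, minus padding.
function decodedBase64Size(base64: string): number {
  const clean = base64.replace(/\s/g, ""); // ignore line breaks or spaces
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

interface PayloadCheck {
  encodedLength: number;
  decodedBytes: number;
  withinLimit: boolean;
}

// Reports both sizes and whether the encoded string is under the assumed limit.
function checkImagePayload(base64: string): PayloadCheck {
  const encodedLength = base64.length;
  return {
    encodedLength,
    decodedBytes: decodedBase64Size(base64),
    withinLimit: encodedLength < MAX_IMAGE_BASE64_LENGTH,
  };
}
```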