AI pipeline improvement plan
Current state
- AiService calls Gemini directly from the client: API key exposed in the bundle
- Single model, no fallback
- generateObject blocks until full response: no streaming
- No prompt logging, no token/cost tracking
- No memory or thread history
Improved pipeline
┌─────────────────────────────────────────────────────────┐
│  React Native app (Expo)                                │
│  IAiService interface · Convex repos · tsyringe         │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│  ① Convex as API gateway (security)                     │
│  Auth check · rate limiting · no secrets in bundle      │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│  ② Langfuse (observability)                             │
│  Prompt logs · token counts · latency · cost            │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│  ③ AiService: model fallback                            │
│  Primary: Gemini 2.5 Flash                              │
│  Fallback: Claude Haiku / GPT-4o Mini                   │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│  ④ Convex backend (streaming)                           │
│ ┌───────────────┐  ┌─────────────────┐  ┌─────────────┐ │
│ │ Actions       │  │ Streaming       │  │ ⑤ Agent     │ │
│ │ Gemini·Places │  │ streamObject +  │  │ (future)    │ │
│ │ RapidAPI·     │  │ persistent-text │  │ threads ·   │ │
│ │ Unsplash      │  │ -streaming      │  │ memory ·    │ │
│ │ (all proxied) │  │ websocket→app   │  │ tools       │ │
│ └───────────────┘  └─────────────────┘  └─────────────┘ │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐
│ Gemini   │ │ Claude   │ │ Places    │ │ RapidAPI │ │ Unsplash │
│ 2.5 Flash│ │ fallback │ │ API       │ │          │ │          │
└──────────┘ └──────────┘ └───────────┘ └──────────┘ └──────────┘
Improvements
① Move API keys server-side
All secrets (Gemini, Places, RapidAPI, Unsplash) must be removed from the bundle, with the corresponding API calls proxied through Convex actions. Every action verifies auth and applies per-user rate limiting via @convex-dev/ratelimiter. The IAiService interface is unchanged.
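To make the "auth check, then per-user rate limit" gate concrete, here is a minimal in-memory token-bucket sketch. It illustrates the idea only and is not the @convex-dev/ratelimiter API: `TokenBucketLimiter`, `guardAction`, and the capacity/refill numbers are hypothetical, and a real Convex action would keep this state in the database rather than in process memory.

```typescript
// Hypothetical in-memory token bucket; illustrates per-user rate limiting only.
// This is NOT the @convex-dev/ratelimiter API.
interface Bucket {
  tokens: number;
  updatedAt: number; // ms timestamp of the last refill
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private readonly capacity: number, // max burst size per user
    private readonly refillPerSecond: number // sustained rate per user
  ) {}

  /** Consumes one token and returns true if the user is under the limit. */
  tryConsume(userId: string, now: number): boolean {
    const bucket: Bucket =
      this.buckets.get(userId) ?? { tokens: this.capacity, updatedAt: now };
    const elapsedSeconds = Math.max(0, now - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond
    );
    bucket.updatedAt = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(userId, bucket);
    return allowed;
  }
}

/** Gate each proxied action would run first: auth check, then rate limit. */
function guardAction(
  userId: string | null,
  limiter: TokenBucketLimiter,
  now: number
): void {
  if (userId === null) throw new Error("Unauthenticated");
  if (!limiter.tryConsume(userId, now)) throw new Error("Rate limit exceeded");
}
```

Passing `now` in explicitly keeps the sketch deterministic; in practice it would be `Date.now()`.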
② Add Langfuse observability
Integrate Langfuse via its native @langfuse/vercel-ai-sdk wrapper, which wraps the existing generateObject call in AiService; nothing else changes. Langfuse is open-source (MIT), and its free tier covers 50k observations/month with full feature access, including span-based tracing, prompt management, and cost tracking. It can also be self-hosted with no licence fee if needed.
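As a rough picture of what token and cost tracking amounts to, the sketch below derives a cost figure from per-call token counts. The `CallRecord` shape, the `costUsd` and `summarize` helpers, and every price shown are placeholders for illustration; they are not Langfuse's schema or any provider's real pricing.

```typescript
// Hypothetical per-call record; field names are illustrative, not Langfuse's schema.
interface CallRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

// Placeholder price table entry: USD per one million tokens.
interface Price {
  inputPerMillion: number;
  outputPerMillion: number;
}

/** Cost of a single call, given a price table keyed by model name. */
function costUsd(record: CallRecord, prices: Record<string, Price>): number {
  const price = prices[record.model];
  if (price === undefined) {
    throw new Error(`No price configured for ${record.model}`);
  }
  return (
    (record.inputTokens * price.inputPerMillion +
      record.outputTokens * price.outputPerMillion) /
    1e6
  );
}

/** Aggregates the numbers a dashboard would typically show. */
function summarize(records: CallRecord[], prices: Record<string, Price>) {
  let totalTokens = 0;
  let totalCostUsd = 0;
  let totalLatencyMs = 0;
  for (const record of records) {
    totalTokens += record.inputTokens + record.outputTokens;
    totalCostUsd += costUsd(record, prices);
    totalLatencyMs += record.latencyMs;
  }
  return {
    calls: records.length,
    totalTokens,
    totalCostUsd,
    avgLatencyMs: records.length === 0 ? 0 : totalLatencyMs / records.length,
  };
}
```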
③ Add model fallback
Upgrade AiService to try Gemini 2.5 Flash first and fall back to Claude Haiku or GPT-4o Mini on failure. The Vercel AI SDK's provider abstraction makes this a few lines in data/services/AiService.ts.
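The fallback logic itself can be sketched independently of any SDK. In the version below, a "provider" is just a named async function that either returns a result or throws; `Provider` and `generateWithFallback` are hypothetical names, and in the real service each `run` would presumably wrap a Vercel AI SDK call for one model.

```typescript
// A provider is any async function that returns a result or throws.
// These are stand-ins; real ones would wrap one model call each.
type Provider<T> = {
  name: string;
  run: (prompt: string) => Promise<T>;
};

/** Tries providers in order and returns the first successful result. */
async function generateWithFallback<T>(
  prompt: string,
  providers: Provider<T>[]
): Promise<{ result: T; servedBy: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const result = await provider.run(prompt);
      return { result, servedBy: provider.name };
    } catch (error) {
      // Record why this provider failed, then move on to the next one.
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```

Returning `servedBy` alongside the result makes it easy to log which model actually answered, which matters once fallbacks start firing.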
④ Add streaming
Replace generateObject with streamObject in the Convex action. Use @convex-dev/persistent-text-streaming to push delta chunks over websocket to the app. The existing reactive useItemRepository subscription handles the rest — the UI renders progressively instead of blocking for 3–5s.
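The progressive-rendering behaviour can be pictured with a small in-memory model: deltas are appended to a stored body, and every subscriber is re-notified with the text so far. `StreamingBody` is a hypothetical stand-in for the persisted stream plus the reactive subscription; it is not the @convex-dev/persistent-text-streaming API.

```typescript
// Hypothetical stand-in for a persisted stream row plus its reactive subscribers.
type Listener = (textSoFar: string) => void;

class StreamingBody {
  private text = "";
  private done = false;
  private readonly listeners = new Set<Listener>();

  /** Registers a listener and returns an unsubscribe function. */
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    listener(this.text); // late subscribers immediately see what has arrived
    return () => {
      this.listeners.delete(listener);
    };
  }

  /** Appends one delta chunk and notifies every subscriber. */
  append(delta: string): void {
    if (this.done) throw new Error("Stream already finished");
    this.text += delta;
    this.listeners.forEach((listener) => listener(this.text));
  }

  /** Marks the stream complete and returns the final text. */
  finish(): string {
    this.done = true;
    return this.text;
  }
}
```

A UI component would subscribe once and re-render on each callback, so the user sees partial output as soon as the first chunk lands.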
⑤ Convex Agent (future)
Replace the raw action pattern with @convex-dev/agent to get persistent thread history, cross-session memory, and tool calling. Only the AiService implementation changes — the DI boundary keeps everything else untouched.
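What "persistent thread history" buys can be shown with a toy service that keeps messages per thread and hands the full history to the model on every turn. Everything here is hypothetical (`ThreadedAiService`, `Message`, the synchronous `Model` function), and none of it is the @convex-dev/agent API; a real model call would be asynchronous and the history would live in the database.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Stand-in for a model call; synchronous here only to keep the sketch small.
type Model = (history: Message[]) => string;

class ThreadedAiService {
  private threads = new Map<string, Message[]>();

  constructor(private readonly model: Model) {}

  /** Adds the prompt to the thread, calls the model with full history, stores the reply. */
  ask(threadId: string, prompt: string): string {
    const history = this.threads.get(threadId) ?? [];
    history.push({ role: "user", content: prompt });
    const reply = this.model(history.slice()); // pass a copy so the model cannot mutate state
    history.push({ role: "assistant", content: reply });
    this.threads.set(threadId, history);
    return reply;
  }

  /** Returns a copy of the stored messages for one thread. */
  history(threadId: string): Message[] {
    return (this.threads.get(threadId) ?? []).slice();
  }
}
```

Because callers only see `ask`, swapping this toy for an agent-backed implementation would not change anything on the other side of the DI boundary.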
Priority order
Step | What | Effort
-- | -- | --
① | API keys server-side + auth + rate limiting | 30 min
② | Langfuse observability | 30 min
③ | Model fallback | 1 hour
④ | Streaming via Convex | 1–2 days
⑤ | Convex Agent (memory, threads, tools) | 3–5 days