[APP] AI pipeline improvements #348

@timothyrusso

AI pipeline improvement plan

Current state

  • AiService calls Gemini directly from the client — API key exposed in the bundle
  • Single model, no fallback
  • generateObject blocks until full response — no streaming
  • No prompt logging, no token/cost tracking
  • No memory or thread history

Improved pipeline

┌─────────────────────────────────────────────────────────┐
│              React Native app (Expo)                    │
│     IAiService interface · Convex repos · tsyringe      │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│         ① Convex as API gateway (security)              │
│   Auth check · rate limiting · no secrets in bundle     │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│            ② Langfuse (observability)                   │
│      Prompt logs · token counts · latency · cost        │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│           ③ AiService — model fallback                  │
│     Primary: Gemini 2.5 Flash                           │
│     Fallback: Claude Haiku / GPT-4o Mini                │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│              ④ Convex backend (streaming)               │
│  ┌───────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│  │    Actions    │ │    Streaming    │ │  ⑤ Agent   │ │
│  │ Gemini·Places │ │ streamObject +  │ │  (future)   │ │
│  │ RapidAPI·     │ │ persistent-text │ │  threads ·  │ │
│  │ Unsplash      │ │ -streaming      │ │  memory ·   │ │
│  │ (all proxied) │ │ websocket→app   │ │  tools      │ │
│  └───────────────┘ └─────────────────┘ └─────────────┘ │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌──────────┐
│  Gemini  │  │  Claude  │  │  Places   │  │ RapidAPI │  │ Unsplash │
│ 2.5 Flash│  │ fallback │  │    API    │  │          │  │          │
└──────────┘  └──────────┘  └───────────┘  └──────────┘  └──────────┘

Improvements

① Move API keys server-side

All secrets (Gemini, Places, RapidAPI, Unsplash) must be removed from the bundle, and every call to those services proxied through Convex actions. Each action verifies auth and applies per-user rate limiting via @convex-dev/ratelimiter. The IAiService interface is unchanged.
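The two gateway checks described above (reject unauthenticated callers, then apply a per-user limit) can be sketched without any library. This is only an illustration of the logic: `FixedWindowLimiter` and `guardRequest` are hypothetical names, and in the real pipeline the identity would come from Convex auth and the limit from @convex-dev/ratelimiter rather than this hand-rolled version.

```typescript
// Library-free sketch of the gateway checks. All names are hypothetical.
type GuardResult =
  | { ok: true }
  | { ok: false; reason: "unauthenticated" | "rate_limited"; retryAfterMs?: number };

interface WindowState {
  windowStart: number; // ms timestamp at which the current window opened
  count: number; // requests counted in the current window
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns 0 when the request is allowed, otherwise the ms until the window resets.
  check(key: string, now: number): number {
    const state = this.windows.get(key);
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      this.windows.set(key, { windowStart: now, count: 1 });
      return 0;
    }
    if (state.count < this.maxRequests) {
      state.count += 1;
      return 0;
    }
    return state.windowStart + this.windowMs - now;
  }
}

// Runs before any proxied call: auth first, then the per-user limit.
function guardRequest(
  userId: string | null,
  limiter: FixedWindowLimiter,
  now: number,
): GuardResult {
  if (userId === null) {
    return { ok: false, reason: "unauthenticated" };
  }
  const retryAfterMs = limiter.check(userId, now);
  if (retryAfterMs > 0) {
    return { ok: false, reason: "rate_limited", retryAfterMs };
  }
  return { ok: true };
}
```

The clock is passed in as `now` so the limit can be exercised deterministically; keying the limiter on the user id is what makes the limit per-user rather than global.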

② Add Langfuse observability

Integrate Langfuse via its native @langfuse/vercel-ai-sdk wrapper, which wraps the existing generateObject call in AiService; nothing else changes. Langfuse is open-source (MIT), and its free tier covers 50k observations/month with full feature access, including span-based tracing, prompt management, and cost tracking. It can also be self-hosted with no licence fee if needed.
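To make concrete what a traced call would record (prompt, latency, token counts, cost), here is a library-free sketch. It is not the Langfuse API: `traced`, `TraceRecord`, `Pricing`, and the `sink` callback are hypothetical, and any per-token prices passed in are placeholders rather than real provider rates.

```typescript
// Library-free sketch of a traced model call. Not the Langfuse API.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface ModelResult<T> {
  output: T;
  usage: Usage;
}

interface TraceRecord {
  name: string;
  prompt: string;
  latencyMs: number;
  usage: Usage | null; // null when the call failed before reporting usage
  costUsd: number | null;
  error: string | null;
}

// Prices are expressed per one million tokens (placeholder values only).
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  const total =
    usage.inputTokens * pricing.inputPerMTok +
    usage.outputTokens * pricing.outputPerMTok;
  return total / 1e6;
}

// Wraps any model call, emits one record per attempt, and rethrows failures.
async function traced<T>(
  name: string,
  prompt: string,
  pricing: Pricing,
  sink: (record: TraceRecord) => void,
  call: (prompt: string) => Promise<ModelResult<T>>,
  now: () => number = Date.now,
): Promise<T> {
  const startedAt = now();
  try {
    const result = await call(prompt);
    sink({
      name,
      prompt,
      latencyMs: now() - startedAt,
      usage: result.usage,
      costUsd: estimateCostUsd(result.usage, pricing),
      error: null,
    });
    return result.output;
  } catch (err) {
    sink({
      name,
      prompt,
      latencyMs: now() - startedAt,
      usage: null,
      costUsd: null,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
}
```

Because the wrapper only sees a function that returns output plus usage, the call site in AiService keeps its shape, which matches the "nothing else changes" goal.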

③ Add model fallback

Upgrade AiService to try Gemini 2.5 Flash first and fall back to Claude Haiku or GPT-4o Mini on failure. The Vercel AI SDK's provider abstraction makes this a few lines in data/services/AiService.ts.
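A minimal sketch of the fallback loop, assuming each provider can be reduced to an async function from prompt to result. `Provider`, `FallbackResult`, and `generateWithFallback` are hypothetical names; in AiService each `generate` would presumably wrap the same generateObject call with a different model.

```typescript
// Sketch of ordered model fallback. All names here are hypothetical.
interface Provider<T> {
  name: string;
  generate: (prompt: string) => Promise<T>;
}

interface ProviderFailure {
  provider: string;
  message: string;
}

interface FallbackResult<T> {
  value: T;
  servedBy: string; // which provider produced the value
  failures: ProviderFailure[]; // providers that were tried and failed first
}

// Tries providers in order and returns the first success.
// Throws only when every provider has failed.
async function generateWithFallback<T>(
  providers: Provider<T>[],
  prompt: string,
): Promise<FallbackResult<T>> {
  const failures: ProviderFailure[] = [];
  for (const provider of providers) {
    try {
      const value = await provider.generate(prompt);
      return { value, servedBy: provider.name, failures };
    } catch (err) {
      failures.push({
        provider: provider.name,
        message: err instanceof Error ? err.message : String(err),
      });
    }
  }
  const summary = failures.map((f) => `${f.provider}: ${f.message}`).join("; ");
  throw new Error(`All providers failed (${summary})`);
}
```

Returning `servedBy` and the list of earlier failures alongside the value keeps fallbacks visible to logging, so a silently degraded primary model does not go unnoticed.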

④ Add streaming

Replace generateObject with streamObject in the Convex action. Use @convex-dev/persistent-text-streaming to push delta chunks over websocket to the app. The existing reactive useItemRepository subscription handles the rest — the UI renders progressively instead of blocking for 3–5s.
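The server-side half of this can be sketched as a loop that consumes partial objects and persists each snapshot, which is what lets a reactive subscription re-render as the object fills in. This is a sketch under assumptions: `Item`, `fakeModelStream`, and `persistStream` are hypothetical, the `save` callback stands in for whatever write the Convex action performs, and the fake generator stands in for the partial-object stream that streamObject produces.

```typescript
// Sketch of progressive persistence. All names and the Item shape are hypothetical.
interface Item {
  title: string;
  tags: string[];
}

// Stand-in for a model stream: each yield is a fuller snapshot of the object.
async function* fakeModelStream(): AsyncGenerator<Partial<Item>> {
  yield { title: "Draft" };
  yield { title: "Draft", tags: ["a"] };
  yield { title: "Draft", tags: ["a", "b"] };
}

// Persists every snapshot as it arrives, then marks the final one as done.
async function persistStream<T>(
  partials: AsyncIterable<Partial<T>>,
  save: (snapshot: Partial<T>, done: boolean) => Promise<void>,
): Promise<Partial<T>> {
  let latest: Partial<T> = {};
  for await (const partial of partials) {
    latest = partial;
    await save(latest, false);
  }
  await save(latest, true);
  return latest;
}
```

Writing a separate `done` marker at the end lets the client distinguish "still generating" from "finished", which a bare sequence of snapshots does not convey.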

⑤ Convex Agent (future)

Replace the raw action pattern with @convex-dev/agent to get persistent thread history, cross-session memory, and tool calling. Only the AiService implementation changes — the DI boundary keeps everything else untouched.
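The claim that only the implementation changes can be illustrated with plain constructor injection. Everything here is hypothetical: the real IAiService signature is not shown in this issue, `Model` stands in for the underlying LLM call, and `ThreadedAiService` only mimics thread history in memory rather than using @convex-dev/agent.

```typescript
// Sketch of the DI boundary. The interface shape and class names are hypothetical.
type Model = (messages: string[]) => Promise<string>;

interface IAiService {
  generate(prompt: string): Promise<string>;
}

// Current pattern: every call is independent.
class StatelessAiService implements IAiService {
  private readonly model: Model;

  constructor(model: Model) {
    this.model = model;
  }

  generate(prompt: string): Promise<string> {
    return this.model([prompt]);
  }
}

// Agent-like pattern: earlier prompts and replies are sent along with each call.
class ThreadedAiService implements IAiService {
  private readonly model: Model;
  private readonly history: string[] = [];

  constructor(model: Model) {
    this.model = model;
  }

  async generate(prompt: string): Promise<string> {
    this.history.push(prompt);
    const reply = await this.model([...this.history]);
    this.history.push(reply);
    return reply;
  }
}

// A consumer that depends only on the interface and never changes.
class ItemPlanner {
  private readonly ai: IAiService;

  constructor(ai: IAiService) {
    this.ai = ai;
  }

  plan(topic: string): Promise<string> {
    return this.ai.generate(`Plan: ${topic}`);
  }
}
```

Swapping `new ItemPlanner(new StatelessAiService(model))` for `new ItemPlanner(new ThreadedAiService(model))` changes behaviour without touching `ItemPlanner`, which is the property the DI container is meant to preserve.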


Priority order

|   | What | Effort |
| -- | -- | -- |
| ① | API keys server-side + auth + rate limiting | 30 min |
| ② | Langfuse observability | 30 min |
| ③ | Model fallback | 1 hour |
| ④ | Streaming via Convex | 1–2 days |
| ⑤ | Convex Agent (memory, threads, tools) | 3–5 days |


Status: In progress
