[APP] AI pipeline improvements

<html><head></head><body><h1>AI pipeline improvement plan</h1>
<h2>Current state</h2>
<ul>
<li><code>AiService</code> calls Gemini directly from the client, so the API key is exposed in the bundle</li>
<li>Single model, no fallback</li>
<li><code>generateObject</code> blocks until the full response arrives, so there is no streaming</li>
<li>No prompt logging, no token/cost tracking</li>
<li>No memory or thread history</li>
</ul>
<hr>
<h2>Improved pipeline</h2>
<pre><code>┌─────────────────────────────────────────────────────────┐
│              React Native app (Expo)                    │
│     IAiService interface · Convex repos · tsyringe      │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│         ① Convex as API gateway (security)              │
│   Auth check · rate limiting · no secrets in bundle     │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│            ② Langfuse (observability)                   │
│      Prompt logs · token counts · latency · cost        │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│           ③ AiService — model fallback                  │
│     Primary: Gemini 2.5 Flash                           │
│     Fallback: Claude Haiku / GPT-4o Mini                │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│              ④ Convex backend (streaming)               │
│  ┌───────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│  │    Actions    │ │    Streaming    │ │  ⑤ Agent   │ │
│  │ Gemini·Places │ │ streamObject +  │ │  (future)   │ │
│  │ RapidAPI·     │ │ persistent-text │ │  threads ·  │ │
│  │ Unsplash      │ │ -streaming      │ │  memory ·   │ │
│  │ (all proxied) │ │ websocket→app   │ │  tools      │ │
│  └───────────────┘ └─────────────────┘ └─────────────┘ │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌──────────┐
│  Gemini  │  │  Claude  │  │  Places   │  │ RapidAPI │  │ Unsplash │
│ 2.5 Flash│  │ fallback │  │    API    │  │          │  │          │
└──────────┘  └──────────┘  └───────────┘  └──────────┘  └──────────┘
</code></pre>
<hr>
<h2>Improvements</h2>
<h3>① Move API keys server-side</h3>
<p>All secrets (Gemini, Places, RapidAPI, Unsplash) must be removed from the client bundle, and the corresponding API calls proxied through Convex actions. Every action verifies auth and applies per-user rate limiting via <code>@convex-dev/ratelimiter</code>. The <code>IAiService</code> interface is unchanged.</p>
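<p>The two checks each action performs (auth, then per-user rate limit) can be sketched in plain TypeScript. This is only an illustration under stated assumptions: <code>FixedWindowLimiter</code>, <code>guardedCall</code>, and the fixed-window strategy are hypothetical stand-ins, not the <code>@convex-dev/ratelimiter</code> API, and a <code>null</code> user ID models a request without a valid session.</p>

```typescript
// Hypothetical stand-in for the gateway checks; not the @convex-dev/ratelimiter API.
type UsageWindow = { startedAt: number; count: number };

class FixedWindowLimiter {
  private windows = new Map<string, UsageWindow>();

  constructor(
    private readonly limit: number, // max requests per window
    private readonly windowMs: number, // window length in milliseconds
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  /** Returns true and counts the request if the user is under the limit. */
  tryConsume(userId: string): boolean {
    const t = this.now();
    const w = this.windows.get(userId);
    if (w === undefined || t - w.startedAt >= this.windowMs) {
      // First request, or the previous window expired: start a new window.
      this.windows.set(userId, { startedAt: t, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

type GatewayResult =
  | { ok: true; value: string }
  | { ok: false; reason: "unauthenticated" | "rate_limited" };

/** Runs `handler` only for an authenticated user that is under the limit. */
function guardedCall(
  limiter: FixedWindowLimiter,
  userId: string | null, // null models a request with no valid session
  handler: () => string,
): GatewayResult {
  if (userId === null) return { ok: false, reason: "unauthenticated" };
  if (!limiter.tryConsume(userId)) return { ok: false, reason: "rate_limited" };
  return { ok: true, value: handler() };
}
```

<p>In this sketch the handler, which is where the secret-bearing upstream call would live, is never reached by unauthenticated or over-limit requests.</p>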
<h3>② Add Langfuse observability</h3>
<p>Integrate Langfuse via its native <code>@langfuse/vercel-ai-sdk</code> wrapper, which wraps the existing <code>generateObject</code> call in <code>AiService</code>; nothing else changes. Langfuse is open source (MIT), and its free tier covers 50k observations per month with full feature access, including span-based tracing, prompt management, and cost tracking. It can also be self-hosted with no licensing cost if needed.</p>
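<p>To make concrete what gets recorded per generation, here is a minimal sketch with an in-memory sink. It does not use the Langfuse SDK: <code>traced</code>, <code>TraceRecord</code>, and the prices in <code>PRICE_PER_MILLION</code> are hypothetical placeholders chosen for illustration.</p>

```typescript
// Hypothetical in-memory tracer; Langfuse would receive these fields instead.
type Usage = { inputTokens: number; outputTokens: number };
type Generation = { text: string; usage: Usage };
type TraceRecord = {
  name: string;
  prompt: string;
  output: string;
  usage: Usage;
  latencyMs: number;
  costUsd: number;
};

// Placeholder prices in USD per one million tokens, not real provider pricing.
const PRICE_PER_MILLION = { input: 1.0, output: 2.0 };

/** Converts token counts into a cost using the placeholder prices above. */
function costUsd(usage: Usage): number {
  return (
    (usage.inputTokens * PRICE_PER_MILLION.input +
      usage.outputTokens * PRICE_PER_MILLION.output) /
    1e6
  );
}

/** Calls `generate`, then appends prompt, output, tokens, latency and cost to `sink`. */
async function traced(
  name: string,
  prompt: string,
  generate: (prompt: string) => Promise<Generation>,
  sink: TraceRecord[],
): Promise<Generation> {
  const startedAt = Date.now();
  const result = await generate(prompt);
  sink.push({
    name,
    prompt,
    output: result.text,
    usage: result.usage,
    latencyMs: Date.now() - startedAt,
    costUsd: costUsd(result.usage),
  });
  return result;
}
```

<p>The wrapper returns the generation untouched, which mirrors the claim that callers of <code>AiService</code> do not need to change.</p>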
<h3>③ Add model fallback</h3>
<p>Upgrade <code>AiService</code> to try Gemini 2.5 Flash first and fall back to Claude Haiku or GPT-4o Mini on failure. The Vercel AI SDK's provider abstraction makes this a few lines in <code>data/services/AiService.ts</code>.</p>
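<p>The fallback order can be sketched as a loop over candidate models. This is a simplified illustration: <code>ModelCall</code> and <code>generateWithFallback</code> are hypothetical names, and each <code>call</code> function stands in for a provider call that the real service would make through the Vercel AI SDK.</p>

```typescript
// Hypothetical model-call shape; real calls would go through Vercel AI SDK providers.
type ModelCall = { name: string; call: (prompt: string) => Promise<string> };

/** Tries each model in order and returns the first successful answer. */
async function generateWithFallback(
  models: ModelCall[],
  prompt: string,
): Promise<{ model: string; text: string }> {
  const failures: string[] = [];
  for (const model of models) {
    try {
      return { model: model.name, text: await model.call(prompt) };
    } catch (err) {
      // Record the failure and move on to the next model in the list.
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${model.name}: ${message}`);
    }
  }
  throw new Error(`All models failed (${failures.join("; ")})`);
}
```

<p>Returning the name of the model that answered keeps the fallback visible to logging, so a silent switch to the secondary model can still be tracked.</p>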
<h3>④ Add streaming</h3>
<p>Replace <code>generateObject</code> with <code>streamObject</code> in the Convex action, and use <code>@convex-dev/persistent-text-streaming</code> to push delta chunks over the websocket to the app. The existing reactive <code>useItemRepository</code> subscription handles the rest: the UI renders progressively instead of blocking for 3–5 s.</p>
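<p>The difference between blocking and progressive rendering can be illustrated without any Convex or AI SDK code. In this sketch <code>streamDeltas</code> is a hypothetical stand-in for the model stream, and <code>onUpdate</code> plays the role of the reactive subscription that re-renders the UI.</p>

```typescript
// Hypothetical stand-in for a model stream: yields the text in small chunks.
async function* streamDeltas(fullText: string, chunkSize = 4): AsyncGenerator<string> {
  for (let i = 0; i < fullText.length; i += chunkSize) {
    yield fullText.slice(i, i + chunkSize);
  }
}

/** Appends each delta to the text so far and reports every intermediate state. */
async function renderProgressively(
  deltas: AsyncIterable<string>,
  onUpdate: (partial: string) => void,
): Promise<string> {
  let text = "";
  for await (const delta of deltas) {
    text += delta;
    onUpdate(text); // in the app this would be a reactive re-render
  }
  return text;
}
```

<p>With a blocking call the UI would only ever see the final string; here every intermediate state is delivered as soon as its chunk arrives.</p>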
<h3>⑤ Convex Agent (future)</h3>
<p>Replace the raw action pattern with <code>@convex-dev/agent</code> to get persistent thread history, cross-session memory, and tool calling. Only the <code>AiService</code> implementation changes; the DI boundary keeps everything else untouched.</p>
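<p>The DI boundary argument can be sketched as two implementations behind one interface. The shape of <code>IAiService</code> below is an assumption (the real interface is not shown in this plan), and <code>ThreadedAiService</code> only imitates thread history with an in-memory map; it does not use <code>@convex-dev/agent</code>.</p>

```typescript
// Hypothetical shape of IAiService; the real interface is not shown in this plan.
interface IAiService {
  generate(threadId: string, prompt: string): Promise<string>;
}

/** Stateless implementation: every call sees only the current prompt. */
class StatelessAiService implements IAiService {
  async generate(_threadId: string, prompt: string): Promise<string> {
    return `reply to: ${prompt}`;
  }
}

/** Thread-aware implementation: keeps per-thread history in memory. */
class ThreadedAiService implements IAiService {
  private threads = new Map<string, string[]>();

  async generate(threadId: string, prompt: string): Promise<string> {
    const history = this.threads.get(threadId) ?? [];
    history.push(prompt);
    this.threads.set(threadId, history);
    return `reply to: ${prompt} (turn ${history.length})`;
  }
}

/** Caller code depends only on the interface, so either implementation fits. */
async function askTwice(ai: IAiService, threadId: string): Promise<string> {
  await ai.generate(threadId, "first");
  return ai.generate(threadId, "second");
}
```

<p>Swapping <code>StatelessAiService</code> for <code>ThreadedAiService</code> changes the behaviour of <code>askTwice</code> without any change to its code, which is the property the plan relies on.</p>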
<hr>
<h2>Priority order</h2>

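<table>
<thead>
<tr><th>#</th><th>What</th><th>Effort</th></tr>
</thead>
<tbody>
<tr><td>①</td><td>API keys server-side + auth + rate limiting</td><td>30 min</td></tr>
<tr><td>②</td><td>Langfuse observability</td><td>30 min</td></tr>
<tr><td>③</td><td>Model fallback</td><td>1 hour</td></tr>
<tr><td>④</td><td>Streaming via Convex</td><td>1–2 days</td></tr>
<tr><td>⑤</td><td>Convex Agent (memory, threads, tools)</td><td>3–5 days</td></tr>
</tbody>
</table>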

</body></html>
