Merged
1 change: 0 additions & 1 deletion Procfile

This file was deleted.

50 changes: 29 additions & 21 deletions README.md
@@ -17,6 +17,14 @@ This project analyzes LinkedIn posts to classify them as generating positive or

## 🏗️ Architecture

> **Note:** the sections below describe the original research/training workflow
> in `attempt2.ipynb` (Gemini embeddings + XGBoost). The **deployed runtime
> backend** is now `services/ml_api` — a provider-free TF-IDF recruiting-signal
> engine that needs no AI provider key. See `docs/API_README.md` for its
> endpoints (`/health`, `/analyze`, `/analyze/compare`, `/history`) and
> `RENDER_DEPLOY.md` for deployment. The Next.js app reaches it via the
> same-origin proxy `app/api/analyze` (`ML_API_URL`).

```
LinkedIn Posts → Label Generation → Feature Engineering → Model Training → Prediction
↓ (VADER + Engagement) ↓ ↓
@@ -214,35 +222,39 @@ Supabase schema for logging requests/responses: `docs/supabase.sql` (table `anal

## AI provider abstraction (Vercel AI SDK)

The Next.js AI layer uses the **Vercel AI SDK** (`ai` + `@ai-sdk/google` + `zod`),
so the model provider is swappable. `lib/ai/provider.ts` exposes `getModel()`,
which resolves a model from the `AI_MODEL` env var (default
`google/gemini-2.0-flash`).
The Next.js AI layer uses the **Vercel AI SDK** (`ai` + `@ai-sdk/openai` +
`@ai-sdk/google` + `zod`), so the model provider is swappable. `lib/ai/provider.ts`
exposes `getModel()`, which resolves a model from the `AI_MODEL` env var
(default `openai/gpt-4o-mini`). The persona-critique / variant-eval client at
`lib/google-ai/client.ts` (legacy dir name, now provider-agnostic) uses this
resolver with `generateObject` + zod.

How a model is resolved:
- If `AI_GATEWAY_API_KEY` is set (or, on Vercel, OIDC enables the Gateway), the
`provider/model` string is routed through the **Vercel AI Gateway**, which adds
failover and cost tracking.
- Otherwise it falls back to the `@ai-sdk/google` provider using a direct key.
The key is read from `GOOGLE_GENERATIVE_AI_API_KEY` (preferred), with
`GEMINI_API_KEY` as a fallback.
- Otherwise it falls back to a direct provider key. For the default
`openai/...` model the key is read from `OPENAI_API_KEY` (via `@ai-sdk/openai`).
A `google/...` model still works via `@ai-sdk/google`, reading
`GOOGLE_GENERATIVE_AI_API_KEY` (preferred), with `GEMINI_API_KEY` as a fallback.
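
The resolution order above can be sketched as a small pure function. This is an illustrative outline only, not the contents of `lib/ai/provider.ts`: the name `resolveModelRoute` and the `ModelRoute` shape are hypothetical, the sketch does not call the Vercel AI SDK, and it does not model the Vercel OIDC case. It only shows which route and which key a given environment would select.

```typescript
// Hypothetical sketch: decides where a "provider/model" string would be routed
// and which key would be used. It does NOT create an AI SDK model instance.

type Env = Record<string, string | undefined>;

type ModelRoute =
  | { via: "gateway"; model: string }
  | { via: "direct"; provider: string; modelId: string; apiKey: string };

const DEFAULT_MODEL = "openai/gpt-4o-mini"; // default stated in the README

function resolveModelRoute(env: Env): ModelRoute {
  // Treat an empty AI_MODEL the same as an unset one.
  const model = env.AI_MODEL || DEFAULT_MODEL;

  // 1. The Gateway takes precedence when its key is present.
  if (env.AI_GATEWAY_API_KEY) {
    return { via: "gateway", model };
  }

  // 2. Otherwise split "provider/model" and look up a direct provider key.
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`AI_MODEL must look like "provider/model", got "${model}"`);
  }
  const provider = model.slice(0, slash);
  const modelId = model.slice(slash + 1);

  let apiKey: string | undefined;
  if (provider === "openai") {
    apiKey = env.OPENAI_API_KEY;
  } else if (provider === "google") {
    // Preferred key first, then the legacy fallback.
    apiKey = env.GOOGLE_GENERATIVE_AI_API_KEY || env.GEMINI_API_KEY;
  } else {
    throw new Error(`No direct provider configured for "${provider}"`);
  }

  if (!apiKey) {
    throw new Error(`Missing API key for provider "${provider}"`);
  }
  return { via: "direct", provider, modelId, apiKey };
}
```

Keeping the decision separate from SDK calls makes the precedence (Gateway first, then the provider-specific key) easy to unit-test with plain objects instead of real environment variables.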

**Switching providers** is a one-line change: set `AI_MODEL` (e.g.
`openai/gpt-4o-mini`) and supply the relevant provider key or use the Gateway.
**Switching providers/models** is a one-line change: set `AI_MODEL` and supply the
matching provider key (or use the Gateway).

```
AI_MODEL=google/gemini-2.0-flash # default
AI_GATEWAY_API_KEY=... # optional: route via Vercel AI Gateway
GOOGLE_GENERATIVE_AI_API_KEY=... # direct Google key (GEMINI_API_KEY is a fallback)
AI_MODEL=openai/gpt-4o-mini # default
OPENAI_API_KEY=... # direct OpenAI key (used by the default model)
AI_GATEWAY_API_KEY=... # optional: route any provider/model via Vercel AI Gateway
GOOGLE_GENERATIVE_AI_API_KEY=... # only if you switch AI_MODEL to google/... (GEMINI_API_KEY is a fallback)
```

## Rate limiting

Two independent limiters protect the public surface:

**Next.js inbound limiter** (`lib/ratelimit.ts`) — an in-memory, per-client-IP
limiter applied to all public POST routes: `/api/gemini`, `/api/predict`,
`/api/analyze`, `/api/analyze-with-images`, `/api/ab-tests`, `/api/personas`,
limiter applied to the public POST routes, including `/api/analyze`,
`/api/gemini`, `/api/analyze-with-images`, `/api/ab-tests`, `/api/personas`, and
`/api/drafts`. Over-limit requests get a `429` with a `Retry-After` header.
Configure with:
```
@@ -256,10 +268,6 @@ There is also a separate **outbound** throttle on calls to the AI provider,
configured with `GEMINI_RATE_LIMIT_MAX_REQUESTS` (default `15`) and
`GEMINI_RATE_LIMIT_WINDOW_MS` (default `60000`).

**FastAPI `/predict` limiter** (`api.py`) — a `slowapi` per-IP limit on
`POST /predict`, returning `429` when exceeded. Configure with
`RATE_LIMIT_PREDICT` (default `30/minute`).
> Caveat: the default store is in-memory **per gunicorn worker**, so with N
> workers the effective global limit is ~N× the configured value. Set
> `RATELIMIT_STORAGE_URI` (e.g. `redis://host:6379/0`) for a consistent global
> limit. See `RENDER_DEPLOY.md` for deployment details.
The `services/ml_api` backend has no built-in limiter of its own; it is reached
only through the Next.js proxy (`app/api/analyze`), so the inbound limiter above
covers it. See `RENDER_DEPLOY.md` for deployment details.
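
For orientation, an in-memory, per-client-IP limiter that yields a `Retry-After` value could look roughly like the sketch below. The actual algorithm in `lib/ratelimit.ts` is not shown in this diff, so the fixed-window strategy, the class name `FixedWindowLimiter`, and the `LimitResult` shape are all assumptions made for illustration.

```typescript
// Hypothetical fixed-window limiter keyed by client IP.
// Assumption: lib/ratelimit.ts may use a different algorithm and different names.

interface LimitResult {
  allowed: boolean;
  retryAfterSeconds: number; // 0 when the request is allowed
}

class FixedWindowLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // `now` is injectable so the behaviour can be tested without real clocks.
  check(clientIp: string, now: number = Date.now()): LimitResult {
    const entry = this.hits.get(clientIp);

    // First request from this IP, or the previous window has expired.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(clientIp, { count: 1, windowStart: now });
      return { allowed: true, retryAfterSeconds: 0 };
    }

    // Still inside the window and under the cap.
    if (entry.count < this.maxRequests) {
      entry.count += 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }

    // Over the cap: report how long until the window resets.
    const msLeft = entry.windowStart + this.windowMs - now;
    return { allowed: false, retryAfterSeconds: Math.ceil(msLeft / 1000) };
  }
}
```

A route handler would answer a rejected check with status `429` and set `Retry-After` to `retryAfterSeconds`. Because the counters live in process memory, each server instance keeps its own counts, and this sketch never evicts old entries.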
88 changes: 40 additions & 48 deletions RENDER_DEPLOY.md
@@ -1,55 +1,58 @@
# Deploying the PR Sentiment API to Render
# Deploying the Lyra ML API to Render

This is the FastAPI backend (`api.py` + `prediction_service.py`) that serves the
trained LinkedIn PR sentiment classifier. The trained artifacts live in
`output/` and are committed to the repo, so no external storage is needed.
This is the FastAPI backend (`services/ml_api`) — a **provider-free TF-IDF
recruiting-signal engine**. It needs no AI provider key and no external database:
the trained model artifacts live in `output/models/` (committed to the repo) and
request/response logging uses local sqlite.

## What ships

| File | Purpose |
|------|---------|
| `render.yaml` | Render Blueprint — defines the web service, build/start commands, health check, env vars. |
| `Procfile` | Same start command, for non-Blueprint / generic buildpack deploys. |
| `runtime.txt` | Pins Python 3.12.3. |
| `requirements_api.txt` | Python deps (scikit-learn pinned to **1.6.1** to match the pickled model). |
| `output/*.pkl`, `output/*.npy` | Trained model + scaler + encoders. |
| `services/ml_api/requirements.txt` | Python deps for the API. |
| `output/models/*` | Trained TF-IDF models + metadata (`metadata.json`, `*.joblib`, `train_tfidf_matrix.npz`, …). |

The Blueprint provisions a single web service named **`lyra-ml-api`** on the
Render **free** plan, running:

```
gunicorn services.ml_api.main:app -k uvicorn.workers.UvicornWorker
```

(bound to `$PORT`, `${WEB_CONCURRENCY:-1}` workers, `/health` health check).

## One-time setup

1. Push this branch to GitHub.
1. Push this repo to GitHub.
2. In Render: **New +** → **Blueprint** → select the repo. Render reads `render.yaml`.
3. Set the **`GEMINI_API_KEY`** secret in the dashboard (it's `sync: false`, so it is
never stored in the repo). Get a key at https://aistudio.google.com/app/apikey.
4. (Recommended) Set **`ALLOWED_ORIGINS`** to your frontend origin(s),
comma-separated, instead of `*`.
5. Deploy. Render runs the health check against `/health`; the service only
reports healthy once the model has loaded.
3. (Recommended) Set **`FRONTEND_ORIGIN`** to your deployed frontend origin so the
browser can reach the API directly if needed (the app normally calls it
server-side via `app/api/analyze`, so CORS rarely matters in production).
4. Deploy. Render runs the health check against `/health`; the service only
reports healthy once the models have loaded.

> No secrets are required — there is no AI provider key for this service.

## Environment variables

| Var | Required | Default | Notes |
|-----|----------|---------|-------|
| `GEMINI_API_KEY` | ✅ | — | App refuses to start without it (fail-fast). |
| `MODEL_DIR` | | `output` | Directory holding the `.pkl`/`.npy` artifacts. |
| `ALLOWED_ORIGINS` | | `*` | Comma-separated origins. With `*`, credentials are disabled (CORS spec). |
| `WEB_CONCURRENCY` | | `1` | gunicorn workers. Each worker loads the model — raise only after checking memory. |
| `LOG_LEVEL` | | `INFO` | |
| `RATE_LIMIT_PREDICT` | | `30/minute` | Per-IP limit on `POST /predict` (slowapi/limits syntax, e.g. `100/hour`, `5/second`). Over-limit requests get a 429. |
| `RATELIMIT_STORAGE_URI` | | (in-memory) | Shared rate-limit store, e.g. `redis://host:6379/0`. Without it the store is per-process — see caveat below. |
| `MODEL_DIR` | | `output/models` | Directory holding the trained TF-IDF artifacts. |
| `FRONTEND_ORIGIN` | | `http://localhost:3000` | Frontend origin allowed by CORS. Mainly matters if the browser hits the API directly. |
| `WEB_CONCURRENCY` | | `1` | gunicorn workers. Each worker loads the models — raise only after checking memory. |
| `PORT` | (Render-injected) | `8000` | Bound automatically by the start command. |

> **Rate-limit caveat:** the default store is in-memory **per gunicorn worker**, so
> with `WEB_CONCURRENCY` = N the effective global limit is roughly N× the
> configured `RATE_LIMIT_PREDICT`. Set `RATELIMIT_STORAGE_URI` to a shared Redis
> instance for a single, consistent global limit across all workers and instances.
| `PYTHON_VERSION` | | `3.12.3` | Pins the Python runtime for the build. |

## Verify after deploy

```bash
curl https://<your-service>.onrender.com/health
curl -X POST https://<your-service>.onrender.com/predict \
# -> {"status":"ok","models_loaded":true}

curl -X POST https://<your-service>.onrender.com/analyze \
-H 'Content-Type: application/json' \
-d '{"text":"Excited to announce our new platform! #AI","has_media":1,"media_count":1}'
-d '{"post_text":"We are scaling our AI team fast. Expect late nights but huge impact."}'
```

Interactive docs: `https://<your-service>.onrender.com/docs`
@@ -58,19 +61,19 @@ Interactive docs: `https://<your-service>.onrender.com/docs`

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements_api.txt
export GEMINI_API_KEY=your-key
python api.py # dev server on :8000 (set PORT/RELOAD to override)
pip install -r services/ml_api/requirements.txt
# dev server on :8000 (reads PORT/MODEL_DIR from env)
python -m services.ml_api.main
# or, mirror production:
gunicorn api:app -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
gunicorn services.ml_api.main:app -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
```

## Wiring the Next.js frontend

The frontend never calls the FastAPI service directly. Instead:

- `app/sentiment-analyzer/page.tsx` POSTs to the same-origin route `/api/predict`.
- `app/api/predict/route.ts` forwards the request to `${ML_API_URL}/predict`.
- The app POSTs to the same-origin route `/api/analyze`.
- `app/api/analyze` forwards the request to `${ML_API_URL}/analyze`.

So you only set **one** env var on the Next.js host (e.g. Vercel):

@@ -79,16 +82,5 @@ ML_API_URL=https://<your-service>.onrender.com
```

Locally, `ML_API_URL` defaults to `http://localhost:8000`. Run both together
with `npm run dev` (starts `next dev` + `uvicorn api:app` via `concurrently`),
which also needs `GEMINI_API_KEY` exported for the Python side.

## Model caveat (read before demoing)

The model currently in `output/` is the **full-embedding (768-dim) classifier**
— the PCA/regularization fixes described in `FIXES_APPLIED.md` were *documented
but never saved* (`pca_reducer.pkl` is absent, and the saved model reports 784
input features = 768 embeddings + 16 metadata). It therefore still carries the
documented overfitting (~84% train / ~45% test). The serving pipeline is
correct and dimensionally consistent; if you re-run the notebook to actually
apply PCA, save `pca_reducer.pkl` into `output/` and the service will pick it up
automatically (it already branches on the file's presence).
with `npm run dev` (starts `next dev` + `uvicorn services.ml_api.main:app` via
`concurrently`).
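
The forwarding step described in this section can be sketched as a helper that builds the upstream request from `ML_API_URL`. This is a minimal illustration under stated assumptions, not the code in `app/api/analyze`: the name `buildAnalyzeRequest` and the `ForwardedRequest` shape are hypothetical, and the body uses only the `post_text` field shown in the curl example.

```typescript
// Hypothetical helper: builds the request the proxy would send to the ML API.
// Assumption: the real route may forward extra fields or headers.

interface ForwardedRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

function buildAnalyzeRequest(
  env: Record<string, string | undefined>,
  postText: string,
): ForwardedRequest {
  // Fall back to the local default, and drop trailing slashes so the
  // joined URL never contains "//analyze".
  const base = (env.ML_API_URL || "http://localhost:8000").replace(/\/+$/, "");

  return {
    url: `${base}/analyze`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ post_text: postText }),
    },
  };
}

// Possible use inside a server-side handler:
//   const { url, init } = buildAnalyzeRequest(process.env, text);
//   const upstream = await fetch(url, init);
```

Building the request in one place keeps the `ML_API_URL` default and the URL joining out of the handler body, so the handler only has to call `fetch` and relay the response.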