f1shyfang · May 30, 2026
diff --git a/Procfile b/Procfile
diff --git a/README.md b/README.md
@@ -17,6 +17,14 @@ This project analyzes LinkedIn posts to classify them as generating positive or
 
 ## 🏗️ Architecture
 
+> **Note:** the sections below describe the original research/training workflow
+> in `attempt2.ipynb` (Gemini embeddings + XGBoost). The **deployed runtime
+> backend** is now `services/ml_api` — a provider-free TF-IDF recruiting-signal
+> engine that needs no AI provider key. See `docs/API_README.md` for its
+> endpoints (`/health`, `/analyze`, `/analyze/compare`, `/history`) and
+> `RENDER_DEPLOY.md` for deployment. The Next.js app reaches it via the
+> same-origin proxy `app/api/analyze`, which forwards to `ML_API_URL`.
+
 ```
 LinkedIn Posts → Label Generation → Feature Engineering → Model Training → Prediction
      ↓              (VADER + Engagement)     ↓                    ↓
@@ -214,35 +222,39 @@ Supabase schema for logging requests/responses: `docs/supabase.sql` (table `anal
 
 ## AI provider abstraction (Vercel AI SDK)
 
-The Next.js AI layer uses the **Vercel AI SDK** (`ai` + `@ai-sdk/google` + `zod`),
-so the model provider is swappable. `lib/ai/provider.ts` exposes `getModel()`,
-which resolves a model from the `AI_MODEL` env var (default
-`google/gemini-2.0-flash`).
+The Next.js AI layer uses the **Vercel AI SDK** (`ai` + `@ai-sdk/openai` +
+`@ai-sdk/google` + `zod`), so the model provider is swappable. `lib/ai/provider.ts`
+exposes `getModel()`, which resolves a model from the `AI_MODEL` env var
+(default `openai/gpt-4o-mini`). The persona-critique / variant-eval client at
+`lib/google-ai/client.ts` (legacy dir name, now provider-agnostic) uses this
+resolver with `generateObject` + zod.
 
 How a model is resolved:
 - If `AI_GATEWAY_API_KEY` is set (or, on Vercel, OIDC enables the Gateway), the
   `provider/model` string is routed through the **Vercel AI Gateway**, which adds
   failover and cost tracking.
-- Otherwise it falls back to the `@ai-sdk/google` provider using a direct key.
-  The key is read from `GOOGLE_GENERATIVE_AI_API_KEY` (preferred), with
-  `GEMINI_API_KEY` as a fallback.
+- Otherwise it falls back to a direct provider key. For the default
+  `openai/...` model the key is read from `OPENAI_API_KEY` (via `@ai-sdk/openai`).
+  A `google/...` model still works via `@ai-sdk/google`, reading
+  `GOOGLE_GENERATIVE_AI_API_KEY` (preferred), with `GEMINI_API_KEY` as a fallback.
 
-**Switching providers** is a one-line change: set `AI_MODEL` (e.g.
-`openai/gpt-4o-mini`) and supply the relevant provider key or use the Gateway.
+**Switching providers/models** is a one-line change: set `AI_MODEL` and supply the
+matching provider key (or use the Gateway).
 
 ```
-AI_MODEL=google/gemini-2.0-flash      # default
-AI_GATEWAY_API_KEY=...                # optional: route via Vercel AI Gateway
-GOOGLE_GENERATIVE_AI_API_KEY=...       # direct Google key (GEMINI_API_KEY is a fallback)
+AI_MODEL=openai/gpt-4o-mini           # default
+OPENAI_API_KEY=...                    # direct OpenAI key (used by the default model)
+AI_GATEWAY_API_KEY=...                # optional: route any provider/model via Vercel AI Gateway
+GOOGLE_GENERATIVE_AI_API_KEY=...      # only if you switch AI_MODEL to google/... (GEMINI_API_KEY is a fallback)
 ```
 
 ## Rate limiting
 
 Two independent limiters protect the public surface:
 
 **Next.js inbound limiter** (`lib/ratelimit.ts`) — an in-memory, per-client-IP
-limiter applied to all public POST routes: `/api/gemini`, `/api/predict`,
-`/api/analyze`, `/api/analyze-with-images`, `/api/ab-tests`, `/api/personas`,
+limiter applied to the public POST routes, including `/api/analyze`,
+`/api/gemini`, `/api/analyze-with-images`, `/api/ab-tests`, `/api/personas`, and
 `/api/drafts`. Over-limit requests get a `429` with a `Retry-After` header.
 Configure with:
 ```
@@ -256,10 +268,6 @@ There is also a separate **outbound** throttle on calls to the AI provider,
 configured with `GEMINI_RATE_LIMIT_MAX_REQUESTS` (default `15`) and
 `GEMINI_RATE_LIMIT_WINDOW_MS` (default `60000`).
 
-**FastAPI `/predict` limiter** (`api.py`) — a `slowapi` per-IP limit on
-`POST /predict`, returning `429` when exceeded. Configure with
-`RATE_LIMIT_PREDICT` (default `30/minute`).
-> Caveat: the default store is in-memory **per gunicorn worker**, so with N
-> workers the effective global limit is ~N× the configured value. Set
-> `RATELIMIT_STORAGE_URI` (e.g. `redis://host:6379/0`) for a consistent global
-> limit. See `RENDER_DEPLOY.md` for deployment details.
+The `services/ml_api` backend has no built-in limiter of its own. The app reaches
+it only through the Next.js proxy (`app/api/analyze`), so the inbound limiter above
+covers app traffic, not direct calls to the service. See `RENDER_DEPLOY.md` for deployment details.
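
Review note on the rate-limiting hunk above: the behaviour described for the inbound limiter (in-memory, keyed by client IP, `429` plus `Retry-After` when over the limit) can be illustrated with a small sketch. This is not the code in `lib/ratelimit.ts`, which is TypeScript and is not shown in this diff. The class name, the parameter names, and the fixed-window strategy are all assumptions made for illustration.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class FixedWindowLimiter:
    """In-memory, per-client-IP limiter (hypothetical sketch, not lib/ratelimit.ts)."""

    max_requests: int       # assumed knob: requests allowed per window
    window_seconds: float   # assumed knob: window length in seconds
    clock: Callable[[], float] = time.monotonic
    # client IP -> (window start time, requests counted in that window)
    _windows: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def check(self, client_ip: str) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, extra headers) for one request from client_ip."""
        now = self.clock()
        start, count = self._windows.get(client_ip, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.max_requests:
            # Over the limit: report how long until the window resets.
            retry_after = max(1, math.ceil(self.window_seconds - (now - start)))
            self._windows[client_ip] = (start, count)
            return 429, {"Retry-After": str(retry_after)}
        self._windows[client_ip] = (start, count + 1)
        return 200, {}
```

Because the counters live in process memory, a limiter of this shape only sees the traffic handled by its own process. That is consistent with the note above that the limiter sits in the Next.js layer, in front of the proxy.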
diff --git a/RENDER_DEPLOY.md b/RENDER_DEPLOY.md
@@ -1,55 +1,58 @@
-# Deploying the PR Sentiment API to Render
+# Deploying the Lyra ML API to Render
 
-This is the FastAPI backend (`api.py` + `prediction_service.py`) that serves the
-trained LinkedIn PR sentiment classifier. The trained artifacts live in
-`output/` and are committed to the repo, so no external storage is needed.
+This is the FastAPI backend (`services/ml_api`) — a **provider-free TF-IDF
+recruiting-signal engine**. It needs no AI provider key and no external database:
+the trained model artifacts live in `output/models/` (committed to the repo) and
+request/response logging uses local sqlite.
 
 ## What ships
 
 | File | Purpose |
 |------|---------|
 | `render.yaml` | Render Blueprint — defines the web service, build/start commands, health check, env vars. |
-| `Procfile` | Same start command, for non-Blueprint / generic buildpack deploys. |
-| `runtime.txt` | Pins Python 3.12.3. |
-| `requirements_api.txt` | Python deps (scikit-learn pinned to **1.6.1** to match the pickled model). |
-| `output/*.pkl`, `output/*.npy` | Trained model + scaler + encoders. |
+| `services/ml_api/requirements.txt` | Python deps for the API. |
+| `output/models/*` | Trained TF-IDF models + metadata (`metadata.json`, `*.joblib`, `train_tfidf_matrix.npz`, …). |
+
+The Blueprint provisions a single web service named **`lyra-ml-api`** on the
+Render **free** plan, running:
+
+```
+gunicorn services.ml_api.main:app -k uvicorn.workers.UvicornWorker
+```
+
+(bound to `$PORT`, `${WEB_CONCURRENCY:-1}` workers, `/health` health check).
 
 ## One-time setup
 
-1. Push this branch to GitHub.
+1. Push this repo to GitHub.
 2. In Render: **New +** → **Blueprint** → select the repo. Render reads `render.yaml`.
-3. Set the **`GEMINI_API_KEY`** secret in the dashboard (it's `sync: false`, so it is
-   never stored in the repo). Get a key at https://aistudio.google.com/app/apikey.
-4. (Recommended) Set **`ALLOWED_ORIGINS`** to your frontend origin(s),
-   comma-separated, instead of `*`.
-5. Deploy. Render runs the health check against `/health`; the service only
-   reports healthy once the model has loaded.
+3. (Recommended) Set **`FRONTEND_ORIGIN`** to your deployed frontend origin so the
+   browser can reach the API directly if needed (the app normally calls it
+   server-side via `app/api/analyze`, so CORS rarely matters in production).
+4. Deploy. Render runs the health check against `/health`; the service only
+   reports healthy once the models have loaded.
+
+> No secrets are required — there is no AI provider key for this service.
 
 ## Environment variables
 
 | Var | Required | Default | Notes |
 |-----|----------|---------|-------|
-| `GEMINI_API_KEY` | ✅ | — | App refuses to start without it (fail-fast). |
-| `MODEL_DIR` | | `output` | Directory holding the `.pkl`/`.npy` artifacts. |
-| `ALLOWED_ORIGINS` | | `*` | Comma-separated origins. With `*`, credentials are disabled (CORS spec). |
-| `WEB_CONCURRENCY` | | `1` | gunicorn workers. Each worker loads the model — raise only after checking memory. |
-| `LOG_LEVEL` | | `INFO` | |
-| `RATE_LIMIT_PREDICT` | | `30/minute` | Per-IP limit on `POST /predict` (slowapi/limits syntax, e.g. `100/hour`, `5/second`). Over-limit requests get a 429. |
-| `RATELIMIT_STORAGE_URI` | | (in-memory) | Shared rate-limit store, e.g. `redis://host:6379/0`. Without it the store is per-process — see caveat below. |
+| `MODEL_DIR` | | `output/models` | Directory holding the trained TF-IDF artifacts. |
+| `FRONTEND_ORIGIN` | | `http://localhost:3000` | Frontend origin allowed by CORS. Mainly matters if the browser hits the API directly. |
+| `WEB_CONCURRENCY` | | `1` | gunicorn workers. Each worker loads the models — raise only after checking memory. |
 | `PORT` | (Render-injected) | `8000` | Bound automatically by the start command. |
-
-> **Rate-limit caveat:** the default store is in-memory **per gunicorn worker**, so
-> with `WEB_CONCURRENCY` = N the effective global limit is roughly N× the
-> configured `RATE_LIMIT_PREDICT`. Set `RATELIMIT_STORAGE_URI` to a shared Redis
-> instance for a single, consistent global limit across all workers and instances.
+| `PYTHON_VERSION` | | `3.12.3` | Pins the Python runtime for the build. |
 
 ## Verify after deploy
 
 ```bash
 curl https://<your-service>.onrender.com/health
-curl -X POST https://<your-service>.onrender.com/predict \
+# -> {"status":"ok","models_loaded":true}
+
+curl -X POST https://<your-service>.onrender.com/analyze \
   -H 'Content-Type: application/json' \
-  -d '{"text":"Excited to announce our new platform! #AI","has_media":1,"media_count":1}'
+  -d '{"post_text":"We are scaling our AI team fast. Expect late nights but huge impact."}'
 ```
 
 Interactive docs: `https://<your-service>.onrender.com/docs`
@@ -58,19 +61,19 @@ Interactive docs: `https://<your-service>.onrender.com/docs`
 
 ```bash
 python -m venv venv && source venv/bin/activate
-pip install -r requirements_api.txt
-export GEMINI_API_KEY=your-key
-python api.py            # dev server on :8000 (set PORT/RELOAD to override)
+pip install -r services/ml_api/requirements.txt
+# dev server on :8000 (reads PORT/MODEL_DIR from env)
+python -m services.ml_api.main
 # or, mirror production:
-gunicorn api:app -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
+gunicorn services.ml_api.main:app -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
 ```
 
 ## Wiring the Next.js frontend
 
 The frontend never calls the FastAPI service directly. Instead:
 
-- `app/sentiment-analyzer/page.tsx` POSTs to the same-origin route `/api/predict`.
-- `app/api/predict/route.ts` forwards the request to `${ML_API_URL}/predict`.
+- The app POSTs to the same-origin route `/api/analyze`.
+- `app/api/analyze` forwards the request to `${ML_API_URL}/analyze`.
 
 So you only set **one** env var on the Next.js host (e.g. Vercel):
 
@@ -79,16 +82,5 @@ ML_API_URL=https://<your-service>.onrender.com
 ```
 
 Locally, `ML_API_URL` defaults to `http://localhost:8000`. Run both together
-with `npm run dev` (starts `next dev` + `uvicorn api:app` via `concurrently`),
-which also needs `GEMINI_API_KEY` exported for the Python side.
-
-## Model caveat (read before demoing)
-
-The model currently in `output/` is the **full-embedding (768-dim) classifier**
-— the PCA/regularization fixes described in `FIXES_APPLIED.md` were *documented
-but never saved* (`pca_reducer.pkl` is absent, and the saved model reports 784
-input features = 768 embeddings + 16 metadata). It therefore still carries the
-documented overfitting (~84% train / ~45% test). The serving pipeline is
-correct and dimensionally consistent; if you re-run the notebook to actually
-apply PCA, save `pca_reducer.pkl` into `output/` and the service will pick it up
-automatically (it already branches on the file's presence).
+with `npm run dev` (starts `next dev` + `uvicorn services.ml_api.main:app` via
+`concurrently`).
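
Review note on the "Verify after deploy" hunk above: the two `curl` checks can also be scripted. The sketch below uses only the Python standard library. The function names are hypothetical, and the only payload shapes it relies on are the ones shown in the diff: the `{"status":"ok","models_loaded":true}` health response and the `post_text` request field. The `/analyze` response is returned as parsed JSON without assuming any of its fields.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, post_text: str) -> urllib.request.Request:
    """Build the POST /analyze request that mirrors the curl example."""
    body = json.dumps({"post_text": post_text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_healthy(health_payload: dict) -> bool:
    """True when /health reports status "ok" and models_loaded true."""
    return (
        health_payload.get("status") == "ok"
        and health_payload.get("models_loaded") is True
    )


def smoke_test(base_url: str, timeout: float = 30.0) -> dict:
    """Call /health, then /analyze, and return the parsed /analyze response."""
    health_url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(health_url, timeout=timeout) as resp:
        health = json.load(resp)
    if not is_healthy(health):
        raise RuntimeError(f"service not ready: {health}")
    request = build_analyze_request(base_url, "We are scaling our AI team fast.")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `smoke_test("https://<your-service>.onrender.com")` with the real service URL substituted runs both checks in order and raises if the models have not loaded.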