```shell
# 1) Clone the repository
git clone https://github.com/ICAST-Research-Project/backend-rag-api.git

# 2) Enter the project folder
cd backend-rag-api

# 3) Create a virtual environment (local, isolated Python env)
python -m venv .venv

# 4) Activate the virtual environment
source .venv/bin/activate

# 5) Install all Python dependencies listed in requirements.txt
python -m pip install -r requirements.txt

# 6) Run the FastAPI service
uvicorn app.main:app --reload
```

Tech stack

- FastAPI + Uvicorn: Python web framework and ASGI server
- PostgreSQL (Neon) + pgvector: relational data + vector search
- OpenAI + (CLIP/OpenCLIP): chat + image/text embeddings
- ElevenLabs — TTS / voice replies
- Clerk — JWT auth for API protection
- AWS S3 — object storage + presigned uploads
- Pydantic — config & schema validation
- SQLAlchemy — ORM & sessions
Endpoint: POST /api/chat
Auth: Authorization: Bearer <Clerk JWT> (required)
Does: Runs the RAG pipeline and returns an answer plus supporting sources. Persists the turn as channel="text".
Base prefix: /api/chat
Auth: Authorization: Bearer <Clerk JWT> (required)
Returns a paginated list of the user’s scan chats with:
- `scan_id`, `scan_title`
- `artwork_id`, `artwork_title`
- `artwork_image_url` (auto-resolved to a usable URL)
- `created_at`, `last_message`, `last_message_at`
Endpoint: POST /api/embed-image
Does: Fetches an image from a URL, computes its embedding, and returns the vector + dimension.
Implementation Notes
- Loads the image via `app.helpers.common.load_image_from_url(...)`.
- Embeds with `app.embeddings.embed_pil_image(...)`.
- Embedding model/dimension come from your embeddings module (see `IMAGE_EMBEDDING_MODEL` in `app/config.py`).
Endpoint: POST /api/search-image
Auth: Authorization: Bearer <Clerk JWT> (required)
Uploads: multipart/form-data with an image file field named file
Does: Embeds the uploaded image, searches nearest artworks (pgvector), applies confidence/margin logic, persists a scan, and returns top-K neighbors + a scan_id.
Query Params
- `top_k`: number of results to return (default from `TOPK_DEFAULT`, 1–100)
- `metric`: similarity metric: `cosine` | `l2` | `ip` (default `METRIC_DEFAULT`)
- `sim_threshold`: min similarity for a match (default `IMAGE_MATCH_SIM_THRESHOLD`)
- `margin_threshold`: min absolute margin `(top1 - top2)` when `require_margin=true`
- `require_margin`: enforce margin checks (`true`/`false`, default `IMAGE_MATCH_REQUIRE_MARGIN`)
- `solo_threshold`: min similarity to accept when only one candidate is present
- `high_conf_threshold`: auto-accept if `top1 >= high_conf_threshold`
- `margin_ratio_threshold`: min ratio `(top1 / top2)` when margin is enforced
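The confidence/margin decision the parameters above describe can be sketched as a single predicate. This is an illustrative reconstruction, not the project's actual code; the function name and the exact ordering of checks are assumptions, and defaults mirror the documented config values.

```python
# Hypothetical sketch of the image-match acceptance logic; the real
# implementation lives in the search-image route. `sims` holds the top-K
# neighbor similarities in descending order.
def accept_match(
    sims: list[float],
    sim_threshold: float = 0.70,
    margin_threshold: float = 0.10,
    require_margin: bool = True,
    solo_threshold: float = 0.80,
    high_conf_threshold: float = 0.90,
    margin_ratio_threshold: float = 1.05,
) -> bool:
    if not sims:
        return False
    top1 = sims[0]
    if top1 >= high_conf_threshold:          # very confident: auto-accept
        return True
    if len(sims) == 1:                       # single candidate: stricter solo bar
        return top1 >= solo_threshold
    if top1 < sim_threshold:                 # below the basic similarity floor
        return False
    if require_margin:
        top2 = sims[1]
        if (top1 - top2) < margin_threshold:                      # absolute margin
            return False
        if top2 > 0 and (top1 / top2) < margin_ratio_threshold:   # ratio margin
            return False
    return True
```

The two margin checks together reject ambiguous results where the best and second-best neighbors score nearly the same.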
Endpoint: POST /api/voice/chat
Auth: Authorization: Bearer <Clerk JWT> (required)
Does: Converts speech to text (ASR), runs RAG chat, then returns TTS audio (base64) + transcript + text answer. Persists the turn as channel="voice".
- `artwork_id` (string, required): target artwork context
- `artist_id` (string, optional)
- `scan_id` (string, optional): tie to an existing scan/session
- `prompt` (string, optional): text prompt when no audio is sent
- `audio_file` (file, optional): if audio is present, ASR produces the transcript
- `voice_id` (string, optional): TTS voice (provider-specific)
- `metric` (string, optional): `cosine` | `l2` | `ip` (default: `METRIC_DEFAULT`)
- `sim_threshold` (float, optional): `[0, 1]` (default: `TEXT_MATCH_SIM_THRESHOLD`)
- `top_k` (int, optional): retrieval candidates (default: `6`)
- `language_code` (string, optional): ASR language hint (e.g., `en`, `en-US`)
What it does:
Validates Clerk-issued JWTs on incoming requests and returns the current user ({"user_id": sub, "claims": ...}) for protected routes.
Headers supported
- `Authorization: Bearer <JWT>`
- `X-Client-Auth: Bearer <JWT>` (fallback for mobile/web clients)
Env vars
`CLERK_ISSUER` (required): `https://art-connect.org.clerk.accounts.dev`
How it works
- Caches Clerk JWKS for 5 minutes to verify signatures.
- Verifies `iss` (issuer) and, if provided, `aud` (audience).
- Extracts `sub`, returned as `user_id` for downstream usage.
- Raises `401` on a missing/bad bearer or an invalid token.
Central place for env-driven settings (loaded via python-dotenv).
These control retrieval metrics, table/column names, thresholds, and model IDs.
- Reads `.env` and exposes constants used across routers/helpers.
- Sets defaults so the API can run locally without a huge `.env`.
- Groups knobs for image search, text RAG, and DB schema.
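The pattern is plain "env var with a fallback default". A minimal sketch (the real module loads `.env` via python-dotenv first; here the process environment is read directly, and only a few of the documented knobs are shown):

```python
import os

# Defaults match the values documented below; override via environment/.env.
METRIC_DEFAULT = os.getenv("METRIC_DEFAULT", "cosine")
TOPK_DEFAULT = int(os.getenv("TOPK_DEFAULT", "5"))
IMAGE_MATCH_SIM_THRESHOLD = float(os.getenv("IMAGE_MATCH_SIM_THRESHOLD", "0.70"))
```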
Models
- `MODEL_ID` — image model ID for local usage (default: `openai/clip-vit-base-patch32`)
- `TEXT_EMBED_MODEL` — text embedding model (default: `text-embedding-3-small`)
- `GEN_MODEL` — chat/generation model (default: `gpt-4o-mini`)
- `OPENAI_API_KEY` — required for OpenAI models
Database / Vectors
- `DATABASE_URL` — Postgres connection string
- `PGVECTOR_PROBES` — index probe count for ANN searches (default: `10`)
Image Embeddings (table/cols)
- `IMAGE_EMBED_TABLE` (default: `artwork_embeddings_image`)
- `IMAGE_EMBED_VECTOR_COL` (default: `embedding`)
- `IMAGE_EMBED_ARTWORK_ID_COL` (default: `artwork_id`)
Retrieval Controls (shared)
- `METRIC_DEFAULT` — `cosine` | `l2` | `ip` (default: `cosine`)
- `TOPK_DEFAULT` — default top-K (default: `5`)
Image Match Thresholds
- `IMAGE_MATCH_SIM_THRESHOLD` (default: `0.70`)
- `IMAGE_MATCH_MARGIN_THRESHOLD` (default: `0.10`)
- `IMAGE_MATCH_REQUIRE_MARGIN` (`true`|`false`, default: `true`)
- `IMAGE_MATCH_SOLO_THRESHOLD` (default: `0.80`)
- `IMAGE_MATCH_HIGH_CONF_THRESHOLD` (default: `0.90`)
- `IMAGE_MATCH_MARGIN_RATIO_THRESHOLD` (default: `1.05`)
Domain Tables (read-joins)
- `ARTWORKS_TABLE` (default: `Artwork`)
- `ARTWORKS_ID_COL` (default: `id`)
- `ARTWORKS_ARTIST_ID_COL` (default: `artistId`)
- `ARTISTS_TABLE` (default: `Artist`)
- `ARTISTS_ID_COL` (default: `id`)
- `ARTISTS_NAME_COL` (default: `name`)
- `ARTWORKS_TITLE_COL` (default: `title`)
- `ARTWORKS_DESC_COL` (default: `description`)
- `ARTISTS_BIO_COL` (default: `bio`)
Text Embeddings (RAG)
- `TEXT_EMBED_TABLE_ARTWORK` (default: `artwork_embeddings_text`)
- `TEXT_EMBED_TABLE_ARTIST` (default: `artist_embeddings_text`)
- `TEXT_EMBED_VECTOR_COL` (default: `embedding`)
- `TEXT_EMBED_TEXT_COL` (default: `content`)
- `TEXT_MATCH_SIM_THRESHOLD` (default: `0.60`)
Operator mapping (pgvector):
`cosine` → `<=>`, `l2` → `<->`, `ip` → `<#>`; the code converts distances to similarities before applying thresholds.
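The mapping can be sketched as a lookup table plus a conversion function. The cosine and inner-product conversions follow pgvector's documented operator semantics (`<=>` returns cosine distance, `<#>` returns the negated inner product); the `l2` mapping shown is one common monotone convention and an assumption, not necessarily what the project uses.

```python
# pgvector distance operators per metric, as documented above.
PGVECTOR_OPS = {"cosine": "<=>", "l2": "<->", "ip": "<#>"}

def to_similarity(metric: str, distance: float) -> float:
    """Convert a pgvector distance into a similarity for threshold checks."""
    if metric == "cosine":
        return 1.0 - distance          # <=> yields cosine distance
    if metric == "ip":
        return -distance               # <#> yields the negated inner product
    if metric == "l2":
        return 1.0 / (1.0 + distance)  # assumed monotone mapping (illustrative)
    raise ValueError(f"unknown metric: {metric}")
```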
What it does:
Provides a psycopg connection pool and a get_chat_cursor() context manager for chat-related queries (scans/messages).
Env var
`CHAT_DATABASE_URL` (required): Postgres connection string for the chat database.
What it does:
Creates a psycopg connection pool for the primary app database and exposes:
- `get_cursor()`: pooled cursor with `pgvector` registered and `ivfflat.probes` set
- `get_conn()`: direct connection context (bypasses the pool)
Env vars
- `DATABASE_URL` (required): Postgres connection string
- `PGVECTOR_PROBES`: ANN probe count for `ivfflat` (default from `app/config.py`)
What it does
- Image embeddings (CLIP): Loads a CLIP model once, embeds a PIL image, L2-normalizes features, and returns a Python list of floats.
- Text embeddings (OpenAI): Calls OpenAI Embeddings. Returns a list of floats.
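The L2-normalization step mentioned above can be illustrated in pure Python (the real code normalizes the CLIP feature tensor before converting it to a list of floats; this standalone version is just for clarity):

```python
import math

def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm, as done for CLIP image features."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return vec  # leave the zero vector unchanged to avoid division by zero
    return [x / norm for x in vec]
```

Unit-norm vectors make cosine similarity equivalent to a plain dot product, which is why normalization is done once at embedding time.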
What it does
- Small helper around OpenAI Chat Completions to turn question + RAG context (+ optional history) into an answer that stays grounded.
- Defines the system prompts used to steer generation.
What it does
- Creates the FastAPI app, configures CORS, exposes health & identity endpoints.
- Registers feature routers: embed, search, chat, voice, chat history.
Endpoints (no global /api prefix)
- `GET /` — status payload (`name`, `version`, `python`, `server_time_utc`, `uptime`, `git_sha?`)
- `GET /healthz` — `{ "ok": true }`
- `GET /whoami` — returns `{ "user_id": ... }` (requires Clerk auth)

Routers included

- `POST /embed-image` — image to embedding
- `POST /search-image` — image search (pgvector)
- `POST /chat` — text chat (RAG)
- `POST /voice/chat` — voice chat (ASR → RAG → TTS)
- `GET /chat/history` — chat list
- `GET /chat/{scan_id}/messages` — messages for a scan
CORS
- Reads `ALLOW_ORIGINS`, splits it by comma, and enables `allow_credentials=True`, `allow_methods=["*"]`, `allow_headers=["*"]`.
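The origin parsing can be sketched as a one-liner that tolerates stray whitespace and empty entries (the helper name is illustrative; the real code may split inline):

```python
# Hypothetical sketch: turn the ALLOW_ORIGINS env value into the list
# handed to FastAPI's CORSMiddleware as allow_origins.
def parse_allow_origins(raw: str) -> list[str]:
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```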
Pydantic models that define request/response shapes for the API.