```shell
# 1) Clone the repository
git clone https://github.com/ICAST-Research-Project/backend-rag-api.git

# 2) Enter the project folder
cd backend-rag-api

# 3) Create a virtual environment (local, isolated Python env)
python -m venv .venv

# 4) Activate the virtual environment
source .venv/bin/activate

# 5) Install all Python dependencies listed in requirements.txt
python -m pip install -r requirements.txt

# 6) Run the FastAPI service
uvicorn app.main:app --reload
```

Tech stack

- FastAPI + Uvicorn: Python web framework and ASGI server
- PostgreSQL (Neon) + pgvector: relational data + vector search
- OpenAI + (CLIP/OpenCLIP): chat + image/text embeddings
- ElevenLabs — TTS / voice replies
- Clerk — JWT auth for API protection
- AWS S3 — object storage + presigned uploads
- Pydantic — config & schema validation
- SQLAlchemy — ORM & sessions
Endpoint: POST /api/chat
Auth: Authorization: Bearer <Clerk JWT> (required)
Does: Runs the RAG pipeline and returns an answer plus supporting sources. Persists the turn as channel="text".
Base prefix: /api/chat
Auth: Authorization: Bearer <Clerk JWT> (required)
Returns a paginated list of the user’s scan chats with:
- `scan_id`, `scan_title`
- `artwork_id`, `artwork_title`
- `artwork_image_url` (auto-resolved to a usable URL)
- `created_at`, `last_message`, `last_message_at`
Endpoint: POST /api/embed-image
Does: Fetches an image from a URL, computes its embedding, and returns the vector + dimension.
Implementation Notes
- Loads the image via `app.helpers.common.load_image_from_url(...)`.
- Embeds with `app.embeddings.embed_pil_image(...)`.
- Embedding model/dimension come from your embeddings module (see `IMAGE_EMBEDDING_MODEL` in `app/config.py`).
Endpoint: POST /api/search-image
Auth: Authorization: Bearer <Clerk JWT> (required)
Uploads: multipart/form-data with an image file field named file
Does: Embeds the uploaded image, searches nearest artworks (pgvector), applies confidence/margin logic, persists a scan, and returns top-K neighbors + a scan_id.
Query Params
- `top_k`: number of results to return (default from `TOPK_DEFAULT`, 1–100)
- `metric`: similarity metric: `cosine` | `l2` | `ip` (default `METRIC_DEFAULT`)
- `sim_threshold`: min similarity for a match (default `IMAGE_MATCH_SIM_THRESHOLD`)
- `margin_threshold`: min absolute margin `(top1 - top2)` when `require_margin=true`
- `require_margin`: enforce margin checks (`true`/`false`, default `IMAGE_MATCH_REQUIRE_MARGIN`)
- `solo_threshold`: min similarity to accept when only one candidate is present
- `high_conf_threshold`: auto-accept if `top1 >= high_conf_threshold`
- `margin_ratio_threshold`: min ratio `(top1 / top2)` when margin is enforced
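The confidence/margin decision the parameters above describe can be sketched as a single predicate. This is an illustrative reconstruction, not the project's actual code; the function name and the exact ordering of checks are assumptions, and defaults mirror the documented config values.

```python
# Hypothetical sketch of the image-match acceptance logic; the real
# implementation lives in the search-image route. `sims` holds the top-K
# neighbor similarities in descending order.
def accept_match(
    sims: list[float],
    sim_threshold: float = 0.70,
    margin_threshold: float = 0.10,
    require_margin: bool = True,
    solo_threshold: float = 0.80,
    high_conf_threshold: float = 0.90,
    margin_ratio_threshold: float = 1.05,
) -> bool:
    if not sims:
        return False
    top1 = sims[0]
    if top1 >= high_conf_threshold:          # very confident: auto-accept
        return True
    if len(sims) == 1:                       # single candidate: stricter solo bar
        return top1 >= solo_threshold
    if top1 < sim_threshold:                 # below the basic similarity floor
        return False
    if require_margin:
        top2 = sims[1]
        if (top1 - top2) < margin_threshold:                      # absolute margin
            return False
        if top2 > 0 and (top1 / top2) < margin_ratio_threshold:   # ratio margin
            return False
    return True
```

The two margin checks together reject ambiguous results where the best and second-best neighbors score nearly the same.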
Endpoint: POST /api/voice/chat
Auth: Authorization: Bearer <Clerk JWT> (required)
Does: Converts speech to text (ASR), runs RAG chat, then returns TTS audio (base64) + transcript + text answer. Persists the turn as channel="voice".
- `artwork_id` (string, required): target artwork context
- `artist_id` (string, optional)
- `scan_id` (string, optional): tie to an existing scan/session
- `prompt` (string, optional): text prompt when no audio is sent
- `audio_file` (file, optional): if audio is present, ASR produces the transcript
- `voice_id` (string, optional): TTS voice (provider-specific)
- `metric` (string, optional): `cosine` | `l2` | `ip` (default: `METRIC_DEFAULT`)
- `sim_threshold` (float, optional): `[0, 1]` (default: `TEXT_MATCH_SIM_THRESHOLD`)
- `top_k` (int, optional): retrieval candidates (default: `6`)
- `language_code` (string, optional): ASR language hint (e.g., `en`, `en-US`)
What it does:
Validates Clerk-issued JWTs on incoming requests and returns the current user ({"user_id": sub, "claims": ...}) for protected routes.
Headers supported
- `Authorization: Bearer <JWT>`
- `X-Client-Auth: Bearer <JWT>` (fallback for mobile/web clients)
Env vars
`CLERK_ISSUER` (required): `https://art-connect.org.clerk.accounts.dev`
How it works
- Caches Clerk JWKS for 5 minutes to verify signatures.
- Verifies `iss` (issuer) and, if provided, `aud` (audience).
- Extracts `sub`, returned as `user_id` for downstream usage.
- Raises `401` on a missing/bad bearer or an invalid token.
Central place for env-driven settings (loaded via python-dotenv).
These control retrieval metrics, table/column names, thresholds, and model IDs.
- Reads `.env` and exposes constants used across routers/helpers.
- Sets defaults so the API can run locally without a huge `.env`.
- Groups knobs for image search, text RAG, and DB schema.
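The pattern is plain "env var with a fallback default". A minimal sketch (the real module loads `.env` via python-dotenv first; here the process environment is read directly, and only a few of the documented knobs are shown):

```python
import os

# Defaults match the values documented below; override via environment/.env.
METRIC_DEFAULT = os.getenv("METRIC_DEFAULT", "cosine")
TOPK_DEFAULT = int(os.getenv("TOPK_DEFAULT", "5"))
IMAGE_MATCH_SIM_THRESHOLD = float(os.getenv("IMAGE_MATCH_SIM_THRESHOLD", "0.70"))
```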
Models
- `MODEL_ID` — image model ID for local usage (default: `openai/clip-vit-base-patch32`)
- `TEXT_EMBED_MODEL` — text embedding model (default: `text-embedding-3-small`)
- `GEN_MODEL` — chat/generation model (default: `gpt-4o-mini`)
- `OPENAI_API_KEY` — required for OpenAI models
Database / Vectors
- `DATABASE_URL` — Postgres connection string
- `PGVECTOR_PROBES` — index probe count for ANN searches (default: `10`)
Image Embeddings (table/cols)
- `IMAGE_EMBED_TABLE` (default: `artwork_embeddings_image`)
- `IMAGE_EMBED_VECTOR_COL` (default: `embedding`)
- `IMAGE_EMBED_ARTWORK_ID_COL` (default: `artwork_id`)
Retrieval Controls (shared)
- `METRIC_DEFAULT` — `cosine` | `l2` | `ip` (default: `cosine`)
- `TOPK_DEFAULT` — default top-K (default: `5`)
Image Match Thresholds
- `IMAGE_MATCH_SIM_THRESHOLD` (default: `0.70`)
- `IMAGE_MATCH_MARGIN_THRESHOLD` (default: `0.10`)
- `IMAGE_MATCH_REQUIRE_MARGIN` (`true`|`false`, default: `true`)
- `IMAGE_MATCH_SOLO_THRESHOLD` (default: `0.80`)
- `IMAGE_MATCH_HIGH_CONF_THRESHOLD` (default: `0.90`)
- `IMAGE_MATCH_MARGIN_RATIO_THRESHOLD` (default: `1.05`)
Domain Tables (read-joins)
- `ARTWORKS_TABLE` (default: `Artwork`)
- `ARTWORKS_ID_COL` (default: `id`)
- `ARTWORKS_ARTIST_ID_COL` (default: `artistId`)
- `ARTISTS_TABLE` (default: `Artist`)
- `ARTISTS_ID_COL` (default: `id`)
- `ARTISTS_NAME_COL` (default: `name`)
- `ARTWORKS_TITLE_COL` (default: `title`)
- `ARTWORKS_DESC_COL` (default: `description`)
- `ARTISTS_BIO_COL` (default: `bio`)
Text Embeddings (RAG)
- `TEXT_EMBED_TABLE_ARTWORK` (default: `artwork_embeddings_text`)
- `TEXT_EMBED_TABLE_ARTIST` (default: `artist_embeddings_text`)
- `TEXT_EMBED_VECTOR_COL` (default: `embedding`)
- `TEXT_EMBED_TEXT_COL` (default: `content`)
- `TEXT_MATCH_SIM_THRESHOLD` (default: `0.60`)
Operator mapping (pgvector):
`cosine` → `<=>`, `l2` → `<->`, `ip` → `<#>`; the code converts distances to similarities before applying thresholds.
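The mapping can be sketched as a lookup table plus a conversion function. The cosine and inner-product conversions follow pgvector's documented operator semantics (`<=>` returns cosine distance, `<#>` returns the negated inner product); the `l2` mapping shown is one common monotone convention and an assumption, not necessarily what the project uses.

```python
# pgvector distance operators per metric, as documented above.
PGVECTOR_OPS = {"cosine": "<=>", "l2": "<->", "ip": "<#>"}

def to_similarity(metric: str, distance: float) -> float:
    """Convert a pgvector distance into a similarity for threshold checks."""
    if metric == "cosine":
        return 1.0 - distance          # <=> yields cosine distance
    if metric == "ip":
        return -distance               # <#> yields the negated inner product
    if metric == "l2":
        return 1.0 / (1.0 + distance)  # assumed monotone mapping (illustrative)
    raise ValueError(f"unknown metric: {metric}")
```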
What it does:
Provides a psycopg connection pool and a get_chat_cursor() context manager for chat-related queries (scans/messages).
Env var
`CHAT_DATABASE_URL` (required): Postgres connection string for the chat database.
What it does:
Creates a psycopg connection pool for the primary app database and exposes:
- `get_cursor()`: pooled cursor with `pgvector` registered and `ivfflat.probes` set
- `get_conn()`: direct connection context (bypasses the pool)
Env vars
- `DATABASE_URL` (required): Postgres connection string
- `PGVECTOR_PROBES`: ANN probe count for `ivfflat` (default from `app/config.py`)
What it does
- Image embeddings (CLIP): Loads a CLIP model once, embeds a PIL image, L2-normalizes features, and returns a Python list of floats.
- Text embeddings (OpenAI): Calls OpenAI Embeddings. Returns a list of floats.
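The L2-normalization step mentioned above can be illustrated in pure Python (the real code normalizes the CLIP feature tensor before converting it to a list of floats; this standalone version is just for clarity):

```python
import math

def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm, as done for CLIP image features."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return vec  # leave the zero vector unchanged to avoid division by zero
    return [x / norm for x in vec]
```

Unit-norm vectors make cosine similarity equivalent to a plain dot product, which is why normalization is done once at embedding time.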
What it does
- Small helper around OpenAI Chat Completions to turn question + RAG context (+ optional history) into an answer that stays grounded.
- Defines the system prompts used to steer generation.
What it does
- Creates the FastAPI app, configures CORS, exposes health & identity endpoints.
- Registers feature routers: embed, search, chat, voice, chat history.
Endpoints (no global /api prefix)
- `GET /` — status payload (`name`, `version`, `python`, `server_time_utc`, `uptime`, `git_sha?`)
- `GET /healthz` — `{ "ok": true }`
- `GET /whoami` — returns `{ "user_id": ... }` (requires Clerk auth)

Routers included

- `POST /embed-image` — image to embedding
- `POST /search-image` — image search (pgvector)
- `POST /chat` — text chat (RAG)
- `POST /voice/chat` — voice chat (ASR → RAG → TTS)
- `GET /chat/history` — chat list
- `GET /chat/{scan_id}/messages` — messages for a scan
CORS
- Reads `ALLOW_ORIGINS`, splits it by comma, and enables `allow_credentials=True`, `allow_methods=["*"]`, `allow_headers=["*"]`.
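The origin parsing can be sketched as a one-liner that tolerates stray whitespace and empty entries (the helper name is illustrative; the real code may split inline):

```python
# Hypothetical sketch: turn the ALLOW_ORIGINS env value into the list
# handed to FastAPI's CORSMiddleware as allow_origins.
def parse_allow_origins(raw: str) -> list[str]:
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```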
Pydantic models that define request/response shapes for the API.