ICAST-Research-Project/art-connect-rag-framework

Quickstart

# 1) Clone the repository
git clone https://github.com/ICAST-Research-Project/backend-rag-api.git
# 2) Enter the project folder
cd backend-rag-api
# 3) Create a virtual environment (local, isolated Python env)
python -m venv .venv
# 4) Activate the virtual environment (on Windows: .venv\Scripts\activate)
source .venv/bin/activate
# 5) Install all Python dependencies listed in requirements.txt
python -m pip install -r requirements.txt
# 6) Run the FastAPI Service
uvicorn app.main:app --reload

Tech Stack

  • FastAPI + Uvicorn: Python web framework and ASGI server
  • PostgreSQL (Neon) + pgvector: relational data + vector search
  • OpenAI + (CLIP/OpenCLIP): chat + image/text embeddings
  • ElevenLabs: text-to-speech (TTS) voice replies
  • Clerk: JWT auth for API protection
  • AWS S3: object storage + presigned uploads
  • Pydantic: config & schema validation
  • SQLAlchemy: ORM & sessions

Routes: Chat (app/routers/chat.py)

Endpoint: POST /api/chat
Auth: Authorization: Bearer <Clerk JWT> (required)
Does: Runs the RAG pipeline and returns an answer plus supporting sources. Persists the turn as channel="text".

Routes: Chat History (app/routers/chat_history.py)

Base prefix: /api/chat
Auth: Authorization: Bearer <Clerk JWT> (required)

GET /api/chat/history

Returns a paginated list of the user’s scan chats with:

  • scan_id, scan_title
  • artwork_id, artwork_title
  • artwork_image_url (auto-resolved to a usable URL)
  • created_at, last_message, last_message_at
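
Each history item can be modeled roughly like this (a sketch only; the field names come from the list above, the types and example values are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChatHistoryItem:
    # Identifiers and titles for the scan and its matched artwork
    scan_id: str
    scan_title: Optional[str]
    artwork_id: str
    artwork_title: Optional[str]
    # Resolved to a directly usable URL (e.g. a presigned S3 link)
    artwork_image_url: Optional[str]
    # Timestamps plus a preview of the latest message
    created_at: str
    last_message: Optional[str]
    last_message_at: Optional[str]

item = ChatHistoryItem(
    scan_id="scan_123", scan_title="Gallery visit",
    artwork_id="art_456", artwork_title="Starry Night",
    artwork_image_url="https://example.com/img.jpg",
    created_at="2024-01-01T00:00:00Z",
    last_message="Who painted this?",
    last_message_at="2024-01-01T00:05:00Z",
)
```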

Route: Image Embedding (app/routers/embed.py)

Endpoint: POST /api/embed-image
Does: Fetches an image from a URL, computes its embedding, and returns the vector + dimension.

Implementation Notes

  • Loads the image via app.helpers.common.load_image_from_url(...).
  • Embeds with app.embeddings.embed_pil_image(...).
  • Embedding model/dimension come from your embeddings module (see IMAGE_EMBEDDING_MODEL in app/config.py).

Route: Image Search (app/routers/search.py)

Endpoint: POST /api/search-image
Auth: Authorization: Bearer <Clerk JWT> (required)
Uploads: multipart/form-data with an image file field named file
Does: Embeds the uploaded image, searches nearest artworks (pgvector), applies confidence/margin logic, persists a scan, and returns top-K neighbors + a scan_id.

Query Params

  • top_k : number of results to return (default from TOPK_DEFAULT, 1–100)
  • metric : similarity metric: cosine | l2 | ip (default METRIC_DEFAULT)
  • sim_threshold : min similarity for a match (default IMAGE_MATCH_SIM_THRESHOLD)
  • margin_threshold : min absolute margin (top1 - top2) when require_margin=true
  • require_margin : enforce margin checks (true/false, default IMAGE_MATCH_REQUIRE_MARGIN)
  • solo_threshold : min similarity to accept when only one candidate is present
  • high_conf_threshold : auto-accept if top1 >= high_conf_threshold
  • margin_ratio_threshold : min ratio (top1 / top2) when margin is enforced
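
The confidence/margin logic these parameters control can be sketched as follows (a simplified reconstruction, not the exact implementation; the function name is hypothetical and the defaults mirror the config values documented below):

```python
def accept_match(sims, sim_threshold=0.70, margin_threshold=0.10,
                 require_margin=True, solo_threshold=0.80,
                 high_conf_threshold=0.90, margin_ratio_threshold=1.05):
    """Decide whether the best similarity counts as a confident match.

    `sims` is the list of neighbor similarities, best first.
    """
    if not sims:
        return False
    top1 = sims[0]
    # Auto-accept very confident matches regardless of margin
    if top1 >= high_conf_threshold:
        return True
    # A single candidate must clear the stricter solo threshold
    if len(sims) == 1:
        return top1 >= solo_threshold
    if top1 < sim_threshold:
        return False
    if require_margin:
        top2 = sims[1]
        # Require both an absolute gap and a ratio gap over the runner-up
        if top1 - top2 < margin_threshold:
            return False
        if top2 > 0 and top1 / top2 < margin_ratio_threshold:
            return False
    return True
```

A near-tie like [0.75, 0.74] is rejected under the default margin, while a clear winner like [0.85, 0.60] is accepted.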

Route: Voice Chat (app/routers/voice.py)

Endpoint: POST /api/voice/chat
Auth: Authorization: Bearer <Clerk JWT> (required)
Does: Converts speech to text (ASR), runs RAG chat, then returns TTS audio (base64) + transcript + text answer. Persists the turn as channel="voice".

Form Fields

  • artwork_id (string, required) : target artwork context
  • artist_id (string, optional)
  • scan_id (string, optional) : tie to existing scan/session
  • prompt (string, optional) : text prompt when no audio is sent
  • audio_file (file, optional) : audio input; when present, ASR produces the transcript
  • voice_id (string, optional) : TTS voice (provider-specific)
  • metric (string, optional) : cosine | l2 | ip (default: METRIC_DEFAULT)
  • sim_threshold (float, optional) : [0,1] (default: TEXT_MATCH_SIM_THRESHOLD)
  • top_k (int, optional) : retrieval candidates (default: 6)
  • language_code (string, optional) : ASR language hint (e.g., en, en-US)

Auth: Clerk JWT (app/auth_clerk.py)

What it does:
Validates Clerk-issued JWTs on incoming requests and returns the current user ({"user_id": sub, "claims": ...}) for protected routes.

Headers supported

  • Authorization: Bearer <JWT>
  • X-Client-Auth: Bearer <JWT> (fallback for mobile/web clients)

Env vars

  • CLERK_ISSUER (required) : https://art-connect.org.clerk.accounts.dev

How it works

  • Caches Clerk JWKS for 5 minutes to verify signatures.
  • Verifies iss (issuer) and, if provided, aud (audience).
  • Extracts sub : returned as user_id for downstream usage.
  • Raises 401 on missing/bad bearer or invalid token.
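
Token extraction with the X-Client-Auth fallback can be sketched like this (a simplified stand-in for app/auth_clerk.py; the real module additionally verifies the signature against the cached Clerk JWKS):

```python
from typing import Optional

def extract_bearer_token(headers: dict) -> Optional[str]:
    """Pull the JWT out of Authorization, falling back to X-Client-Auth."""
    for name in ("Authorization", "X-Client-Auth"):
        value = headers.get(name, "")
        scheme, _, token = value.partition(" ")
        if scheme.lower() == "bearer" and token:
            return token
    # Missing or malformed bearer -> the caller raises HTTP 401
    return None
```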

Config (app/config.py)

Central place for env-driven settings (loaded via python-dotenv).
These control retrieval metrics, table/column names, thresholds, and model IDs.

What it does

  • Reads .env and exposes constants used across routers/helpers.
  • Sets defaults so the API can run locally without a huge .env.
  • Groups knobs for image search, text RAG, and DB schema.
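
The pattern is the usual os.getenv read with typed defaults; a minimal sketch (helper names are hypothetical, the keys and defaults match the list below):

```python
import os

def _float(name: str, default: float) -> float:
    # Env vars arrive as strings; coerce, falling back to the default
    return float(os.getenv(name, default))

def _bool(name: str, default: bool) -> bool:
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

METRIC_DEFAULT = os.getenv("METRIC_DEFAULT", "cosine")
TOPK_DEFAULT = int(os.getenv("TOPK_DEFAULT", "5"))
IMAGE_MATCH_SIM_THRESHOLD = _float("IMAGE_MATCH_SIM_THRESHOLD", 0.70)
IMAGE_MATCH_REQUIRE_MARGIN = _bool("IMAGE_MATCH_REQUIRE_MARGIN", True)
```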

Key Env Vars (with defaults)

Models

  • MODEL_ID — image model id for local usage (default: openai/clip-vit-base-patch32)
  • TEXT_EMBED_MODEL — text embedding model (default: text-embedding-3-small)
  • GEN_MODEL — chat/generation model (default: gpt-4o-mini)
  • OPENAI_API_KEY — required for OpenAI models

Database / Vectors

  • DATABASE_URL — Postgres connection string
  • PGVECTOR_PROBES — index probe count for ANN searches (default: 10)

Image Embeddings (table/cols)

  • IMAGE_EMBED_TABLE (default: artwork_embeddings_image)
  • IMAGE_EMBED_VECTOR_COL (default: embedding)
  • IMAGE_EMBED_ARTWORK_ID_COL (default: artwork_id)

Retrieval Controls (shared)

  • METRIC_DEFAULT — cosine | l2 | ip (default: cosine)
  • TOPK_DEFAULT — default top-K (default: 5)

Image Match Thresholds

  • IMAGE_MATCH_SIM_THRESHOLD (default: 0.70)
  • IMAGE_MATCH_MARGIN_THRESHOLD (default: 0.10)
  • IMAGE_MATCH_REQUIRE_MARGIN (true|false, default: true)
  • IMAGE_MATCH_SOLO_THRESHOLD (default: 0.80)
  • IMAGE_MATCH_HIGH_CONF_THRESHOLD (default: 0.90)
  • IMAGE_MATCH_MARGIN_RATIO_THRESHOLD (default: 1.05)

Domain Tables (read-joins)

  • ARTWORKS_TABLE (default: Artwork)
  • ARTWORKS_ID_COL (default: id)
  • ARTWORKS_ARTIST_ID_COL (default: artistId)
  • ARTISTS_TABLE (default: Artist)
  • ARTISTS_ID_COL (default: id)
  • ARTISTS_NAME_COL (default: name)
  • ARTWORKS_TITLE_COL (default: title)
  • ARTWORKS_DESC_COL (default: description)
  • ARTISTS_BIO_COL (default: bio)

Text Embeddings (RAG)

  • TEXT_EMBED_TABLE_ARTWORK (default: artwork_embeddings_text)
  • TEXT_EMBED_TABLE_ARTIST (default: artist_embeddings_text)
  • TEXT_EMBED_VECTOR_COL (default: embedding)
  • TEXT_EMBED_TEXT_COL (default: content)
  • TEXT_MATCH_SIM_THRESHOLD (default: 0.60)

Operator mapping (pgvector):
cosine → <=>, l2 → <->, ip → <#>; the code converts distances to similarities before applying thresholds.
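
That mapping, and the distance-to-similarity conversion, can be sketched as follows (a reconstruction, not the actual code; the cosine and inner-product conversions follow standard pgvector conventions, and the l2 mapping is an assumption):

```python
# pgvector distance operators per metric
PGVECTOR_OPS = {"cosine": "<=>", "l2": "<->", "ip": "<#>"}

def similarity_from_distance(metric: str, distance: float) -> float:
    """Convert a pgvector distance into a similarity for threshold checks."""
    if metric == "cosine":
        # <=> returns cosine distance; similarity = 1 - distance
        return 1.0 - distance
    if metric == "ip":
        # <#> returns the negated inner product
        return -distance
    if metric == "l2":
        # <-> returns Euclidean distance; map into (0, 1], larger = closer
        return 1.0 / (1.0 + distance)
    raise ValueError(f"unknown metric: {metric}")
```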

Chat DB Pool (app/db_chat.py)

What it does:
Provides a psycopg connection pool and a get_chat_cursor() context manager for chat-related queries (scans/messages).

Env var

  • CHAT_DATABASE_URL (required) : Postgres connection string for the chat database.

App DB Pool (app/db.py)

What it does:
Creates a psycopg connection pool for the primary app database and exposes:

  • get_cursor() : pooled cursor with pgvector registered and ivfflat.probes set
  • get_conn() : direct connection context (bypasses pool)

Env vars

  • DATABASE_URL (required) : Postgres connection string
  • PGVECTOR_PROBES : ANN probe count for ivfflat (default from app/config.py)

Embeddings (app/embeddings.py)

What it does

  • Image embeddings (CLIP): Loads a CLIP model once, embeds a PIL image, L2-normalizes features, and returns a Python list of floats.
  • Text embeddings (OpenAI): Calls OpenAI Embeddings. Returns a list of floats.
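
The L2-normalization step works like this (a sketch in plain Python rather than the model's tensor ops):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so cosine similarity equals the dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # Avoid division by zero for an all-zero vector
        return list(vec)
    return [x / norm for x in vec]
```

Because the image vectors are unit-length, cosine and inner-product rankings over them agree.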

LLM Wrapper (app/llm.py)

What it does

  • Small helper around OpenAI Chat Completions to turn question + RAG context (+ optional history) into an answer that stays grounded.
  • Defines the system prompts that keep responses grounded in the retrieved context.

App Entry (app/main.py)

What it does

  • Creates the FastAPI app, configures CORS, exposes health & identity endpoints.
  • Registers feature routers: embed, search, chat, voice, chat history.

Endpoints (no global /api prefix)

  • GET / — status payload (name, version, python, server_time_utc, uptime, git_sha?)
  • GET /healthz — returns { "ok": true }
  • GET /whoami — returns { "user_id": ... } (requires Clerk auth)
  • Routers included
    • POST /embed-image — image to embedding
    • POST /search-image — image search (pgvector)
    • POST /chat — text chat (RAG)
    • POST /voice/chat — voice chat (ASR to RAG to TTS)
    • GET /chat/history — chat list
    • GET /chat/{scan_id}/messages — messages for a scan

CORS

  • Reads ALLOW_ORIGINS, splits by comma, and enables:
    • allow_credentials=True, allow_methods="*", allow_headers="*"
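
The origin parsing amounts to a comma split with whitespace trimmed (a sketch; the function name and the filtering of blank entries are assumptions):

```python
def parse_allow_origins(raw: str) -> list:
    # "https://a.com, https://b.com" -> ["https://a.com", "https://b.com"]
    return [o.strip() for o in raw.split(",") if o.strip()]
```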

Schemas (app/schemas.py)

Pydantic models that define request/response shapes for the API.

About

Image Pipeline & RAG Backend API
