Sci‑fi RAG console powered by local Ollama models. Switch between multiple GGUF quantizations, retrieve context from a prebuilt HNSW index, and chat with a cursed starship Oracle.
## Features

- Switch models at runtime via dropdown (no page reload)
- RAG over a local HNSW index (cached retriever)
- Streaming responses with a live Oracle bubble
- Per‑reply latency display; model‑switch divider entries in the chat
- Status endpoint reporting the active model name and RAG state
## Requirements

- Node 18+
- Ollama running locally at `http://localhost:11434`
- Pulled models (examples, not included in this repo):
  - `gemma-2-2b-it-Q4_K_M`
  - `gemma-3-1b-it-Q3_K_L`
  - `phi3-mini-q3kl`

## Quick Start

```sh
npm install
npm run dev
# A warmup helper will hit / and /api/status automatically
```

Open http://localhost:3000.
## Model Switching

- The header dropdown lists available models (from `AVAILABLE_CHAT_MODELS` in `Oracle_Config.ts`).
- Selecting a model updates the server‑side registry; subsequent chats use it.
- A divider appears in the chat: `SWITCHED TO <model> in <ms>`.
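The switch flow above can be sketched client‑side as a timed call to `POST /api/model`. The request body shape and the exact divider suffix (`ms`) are assumptions for illustration, not confirmed by the repo:

```typescript
// Divider text shown in the chat after a switch; the "ms" suffix is an
// assumption about how <ms> is rendered.
export function formatSwitchDivider(model: string, ms: number): string {
  return `SWITCHED TO ${model} in ${ms}ms`;
}

// Hypothetical client-side switch: POST the new model name, time the
// round trip, and return the divider line to append to the chat.
export async function switchModel(model: string): Promise<string> {
  const start = Date.now();
  const res = await fetch("/api/model", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }), // body shape is an assumption
  });
  if (!res.ok) throw new Error(`switch failed: ${res.status}`);
  return formatSwitchDivider(model, Date.now() - start);
}
```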
## RAG

- Index path: `rag_data/` (see `FAISS_PATH`)
- The retriever is warmed on first access and then cached
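The "warm on first access, then cache" behavior can be sketched as a lazy singleton; `loadRetriever` below is a hypothetical stand‑in for the real HNSW index loader, not the repo's actual code:

```typescript
// Wrap an async loader so it runs at most once; all callers share the
// same in-flight or resolved promise.
export function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load());
}

// Usage sketch (loadRetriever is hypothetical):
// const getRetriever = lazy(loadRetriever);
// const retriever = await getRetriever(); // warmed once, cached after
```

Caching the promise (rather than the value) also de‑duplicates concurrent first requests, so two chats arriving together trigger only one index load.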
## API

- `GET /api/status` → `{ model, rag, availableModels }`
- `POST /api/model` with `{ model }` → switch the active model
- `POST /api/chat` → streams the Oracle response
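A minimal client sketch for these endpoints, assuming the server runs on `http://localhost:3000`, the chat body is `{ message }`, and the stream is plain text chunks (all assumptions beyond the shapes listed above):

```typescript
// Response shape of GET /api/status; the `rag` field type is an assumption.
interface Status {
  model: string;
  rag: string;
  availableModels: string[];
}

// Build the POST /api/chat request init; `{ message }` is an assumed body.
export function chatRequestInit(message: string) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  };
}

export async function getStatus(base = "http://localhost:3000"): Promise<Status> {
  const res = await fetch(`${base}/api/status`);
  return res.json();
}

// Read the streamed Oracle reply chunk by chunk, feeding each piece to
// a callback (e.g. to grow the live Oracle bubble).
export async function chat(
  message: string,
  onChunk: (s: string) => void,
  base = "http://localhost:3000",
): Promise<void> {
  const res = await fetch(`${base}/api/chat`, chatRequestInit(message));
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```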
## Configuration

Edit `Oracle_Config.ts` to set:

- `AVAILABLE_CHAT_MODELS`
- `DEFAULT_CHAT_MODEL_NAME`
- `EMBEDDING_MODEL_NAME`
- `OLLAMA_BASE_URL`
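An illustrative shape for `Oracle_Config.ts`; the chat model names come from the example pulls above, while the embedding model name and default choice are placeholders, not the repo's actual values:

```typescript
// Placeholder config sketch — values are illustrative, not from the repo.
export const OLLAMA_BASE_URL = "http://localhost:11434";
export const EMBEDDING_MODEL_NAME = "nomic-embed-text"; // placeholder
export const AVAILABLE_CHAT_MODELS = [
  "gemma-2-2b-it-Q4_K_M",
  "gemma-3-1b-it-Q3_K_L",
  "phi3-mini-q3kl",
];
// Placeholder default: first entry in the list.
export const DEFAULT_CHAT_MODEL_NAME = AVAILABLE_CHAT_MODELS[0];
```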
## Styling

- Scrollbars and the dropdown are styled for the dark sci‑fi theme