
Quiver

Quiver is a web application for exploring datasets with DuckDB, LLM-assisted KPI SQL, a visual canvas, natural-language chat, and multi-agent collective simulation. The primary experience is the React + Vite frontend talking to a FastAPI backend; an older Streamlit UI remains optional.


What the web app does

  1. Upload tabular and document files — besides CSV, the API accepts Excel (.xlsx/.xls), Parquet, JSON / JSONL, TSV, and documents for LLM context: PDF, DOCX, TXT, Markdown, RST, and plain-text RTF. The app infers schemas, row counts, and join hints for tabular data; documents return previews and feed insight KPIs (text-based) alongside SQL KPIs.
  2. Session cache — after analyze, the API returns a session_id. The client can send it on later requests so the backend can reload parsed frames and document text from Redis instead of re-uploading (see app/session_store.py). If Redis is off or the session expired, upload files again as before.
  3. Analyze automatically — after you attach files, the client calls /api/analyze so schema-aware features (KPIs, collective simulation, SQL editor hints) are ready without a separate “Analyze” click.
  4. Work on an infinite canvas — React Flow renders draggable KPI cards (generated or custom SQL) and optional schema cards per table. Cards can be removed with the corner control. The UI avoids duplicate cards where it matters: one collective flow lives in the side panel (not on canvas); schema nodes are deduped by table index; KPI SQL is deduped by normalized duckdb_sql.
  5. KPI merge — connect two KPI cards with a merge edge; the backend can combine them into one SQL KPI (/api/kpis/merge). Merge suggestions (/api/kpis/merge-suggestions) propose pairs; suggested links appear on the canvas when data and KPIs allow.
  6. Generate KPI SQL — from the bottom command bar, describe what you want; the model returns named KPIs with DuckDB SQL (and document insight KPIs when non-tabular context exists). Each SQL KPI can run the query and show a Recharts-based chart where applicable.
  7. SQL editor — full-screen modal with Schema | Editor | Viewer. Schema lists csv_data_0, csv_data_1, … (upload order), column types, and shortcuts to copy view names or insert a sample SELECT. You can Add as KPI to place the current query on the canvas as a card.
  8. Collective simulation — opened from the header (not as a canvas node). A slide-in panel runs the same pipeline as before: persona design → per-agent DuckDB tool loops → collective synthesis. Progress uses Server-Sent Events (/api/collective/run/stream) so the Activity stepper updates in real time (prepare → agents → each simulation → synthesis → result).
  9. Ask Quiver — modal chat over your files with read-only SQL tools (same provider/model settings as KPIs where applicable). The API applies a scope guard before LLM calls to refuse obvious off-topic or jailbreak-style prompts; answers should stay grounded in your uploaded data.
  10. Explore correlation — after analyze, open join hints between tables in a dedicated modal.
  11. Save dashboards — when PostgreSQL (or Supabase) is configured, you can save, list, and delete named dashboards with their KPI cards via /api/dashboards (see header / save dialog in the UI). Without Postgres, dashboard APIs persist against the default dev SQLite file (quiver_dev.db at repo root).
  12. Bring Your Keys — store provider, model, and API keys in the browser (saved locally); the backend forwards them to OpenAI-compatible, Anthropic, Ollama, or custom base URLs according to your selection.
  13. Data quality API — POST /api/analyze/quality returns per-column metrics (null rates, outliers, duplicates, etc.) and an overall score for tabular uploads. Integrate or call directly as needed.
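The upload-then-cache flow in items 2–3 can be sketched as a tiny client. This is a hypothetical sketch using `requests`; the exact multipart field names follow the Pydantic models in `backend/main.py` and may differ:

```python
import requests

API = "http://127.0.0.1:8000"  # dev default from this README

def analyze_kwargs(paths, session_id=None):
    """Build requests.post kwargs for /api/analyze: reuse a cached
    session_id when we have one, otherwise attach files as multipart."""
    if session_id is not None:
        return {"data": {"session_id": session_id}}
    return {"files": [("files", open(p, "rb")) for p in paths]}

def analyze(paths, session_id=None):
    resp = requests.post(f"{API}/api/analyze", **analyze_kwargs(paths, session_id))
    resp.raise_for_status()
    payload = resp.json()
    # Keep payload["session_id"] for follow-up calls (KPIs, chat, quality)
    # so the backend can reload frames from Redis instead of re-parsing.
    return payload
```

If Redis is off or the session has expired, fall back to the file-upload branch, exactly as item 2 describes.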

Architecture (high level)

| Layer | Role |
| --- | --- |
| Frontend (frontend/) | React 19, Vite, Tailwind, Radix/shadcn-style components, @xyflow/react for the canvas, Recharts for KPI charts. Proxies /api to the backend in dev. Optional Clerk sign-in (VITE_CLERK_PUBLISHABLE_KEY). |
| Backend (backend/main.py) | FastAPI: multipart uploads, DuckDB in-memory views per request, LLM routes, rate limiting, URL validation, security headers, optional Sentry and structlog JSON logging. |
| Shared logic (app/) | Schema inference (schema_infer), join hints, DuckDB helpers, KPI/collective/chat engines, collective_run_events.py (SSE), session_store.py (Redis cache), data_quality.py, document_extract.py, chat_scope_guard.py, kpi_merge*.py, auth (auth.py), DB (db.py, db_models.py), optional Supabase Storage mirrors. |

DuckDB naming: uploaded tabular files become temp views csv_data_0, csv_data_1, … in upload order. SQL in the editor and in generated KPIs must use those view names (the schema panel documents them).
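The naming rule above is mechanical, so it is easy to mirror client-side. A minimal illustration (helper names are mine, not part of the app; the join key in the sample query is invented):

```python
def view_name(index: int) -> str:
    """Temp view name DuckDB exposes for the index-th uploaded tabular file."""
    return f"csv_data_{index}"

def views_for(filenames):
    """Map uploaded filenames, in upload order, to their DuckDB view names."""
    return {name: view_name(i) for i, name in enumerate(filenames)}

# A query as you would write it in the SQL editor or receive from KPI
# generation (the USING (id) join key is purely illustrative):
SAMPLE_SQL = """
SELECT a.*, b.*
FROM csv_data_0 AS a
JOIN csv_data_1 AS b USING (id)
"""
```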


API overview (REST)

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /api/health | Liveness |
| GET | /api/health/ready | Readiness: checks Postgres when DATABASE_URL is PostgreSQL; otherwise skips the DB check |
| POST | /api/ollama/warmup | Optional Ollama model warmup |
| POST | /api/analyze | Multipart files or session_id → tables, columns, previews, join hints, document previews, new session_id |
| POST | /api/analyze/quality | Same inputs as analyze → per-table quality reports + scores |
| POST | /api/kpis/generate | Form: provider, model, keys, session_id, files, … → KPI list (sql / insight) |
| POST | /api/kpis/merge | Merge two KPI definitions into one SQL KPI |
| POST | /api/kpis/merge-suggestions | Suggest mergeable KPI pairs for the canvas |
| POST | /api/query/execute | Run DuckDB SQL on the current upload set (read-oriented) |
| POST | /api/chat | Data chat with tool loop |
| POST | /api/collective/run | Full collective run; JSON result |
| POST | /api/collective/run/stream | Same pipeline, streamed as SSE (step / complete / error) |
| POST | /api/dashboards | Create dashboard + KPI cards |
| GET | /api/dashboards | List dashboards for the authenticated user |
| DELETE | /api/dashboards/{dashboard_id} | Delete a dashboard |

Detailed request/response shapes follow the Pydantic models in backend/main.py and the frontend src/lib/api.ts.
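The /api/collective/run/stream endpoint speaks Server-Sent Events. A minimal client-side parser might look like the sketch below; the event names (step / complete / error) come from this README, but the JSON payload shape is an assumption — the real reader lives in frontend/src/lib/api.ts:

```python
import json

def parse_sse(chunk: str):
    """Parse a block of SSE text into (event, data) pairs.

    Assumes each data line carries JSON, as /api/collective/run/stream
    appears to emit; adjust if the payload is plain text.
    """
    events = []
    event, data_lines = "message", []          # "message" is the SSE default
    for line in chunk.splitlines() + [""]:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":                       # blank line terminates one event
            if data_lines:
                events.append((event, json.loads("\n".join(data_lines))))
            event, data_lines = "message", []
    return events
```

Feeding the decoded response body through this as chunks arrive is enough to drive a stepper like the Activity panel described above.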


Prerequisites

  • Python 3.10+ (recommended)
  • Node.js 18+ and npm (for the web UI)
  • Redis (optional but recommended): default redis://localhost:6379 — speeds up follow-up requests via session_id. Set SESSION_STORE_ENABLED=0 to disable the session cache when Redis is not running.
  • PostgreSQL (optional): for production-like dashboard persistence and org/user rows; local dev can use the built-in SQLite fallback (sqlite+aiosqlite:///./quiver_dev.db).
  • For local LLMs: Ollama running (default base URL http://127.0.0.1:11434) and a pulled model (e.g. ollama pull llama3.2)
  • For cloud providers: API keys via Bring Your Keys or server environment variables (see .env.example)

Install

From the repository root:

pip install -r requirements.txt -r backend/requirements.txt
cd frontend && npm install && cd ..

For PostgreSQL or Supabase, apply migrations from the repo root after setting DATABASE_URL:

alembic upgrade head

Run (web app)

Local API without Clerk JWT

By default the API verifies Clerk JWTs (CLERK_AUTH_ENABLED=1). If you are not using Clerk yet, set in .env:

CLERK_AUTH_ENABLED=0

That uses a fixed dev user for dashboard ownership and logs. With Clerk enabled, configure the frontend publishable key and send Authorization: Bearer <token> from the client (the React app does this automatically when Clerk is configured).

Option A — one command (recommended)

python scripts/dev.py

Starts uvicorn on http://127.0.0.1:8000 and Vite on http://localhost:5173. On Windows you can use dev.bat; Git Bash: bash scripts/dev.sh.

Stop with Ctrl+C (on Windows you may need Ctrl+C twice if a child process does not exit cleanly).

Option B — two terminals

Terminal 1 — API

uvicorn backend.main:app --reload --host 127.0.0.1 --port 8000

Terminal 2 — UI

cd frontend
npm run dev

Open http://localhost:5173.

Docker Compose (full stack)

The repo includes docker-compose.yml: backend (Gunicorn), frontend (Nginx + SPA), Redis, and PostgreSQL. Copy .env.example to .env, then:

docker compose up --build

Frontend is exposed on port 3000 by default; backend on PORT or 8000. You can point DATABASE_URL at Supabase instead of the bundled Postgres; see comments in docker-compose.yml.

Production build (frontend)

Set VITE_API_URL to your public API origin (no trailing slash) if the UI is not served behind the same host/path as /api:

cd frontend
npm run build
npm run preview

See frontend/README.md for a short deployment checklist (CORS, TLS, bundle size, Clerk).


Configuration

Copy .env.example to .env in the repo root (never commit secrets). Vite also loads env from the repo root; frontend-only keys can live in frontend/.env (see frontend/.env.example).

Database & persistence

  • DATABASE_URL — async SQLAlchemy URL, e.g. postgresql+asyncpg://user:pass@host/db (falls back to SQLite if unset)
  • SUPABASE_DB_URL — used if DATABASE_URL is empty
  • DB_SSL, DB_POOL_SIZE, DB_MAX_OVERFLOW, DB_ECHO_SQL — pool and TLS tuning
  • UPLOAD_SESSION_PERSIST=1 — audit upload sessions in Postgres when enabled

Sessions & cache

  • REDIS_URL — default redis://localhost:6379
  • SESSION_STORE_ENABLED, SESSION_TTL_SECONDS

Auth (Clerk)

  • CLERK_AUTH_ENABLED — set 0 for local dev without JWTs
  • CLERK_JWKS_URL or CLERK_PUBLISHABLE_KEY / VITE_CLERK_PUBLISHABLE_KEY on the server for JWKS discovery

Supabase Storage (optional mirrors)

  • SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY, SUPABASE_STORAGE_BUCKET

App & security

  • ALLOWED_ORIGINS — comma-separated frontend origins for CORS in production
  • TRUSTED_HOSTS — optional Host header allow list (reverse proxies)
  • OLLAMA_BASE_URL, OLLAMA_TRUST_CLIENT_URL
  • RELAX_LLM_URL_VALIDATION — allow LAN/local OpenAI-compatible URLs (still blocks metadata endpoints)
  • MAX_UPLOAD_BYTES_PER_FILE, MAX_CSV_FILES_PER_REQUEST
  • EXPOSE_INTERNAL_ERRORS — set only when debugging
  • RATE_LIMIT_* — optional in-memory rate limits; RATE_LIMIT_TRUST_X_FORWARDED_FOR behind proxies

Observability

  • SENTRY_DSN, SENTRY_TRACES_SAMPLE_RATE, APP_ENV, APP_VERSION
  • LOG_LEVEL — when structlog is not installed, standard logging uses this

Provider keys (server-side fallback)

  • OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. (BYOK from the UI is usually enough for development)

Security notes (summary)

  • Client-supplied LLM base URLs are validated to reduce SSRF; see app/url_security.py and .env.example for relax flags when using local/LAN OpenAI-compatible servers.
  • Bring Your Keys sends keys from the browser to your backend; use HTTPS in production.
  • Upload size and file count are capped server-side.
  • Ask Quiver uses server-side scope checks before LLM calls (app/chat_scope_guard.py); this complements a strict data-chat system prompt.
  • Responses include baseline security headers (e.g. X-Content-Type-Options, X-Frame-Options).

Streamlit (legacy)

The original Streamlit app is still available for experimentation:

pip install -r requirements.txt
streamlit run streamlit_app.py

Day-to-day development and the feature set described above target the React + FastAPI stack.


Repository layout (selected)

quiver/
├── backend/main.py              # FastAPI routes, middleware, lifespan (DB tables)
├── app/                         # Core domain
│   ├── auth.py                  # Clerk JWT (optional)
│   ├── db.py, db_models.py      # Async SQLAlchemy
│   ├── session_store.py         # Redis session cache
│   ├── upload_persistence.py    # Optional Postgres audit rows
│   ├── supabase_storage.py      # Optional Storage uploads
│   ├── document_extract.py      # PDF/DOCX/text extraction
│   ├── data_quality.py          # Quality metrics
│   ├── chat_scope_guard.py      # Data-chat scope checks
│   ├── kpi_merge.py             # KPI merge + suggestions helpers
│   ├── collective_run_events.py # SSE step iterator
│   └── …
├── alembic/                     # DB migrations
├── frontend/src/
│   ├── App.tsx                  # Canvas, command bar, modals, merge edges
│   ├── components/              # FlowNodes, SqlEditorModal, SaveDashboardDialog, …
│   └── lib/api.ts               # Typed fetch + collective stream reader
├── scripts/dev.py               # Dev orchestration
├── docker-compose.yml           # Backend + frontend + Redis + Postgres
└── streamlit_app.py             # Legacy UI

Troubleshooting

| Symptom | Things to check |
| --- | --- |
| 401 Unauthorized on API | CLERK_AUTH_ENABLED=0 for dev without Clerk, or sign in and ensure the client sends Authorization: Bearer … |
| Blank API errors | Backend running on 8000; Vite proxy; VITE_API_URL in production builds |
| Collective panel stuck | Network tab for /api/collective/run/stream; provider keys and model id |
| SQL editor shows no schema | Wait for auto-analyze after upload; open the editor after analyze succeeds |
| Ollama errors | OLLAMA_BASE_URL, model pulled, warmup optional from the analyze flow |
| Dashboard save fails | Postgres/SQLite connectivity; run alembic upgrade head for Postgres; check server logs |
| Session lost / re-upload every time | Redis running and SESSION_STORE_ENABLED=1; expect a full re-upload on cache miss |

For deeper behavior, trace from frontend/src/lib/api.ts and backend/main.py into app/.

About

Figma for Data Exploration.
