Skip to content

nu-bi/nubi

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

175 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Nubi logo

Nubi

BI that runs in the browser — near-zero cost per dashboard view.

License Apache-2.0 Tests passing PRs welcome Python FastAPI React 19 Vite Stars

Docs · Compare vs Hex/Cube · Quickstart · Architecture · Roadmap


Nubi query editor — SQL workspace with live results, query library, and one-click expose-as-metric


What is Nubi?

Nubi is a batteries-included BI and embedded-analytics platform. The structural bet is that the analytics kernel runs in the user's browser by default (DuckDB-WASM / Pyodide), so the marginal cost of a dashboard view is approximately zero — a server kernel (E2B / Modal Firecracker microVM) is only the escape hatch for native wheels and large jobs.

The data plane uses Arrow IPC at every boundary, so data moves between warehouse, edge, browser, and kernel with no serialization tax. The entry wedge is embedding: a host app signs short-lived JWTs, mounts <nubi-dashboard>, and gets live cross-filtering dashboards with server-enforced row-level security at near-zero cost per view.


✨ Why Nubi?

Hex Cube Nubi
Kernel Python per session, their cloud ($$$) n/a Pyodide in browser; on-demand server kernel only when needed
Result transport JSON via pandas JSON / SQL API Arrow IPC — zero serialization tax
Viz Plotly/SVG, chokes past ~50k rows bring-your-own WebGL/WebGPU on Arrow buffers, 1M+ points interactive
Caching Per-session Pre-aggregations in Cube Store Content-hashed edge cache + auto pre-aggregations
Modeling tax medium high (cubes first) low — point at a warehouse and go
Embedding separate product headless only core surface; editor embeddable, not just output
Free tier per-seat kernel billing infra/seat real free tier — compute is the user's browser

Key differentiators:

  • Arrow-native data plane — sqlglot planner → PhysicalPlan → executor → Arrow IPC stream, with a frozen cache-key spec and conformance suite so a future Rust executor can swap in without touching call sites.
  • Content-hashed edge cache — N viewers of the same dashboard collapse to one warehouse hit. Cache key: sha256(canonical_json({sql, params, rls_claims})).
  • Auth-as-code + server-side RLS — JWT claims carry row/column policies; the planner injects them as AST-level predicates (never string-concat). Powers internal users, multi-tenant embedding, and Google OAuth from the same primitive.
  • LLM-authorable dashboards + MCP — a dashboard is a sanitized HTML/CSS document of declarative <nubi-kpi>, <nubi-table>, and <nubi-chart> custom elements. LLMs and MCP agents author layout and widget attributes; they never write WebGL or fetch code. Fifteen MCP tools expose the full authoring surface to any agent.
  • Auto-WebGL rendering<nubi-chart> switches to a regl WebGL scatter path automatically above 20,000 rows; SVG/HTML below. Up to ~1M points at interactive framerates reading Arrow columns directly.
  • SQL-first connector SDK — any fn(plan) -> pyarrow.Table is a first-class connector with declared capabilities. The capability gate enforces the security floor: a connector with predicate_rls=False is refused (501) when policies are active. Built-in connectors: postgres (ADBC), duckdb (in-memory + file-backed), duckdb_storage (S3/R2/MinIO httpfs), http_json, mysql, mariadb, jdbc, snowflake, bigquery, clickhouse, databricks, athena, trino/presto, sqlserver/azuresql/azuresynapse, oracle, redshift, cockroachdb, cloudsql, sftp, ftp (most via optional lazy-imported drivers). Private databases reachable via a network_mode='bridge' WebSocket tunnel.
  • Real free tier — compute is the user's browser; Hex can't match it without absorbing kernel cost.

📸 Screenshots

The hero above is the query editor. A few more surfaces — see the UI tour for the full guided walkthrough.

Dashboard editor Dashboard editor — drag-and-drop widgets, 9 chart types, live cross-filtering Flows Flows — cell-based SQL/Python orchestration (notebook + canvas views)
Data browser Data browser — explore any connected source, edit grid-style Published dashboard Published dashboard — embeddable, RLS-enforced, near-zero cost per view

🚀 Quickstart

Docker Compose (fastest — one command)

The repo ships a docker-compose.yml with two services: db (postgres:16-alpine) and a combined app (root Dockerfile — builds the Vite SPA and runs FastAPI, serving the SPA and the /api/v1 API on a single origin at port 8000).

# 1. Clone and start the stack
git clone https://github.com/nu-bi/nubi.git
cd nubi
make up          # docker compose up -d --build

# 2. Open the app
#    App (SPA + API): http://localhost:8000
#    API docs:        http://localhost:8000/docs (dev only)

# 3. (Optional) seed a test user
cd backend && DATABASE_URL=postgresql://nubi:nubi@localhost:5432/nubi python seed.py
#    → test@nubi.dev / nubitest123

# 4. Smoke test
make smoke       # scripts/smoke.sh — health + auth + query assertions

The compose stack runs against a local Postgres container. To connect to Neon or another managed Postgres, set DATABASE_URL in your environment before running make up.

Dev path — backend + frontend separately

Prerequisites: Python 3.11+, Node 20+

# ── Backend ───────────────────────────────────────────────────
python3.11 -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt

# Copy and edit env — at minimum set DATABASE_URL and JWT_SECRET
cp .env.example backend/.env

# Run migrations, then start the API
python database/migrate.py
cd backend && uvicorn main:app --reload
# API:  http://localhost:8000
# Docs: http://localhost:8000/docs

# ── Frontend (new terminal, repo root) ────────────────────────
npm install
cp .env.example .env          # set VITE_BACKEND_URL=http://localhost:8000
npm run dev
# Frontend: http://localhost:5173

Seed a test user (optional, with the venv active):

cd backend && DATABASE_URL=postgresql://user:pass@host/db python seed.py
# → test@nubi.dev / nubitest123
Key environment variables (.env.example)
Variable Required Description
DATABASE_URL Yes postgresql://...?sslmode=require (Neon) or local Postgres
JWT_SECRET Yes HS256 signing secret — openssl rand -hex 32
VITE_BACKEND_URL Frontend Base URL of the FastAPI backend
GOOGLE_CLIENT_ID OAuth Google OAuth client ID
GOOGLE_CLIENT_SECRET OAuth Google OAuth client secret
GOOGLE_REDIRECT_URI OAuth Callback URL registered in Google Console
FRONTEND_URL Backend Where the backend redirects after Google OAuth
CORS_ORIGINS Backend Comma-separated allowed origins
ENV Backend development / production (disables /docs in prod)
KERNEL_LOCAL_ENABLED Backend true to allow local subprocess kernel (dev only, default true)
KERNEL_REMOTE_PROVIDER Backend e2b or modal for Firecracker/Modal sandboxed kernels (prod)
LLM_PROVIDER Optional litellm / anthropic / openai / gemini. litellm (one SDK, all providers + per-model cost tracking) reads LITELLM_MODEL; see AI docs. Unset ⇒ offline mode.
ALLOW_UNSAFE_PUBLIC_EXPORTS Optional true to enable no-auth CDN static exports (Mode 3b). Requires org public_exports gate. Default false.
EMBED_DEV_TOKEN_ENABLED Dev only true to enable the dev-only HS256 embed-token mint endpoint. Never true in production.
NUBI_COLLECT_ROW_CAP Optional Row cap for snapshot/report data collection. Default 50000. 0 = unlimited.
JOBS_SCHEDULER_ENABLED Optional true to activate the background job scheduler tick. Default false.
FLOWS_TICK_SECRET Optional Shared secret for POST /flows/tick (external cron schedulers). Leave empty to disable.
FX_EMERGENCY_RATE EE only Emergency fallback USD→ZAR rate when no live rate is available. Default 16.26.

🏗️ Architecture

flowchart TD
  subgraph client["Browser / host page"]
    direction TB
    DASH["&lt;nubi-dashboard&gt; + widget kit<br/>kpi · table · chart · filter"]
    WASM["DuckDB-WASM kernel<br/>regl WebGL render above ~20k rows"]
    DASH --- WASM
  end

  subgraph api["FastAPI backend"]
    direction TB
    AUTH["Auth · email+pw · Google OAuth · JWKS"]
    QUERY["/query"]
    PLAN["Planner — sqlglot AST → PhysicalPlan<br/>injects row-level-security predicates"]
    CACHE["Content-hashed cache<br/>X-Nubi-Cache: HIT | MISS"]
    REG["Connector registry"]
    SVC["/ai · /lineage · /jobs · REST CRUD"]
    KR["/compute/run · kernel router"]
    QUERY --> PLAN --> CACHE --> REG
  end

  subgraph data["Data sources — bring your own"]
    direction TB
    WH["Warehouses<br/>postgres · duckdb · http_json<br/>mysql · snowflake · bigquery"]
    BRIDGE["VPC bridge · WebSocket tunnel"]
    META["Metadata DB<br/>Postgres / Neon"]
  end

  subgraph compute["Compute kernel · first-party only · embed → 403"]
    direction TB
    LOCAL["LocalSubprocessRunner · dev"]
    REMOTE["E2B / Modal · Firecracker microVM"]
  end

  DASH -->|"getToken() → JWT"| AUTH
  DASH -->|"HTTPS"| QUERY
  DASH -.-> SVC
  CACHE -->|"Arrow IPC stream"| WASM
  REG -->|"Arrow IPC"| WH
  REG --> BRIDGE --> WH
  REG --> META
  KR --> LOCAL
  KR --> REMOTE

  classDef client fill:#eff6ff,stroke:#3b82f6,color:#1e3a8a;
  classDef secure fill:#fef2f2,stroke:#ef4444,color:#991b1b;
  class client client;
  class compute secure;
Loading

Tech stack

Layer Technologies
Backend FastAPI 0.131, Python 3.11+, uvicorn, pydantic-settings v2
DB asyncpg (connection pool, raw SQL); Postgres 16 / Neon (SSL required)
Auth argon2-cffi (argon2id), PyJWT HS256, cryptography RS256/ES256 JWKS
Data plane sqlglot (AST planner + RLS injection + dialect validation), pyarrow, DuckDB (in-mem + file), adbc-driver-postgresql; mysql/mariadb/jdbc connectors (optional drivers); VPC bridge tunnel
Cache In-process LRU + TTL (ContentAddressedCache); interface is Redis-swappable
Compute subprocess (dev); e2b-code-interpreter / modal (prod, lazy optional deps)
AI / LLM NullProvider (default, zero network); LiteLLM in-process (recommended — one SDK, all providers + per-call cost tracking); lazy Anthropic / OpenAI / Gemini native SDKs via env
Frontend React 19, Vite 7, TailwindCSS, react-router-dom
Viz regl (WebGL scatter, ~1M pts), apache-arrow, @duckdb/duckdb-wasm, ECharts
Embed Custom elements (<nubi-dashboard>, <nubi-kpi>, <nubi-table>, <nubi-chart>), DOMPurify
SDK @nubi/sdk — framework-agnostic ESM, wraps auth + query + resource CRUD + embed
CLI Python typer (nubi login / deploy / run / diff / pull)
MCP Python mcp SDK, stdio transport, 15 tools
Self-host Docker Compose (docker-compose.yml); Makefile: make up/down/migrate/smoke

Monorepo layout

nubi/
├── backend/          FastAPI app, connectors, planner, compute, auth, AI, jobs
│   ├── app/
│   │   ├── auth/     argon2id, JWT HS256, Google PKCE, JWKS, sessions
│   │   ├── connectors/ sqlglot planner, Arrow executor, cache, pre-agg
│   │   ├── compute/  KernelRunner ABC, LocalSubprocessRunner, E2BRunner, ModalRunner
│   │   ├── ai/       LLMProvider, grounding, dashboard generation
│   │   ├── lineage/  sqlglot AST extractor, LineageGraph
│   │   ├── jobs/     cron + interval scheduler, executor, store
│   │   ├── repos/    asyncpg (prod) + in-memory (test) repository layer
│   │   └── routes/   auth, query, compute, embed, ai, lineage, jobs, resources
│   └── tests/        ~180 test modules + conformance suite (golden Arrow + cache keys) + security/ suite
├── database/         Forward-only SQL migration runner + 13 OSS migrations + 4 EE migrations
├── src/              React 19 frontend (Vite + Tailwind) — pages, components, viz
├── embed/            Web components: <nubi-dashboard>, <nubi-kpi>, <nubi-table>, <nubi-chart>
├── sdk/              @nubi/sdk — createNubiClient ESM package
├── cli/              nubi CLI (typer): login / deploy / run / diff / pull
├── mcp/              MCP stdio server — 15 tools for agent authoring
├── docs/             cache-key-spec.md, conformance.md, kernel-security.md, assets/
├── Dockerfile          combined image: Vite SPA build + FastAPI (single origin)
├── docker-compose.yml   db (postgres:16) + app (SPA + API on :8000)
├── Makefile          up / down / migrate / logs / smoke
├── scripts/smoke.sh  End-to-end health + auth + query assertions
└── .env.example      All env vars with comments

📊 Project status

Milestone Status What shipped
M0 — Foundation ✅ Done React + FastAPI rebuild on Neon Postgres, email/pw + Google OAuth, migrations
M1 — Connectors + conformance ✅ Done sqlglot planner, PhysicalPlan, Postgres/DuckDB connectors, frozen cache-key spec
M2 — Streaming + cache + pushdown ✅ Done Arrow IPC stream, content-hashed LRU cache, projection/predicate/LIMIT pushdown, pre-agg seed
M3 — Embed auth + <nubi-dashboard> ✅ Done HS256 + JWKS verifier, issuer registry, server-side RLS, origin pinning, web component
M4 — Local kernel + placement router ✅ Done KernelRunner ABC, LocalSubprocessRunner, ComputePlacementRouter, POST /compute/run
M4-REMOTE — E2B/Modal sandbox ✅ Done E2BRunner (Firecracker microVM), ModalRunner adapter
M5 — WebGL viz ✅ Done regl GPU scatter on Arrow buffers, <nubi-chart> auto-WebGL above 20k rows
M6 — REST API + SDK + CLI ✅ Done asyncpg repo layer, CRUD for datastores/boards/widgets/queries, @nubi/sdk, typer CLI
M7 — Lineage + AI + MCP ✅ Done sqlglot lineage extractor, deterministic grounding, LLMProvider, MCP server (15 tools)
M8 — LLM-authorable dashboards ✅ Done <nubi-kpi>, <nubi-table>, <nubi-chart> widget kit, DOMPurify renderer, POST /ai/dashboard
M9 — Connector SDK + HTTP/JSON ✅ Done FunctionConnector, apply_rls_postfetch, HttpJsonConnector, NoSQL deliberately out of scope
Connector breadth ✅ Done Registry ships 20+ types: postgres, duckdb (in-mem + file-backed), duckdb_storage (S3/R2/MinIO httpfs), http_json, mysql, mariadb, jdbc, snowflake, bigquery, clickhouse, databricks, athena, trino/presto, sqlserver/azuresql/azuresynapse, oracle, redshift, cockroachdb, cloudsql, sftp, ftp (most via optional lazy-imported drivers)
VPC bridge ✅ Done network_mode='bridge' opens a WebSocket TCP tunnel via BridgeBroker, wired into the query path (resolve_network_async); other modes 501
Builder layer (M13–M22) ✅ Done Query workspace + typed params, filter/variable/route-param interactivity, TanStack table + conditional formatting, 9 chart types, exports, scheduled reports, AI-SQL, agentic chat, git sync
Unified editor surfaces ✅ Done EditorShell: Dashboard / Report / Presentation surface switch; DocCanvas (A4/Letter paginated doc) + SlideCanvas (16:9 slides + present mode); spec.surfaces.{grid,report,slides} schema split live
M10 — Docker self-host smoke test 🔄 In progress docker-compose.yml ships locally (db + combined app on :8000); live-infra CI smoke test is the remaining capstone
M11 — Scheduled jobs ✅ Done cron + interval scheduler (deterministic now), execute_job, CRUD + run-now + run-history routes
M12 — Capability-gated RLS ✅ Done connector resolution via datastore.config.type, 501 gate when predicate_rls=False + active policies

Tests: ~180 backend test modules (incl. security/ suite) + conformance suite (golden Arrow output + byte-identical cache keys), MCP tests, CLI tests, dashboard sanitizer (node --test), SDK tests.

Experimental / not production-hardened: LocalSubprocessRunner (dev-grade isolation — same OS user, host network); Docker Compose stack not yet smoke-tested against live external infra (Neon SSL, E2B, real Google OAuth).


🔌 Embedding quickstart

<!-- 1. Load the widget bundle -->
<script type="module" src="https://cdn.example.com/nubi-dashboard.js"></script>

<!-- 2. Mount the component — calls getToken() before each query -->
<nubi-dashboard
  get-token="getToken"
  query="demo_sales_by_region"
  backend="https://api.example.com"
></nubi-dashboard>

CSS custom properties control theming: --nubi-bg, --nubi-fg, --nubi-accent, --nubi-border.

Full embed integration steps

1. Register your issuer in app/auth/issuers.py:

{
  "iss": "https://your-app.example.com",
  "jwks_uri": "https://your-app.example.com/.well-known/jwks.json",
  "aud": "nubi:your-project-id",
  "allowed_origins": ["https://your-app.example.com"],
}

2. Mint short-lived JWTs (≤15 min, RS256 or ES256) from your backend:

// Reference: embed/getToken.reference.js
async function getToken() {
  const { token } = await fetch('/your-api/nubi-token').then(r => r.json())
  return token  // signed JWT from your backend
}
window.getToken = getToken

Required JWT claims: iss, sub, aud, org, project, roles[], scope[] (must include "read:*" or narrower), policies (RLS column-value pairs), embed_origin, exp (≤ now + 900), iat.

3. The component handles the rest — JWKS verification, RLS enforcement, Arrow IPC fetch, WebGL rendering.


🧪 Running tests

# Backend — in-memory repo + DuckDB fixtures; no live DB required
cd backend && pytest

# MCP server tests
cd mcp && pytest tests/

# Dashboard sanitizer (Node built-in runner)
npm run test:dash

# JS SDK tests
cd sdk && node --test src/index.test.mjs

# CLI tests
cd cli && pytest tests/

The backend conformance suite (backend/tests/conformance/) asserts the planner produces golden Arrow output and byte-identical cache keys. A future Rust executor must pass the same suite to be swappable.


📦 SDKs & tooling

Package Path Description
@nubi/sdk sdk/ Framework-agnostic ESM — .auth, .query(), .resources.*, .embed.mount()
nubi CLI cli/ login / deploy / run / diff / pull — with --dry-run
MCP server mcp/ stdio MCP — 15 tools for agent dashboard authoring
Embed bundle embed/ <nubi-dashboard> + widget kit custom elements

📖 Documentation

Full documentation lives in docs/start at the documentation index. Highlights:

Using Nubi

Platform & security

Build & contribute


🤝 Contributing

PRs are welcome. Start with the contributor guides — they cover the dev stack, seeding, every test suite, and the docs/screenshot pipeline:

The fastest path:

  1. Fork, create a feature branch.
  2. Run the test suite (cd backend && python -m pytest tests/).
  3. If your change touches the UI or anything described in docs/, update the docs and run npm run screenshots in the same PR.
  4. Open a PR — describe the problem and solution; reference any relevant milestone or doc.

Please keep commits small and focused. The conformance suite must stay green; any new connector or planner change needs a corresponding test vector.


License

Apache License 2.0 — see the LICENSE file.

About

Browser-native, embeddable BI — dashboards run on DuckDB-WASM over an Apache Arrow data plane for ~zero cost per view, with one-tag embedding, JWT auth, and server-enforced row-level security. LLM/MCP-authorable. Open-core.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors