Lake

Lake is the data analytics platform for DoubleZero. It provides a web interface and API for querying network telemetry and Solana validator data stored in ClickHouse.

Components

api/

HTTP API server that powers the web UI. Provides endpoints for:

  • SQL query execution against ClickHouse (sketched below)
  • AI-powered natural language to SQL generation
  • Conversational chat interface for data analysis
  • MCP server for Claude Desktop and other MCP clients
  • Schema catalog and visualization recommendations

Serves the built web UI as static files in production.
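
As a rough illustration, a client might call the SQL execution endpoint like this. This is a sketch only: the /api/query route and the {"sql": ...} request shape are assumptions for illustration; the actual contract is defined by the handlers in api/.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Assumed request shape; check the api/ handlers for the real contract.
    body, _ := json.Marshal(map[string]string{
        "sql": "SELECT count() FROM system.tables",
    })
    // /api/query is a hypothetical route for illustration.
    resp, err := http.Post("http://localhost:8080/api/query", "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(out))
}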

web/

React/TypeScript single-page application. Features:

  • SQL editor with syntax highlighting
  • Natural language query interface
  • Chat mode for conversational data exploration
  • Query results with tables and charts
  • Session history

agent/

LLM-powered workflow for answering natural language questions. Implements a multi-step process: classify → decompose → generate SQL → execute → synthesize answer. Includes evaluation tests for validating agent accuracy.

See agent/README.md for architecture details.
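
As a shape-only sketch, the documented steps map naturally onto a Go interface. All names below are invented for illustration; the real interfaces live in agent/.

package agent

import "context"

// Placeholder types for illustration only.
type (
    Intent      string
    SubQuestion string
    Rows        [][]any
)

// Workflow mirrors the documented steps:
// classify → decompose → generate SQL → execute → synthesize.
type Workflow interface {
    Classify(ctx context.Context, question string) (Intent, error)
    Decompose(ctx context.Context, question string) ([]SubQuestion, error)
    GenerateSQL(ctx context.Context, sub SubQuestion) (string, error)
    Execute(ctx context.Context, sql string) (Rows, error)
    Synthesize(ctx context.Context, question string, results []Rows) (string, error)
}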

indexer/

Background service that continuously syncs data from external sources into ClickHouse (a simplified sync loop is sketched below):

  • Network topology from Solana (DZ programs)
  • Latency measurements from Solana (DZ programs)
  • Device usage metrics from InfluxDB
  • Solana validator data from mainnet
  • GeoIP enrichment from MaxMind

See indexer/README.md for architecture details.
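
A minimal sketch of that loop, assuming each upstream is wrapped in a common interface. The Source type here is invented; see indexer/README.md for the real design.

package indexer

import (
    "context"
    "log"
    "time"
)

// Source is a hypothetical abstraction over one upstream
// (Solana DZ programs, InfluxDB, MaxMind).
type Source interface {
    Name() string
    Sync(ctx context.Context) error // fetch upstream data, write to ClickHouse
}

// Run polls every source on a fixed interval; a failed sync is logged
// and retried on the next tick rather than stopping the service.
func Run(ctx context.Context, sources []Source, every time.Duration) {
    ticker := time.NewTicker(every)
    defer ticker.Stop()
    for {
        for _, s := range sources {
            if err := s.Sync(ctx); err != nil {
                log.Printf("sync %s: %v", s.Name(), err)
            }
        }
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
        }
    }
}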

slack/

Slack bot that provides a chat interface for data queries. Users can ask questions in Slack and receive answers powered by the agent workflow.

admin/

CLI tool for maintenance operations:

  • Database reset
  • Data backfills (latency, usage metrics)
  • Schema migrations

migrations/

ClickHouse schema migrations for dimension and fact tables. These are applied automatically by the indexer on startup.
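
A minimal sketch of "apply on startup", assuming the migrations are plain .sql files embedded into the binary and run in lexical order; the real indexer may track applied versions differently.

package migrations

import (
    "context"
    "database/sql"
    "embed"
    "fmt"
    "sort"
)

//go:embed *.sql
var files embed.FS

// Apply runs every embedded .sql file in lexical order against db.
func Apply(ctx context.Context, db *sql.DB) error {
    entries, err := files.ReadDir(".")
    if err != nil {
        return err
    }
    names := make([]string, 0, len(entries))
    for _, e := range entries {
        names = append(names, e.Name())
    }
    sort.Strings(names)
    for _, name := range names {
        stmt, err := files.ReadFile(name)
        if err != nil {
            return err
        }
        if _, err := db.ExecContext(ctx, string(stmt)); err != nil {
            return fmt.Errorf("apply %s: %w", name, err)
        }
    }
    return nil
}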

utils/

Shared Go packages used across lake services (logging, retry logic, test helpers).

Data Flow

External Sources              Lake Services              Storage
────────────────              ─────────────              ───────

Solana (DZ) ───────────────► Indexer ──────────────────► ClickHouse
InfluxDB    ───────────────►    │
MaxMind     ───────────────►    │
                                │
                                ▼
                    ┌───────────────────────┐
                    │      API Server       │◄────── Web UI
                    │  • Query execution    │◄────── Slack Bot
                    │  • Agent workflow     │
                    │  • Chat interface     │
                    └───────────────────────┘

Development

Local Setup

Run the setup script to get started:

./scripts/dev-setup.sh

This will:

  • Start Docker services (ClickHouse, PostgreSQL, Neo4j)
  • Create .env from .env.example
  • Download GeoIP databases

Then start the services in separate terminals:

# Terminal 1: Run the mainnet indexer (imports data into ClickHouse)
go run ./indexer/cmd/indexer/ --verbose --migrations-enable

# Optional: run additional environment indexers (each in its own terminal)
go run ./indexer/cmd/indexer/ --dz-env devnet --migrations-enable --create-database --listen-addr :3011
go run ./indexer/cmd/indexer/ --dz-env testnet --migrations-enable --create-database --listen-addr :3012

# Terminal 2: Run the API server
go run ./api/main.go

# Terminal 3: Run the web dev server
cd web
bun install
bun dev

The web app will be at http://localhost:5173 and the API at http://localhost:8080.

Running Agent Evals

The agent has evaluation tests that validate the natural language to SQL workflow. Run them with:

./scripts/run-evals.sh                 # Run all evals in parallel
./scripts/run-evals.sh --show-failures # Show failure logs at end
./scripts/run-evals.sh -s              # Short mode (code validation only, no API)
./scripts/run-evals.sh -r 2            # Retry failed tests up to 2 times

Output goes to eval-runs/<timestamp>/; check failures.log for any failures.

Deployment

Lake uses automated CI/CD via GitHub Actions and ArgoCD.

Automatic Staging Deploys

Pushes to staging branches automatically trigger a build and deploy:

  • Build web assets and upload to S3
  • Build Docker image and push to ghcr.io/malbeclabs/lake
  • Tag image as staging (ArgoCD picks up changes automatically)

Current staging branches are configured in .github/workflows/release.docker.lake.yml.

PR Previews

Add the preview-lake label to a PR to trigger a preview build. Assets go to a branch-prefixed location in the preview bucket.

Promoting to Production

To promote a staging image to production:

Via GitHub Actions (recommended):

  1. Go to Actions → "promote.lake" workflow
  2. Run workflow with source_tag=staging and target_tag=prod

Via CLI:

./scripts/promote-to-prod.sh           # staging → prod (prompts for confirmation)
./scripts/promote-to-prod.sh -n        # dry-run, show what would happen
./scripts/promote-to-prod.sh -y        # skip confirmation
./scripts/promote-to-prod.sh main prod # promote specific tag

ArgoCD will automatically sync the new image.

Static Asset Fallback

The API server fetches missing static assets from S3 to handle rolling deployments gracefully. When users have cached HTML referencing old JS/CSS bundles, the API fetches those assets from S3 instead of returning 404s.

Configure with:

ASSET_BUCKET_URL=https://my-bucket.s3.amazonaws.com/assets
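
In sketch form, assuming the bucket prefix is publicly readable (the function name is invented; the real handler lives in api/):

package api

import (
    "io"
    "net/http"
    "os"
    "path"
)

// assetHandler serves files from staticDir, falling back to the S3
// bucket for anything missing locally (e.g. an old JS/CSS bundle still
// referenced by cached HTML during a rolling deploy).
func assetHandler(staticDir, bucketURL string) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        local := path.Join(staticDir, path.Clean(r.URL.Path))
        if _, err := os.Stat(local); err == nil {
            http.ServeFile(w, r, local)
            return
        }
        resp, err := http.Get(bucketURL + r.URL.Path)
        if err != nil {
            http.NotFound(w, r)
            return
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            http.NotFound(w, r)
            return
        }
        w.Header().Set("Content-Type", resp.Header.Get("Content-Type"))
        io.Copy(w, resp.Body)
    })
}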

Environment

Key dependencies:

  • ClickHouse - Analytics database
  • Anthropic API - LLM for natural language features
  • InfluxDB (optional) - Device usage metrics source
  • MaxMind GeoIP - IP geolocation databases

MCP Server

The API exposes an MCP (Model Context Protocol) server at /api/mcp for use with Claude Desktop and other MCP clients.

Tools

Tool            Description
──────────────  ──────────────────────────────────────────────────
execute_sql     Run SQL queries against ClickHouse
execute_cypher  Run Cypher queries against Neo4j (topology, paths)
get_schema      Get database schema (tables, columns, types)
read_docs       Read DoubleZero documentation
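
MCP clients such as Claude Desktop handle the protocol for you. For a sense of the wire format only, a JSON-RPC tools/call request for execute_sql looks roughly like this; real MCP servers require an initialize exchange and session management first, so a raw request like this may be rejected, and the "sql" argument name is an assumption.

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // JSON-RPC 2.0 tools/call request; illustration only.
    payload := []byte(`{
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "execute_sql",
            "arguments": {"sql": "SELECT 1"}
        }
    }`)
    resp, err := http.Post("https://data.malbeclabs.com/api/mcp", "application/json", bytes.NewReader(payload))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(body))
}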

Claude Desktop

  1. Open Settings → Manage Connectors
  2. Click "Add Custom Connector"
  3. Enter URL: https://data.malbeclabs.com/api/mcp

Claude Code / Cursor

Add a .mcp.json file to your project:

{
  "mcpServers": {
    "doublezero": {
      "type": "http",
      "url": "https://data.malbeclabs.com/api/mcp"
    }
  }
}

Authentication

Lake supports user authentication with daily usage limits.

Tier          Auth Method                     Daily Limit
────────────  ──────────────────────────────  ────────────
Domain users  Google OAuth (allowed domains)  Unlimited
Wallet users  Solana wallet (SIWS)            50 questions
Anonymous     IP-based                        5 questions

Configure with GOOGLE_CLIENT_ID, VITE_GOOGLE_CLIENT_ID, and AUTH_ALLOWED_DOMAINS environment variables. See .env.example for details.
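
Enforcement amounts to a per-day counter checked against the tier's quota. A minimal sketch of that logic, with type and function names invented here (the real checks live in the API server):

package auth

// Tier follows the table above.
type Tier int

const (
    TierAnonymous Tier = iota // IP-based: 5 questions/day
    TierWallet                // Solana wallet (SIWS): 50 questions/day
    TierDomain                // Google OAuth on an allowed domain: unlimited
)

// DailyLimit returns the documented per-tier quota; a negative value
// means unlimited.
func DailyLimit(t Tier) int {
    switch t {
    case TierDomain:
        return -1
    case TierWallet:
        return 50
    default:
        return 5
    }
}

// Allow compares the number of questions already asked today against
// the tier's quota.
func Allow(t Tier, askedToday int) bool {
    limit := DailyLimit(t)
    return limit < 0 || askedToday < limit
}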
