Data Observability Starter Kit for Small Teams
Quickstart • Features • Architecture • Adding Checks • Alert Setup • Contributing
A self-hosted, Docker-Compose-ready observability layer that gives small data teams the five core observability pillars (Freshness, Volume, Quality, Schema Drift, and Pipeline Health) without needing a paid platform like Monte Carlo or Metaplane.
- 1–5 person data teams at seed/Series-A startups
- Teams using Airflow or Prefect for orchestration
- Teams using dbt for transformations
- Warehouses: PostgreSQL, BigQuery, or Snowflake
- Pain: pipelines breaking silently, dashboards going stale, no single alert channel
- Zero vendor lock-in – everything runs on open-source infra you control
- Plug in, don't replace – works alongside existing Airflow/dbt setups; no DAG refactoring required
- Opinionated but minimal – ships with sensible defaults; quickstart in under 10 minutes
- Progressive complexity – each observability layer is independent; adopt what you need
Detects stale tables by tracking `max(updated_at)` per table and comparing its age against your SLA thresholds.
Tracks row counts per table per DAG run with Z-score anomaly detection against a 7-day rolling average.
Ships with pre-built Soda Core and Great Expectations templates for null checks, duplicates, value ranges, and referential integrity.
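For illustration, a SodaCL-style check file covering nulls, duplicates, and value ranges might look like this; table and column names are placeholders, so consult the shipped templates for the exact conventions:

```yaml
# Hypothetical SodaCL check file -- table/column names are placeholders
checks for orders:
  - missing_count(order_id) = 0       # null check on the primary key
  - duplicate_count(order_id) = 0     # uniqueness
  - invalid_count(status) = 0:
      valid values: [pending, shipped, delivered, cancelled]
```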
Snapshots `information_schema` and diffs each snapshot against the previous one. Detects added/removed columns and type changes.
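Conceptually the diff is a set comparison over column-to-type mappings. A minimal sketch, assuming each snapshot has been reduced to a `{column: type}` dict from `information_schema.columns` (the function name is illustrative):

```python
def diff_schemas(prev: dict[str, str], curr: dict[str, str]) -> dict:
    """Diff two column->type snapshots taken from information_schema.columns."""
    return {
        "added": [c for c in curr if c not in prev],
        "removed": [c for c in prev if c not in curr],
        "type_changed": [(c, prev[c], curr[c]) for c in curr if c in prev and prev[c] != curr[c]],
    }

prev = {"id": "integer", "email": "text", "age": "integer"}
curr = {"id": "bigint", "email": "text", "signup_ts": "timestamp"}
print(diff_schemas(prev, curr))
# {'added': ['signup_ts'], 'removed': ['age'], 'type_changed': [('id', 'integer', 'bigint')]}
```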
Pulls Airflow/Prefect metrics via REST API and OpenTelemetry. Pre-built Grafana dashboards for success rates, task durations, and SLA misses.
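As a sketch of the success-rate computation, assuming records shaped like the items returned by Airflow's stable REST API (`GET /api/v1/dags/{dag_id}/dagRuns`); the function name is illustrative:

```python
def dag_run_success_rate(dag_runs: list[dict]) -> float:
    """Fraction of successful runs in a list of DAG-run records."""
    if not dag_runs:
        return 0.0
    succeeded = sum(1 for run in dag_runs if run.get("state") == "success")
    return succeeded / len(dag_runs)

runs = [{"state": "success"}, {"state": "failed"}, {"state": "success"}, {"state": "success"}]
print(dag_run_success_rate(runs))  # 0.75
```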
Planned for a future release: native tracking of Snowflake compute credits and BigQuery bytes billed, to catch runaway dashboard queries and exploding ETL costs.
Parses `run_results.json` and `manifest.json` directly into ObservaKit's Postgres database, eliminating the need for third-party dbt packages like Elementary.
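The core of the parser is walking the `results` array in `run_results.json`. A hedged sketch against the documented dbt artifact shape (`unique_id` and `status` are real artifact fields; the function name is ours):

```python
import json

def failed_dbt_nodes(run_results: dict) -> list[dict]:
    """Pull failed or errored nodes out of a parsed run_results.json payload."""
    return [
        {"unique_id": r["unique_id"], "status": r["status"]}
        for r in run_results.get("results", [])
        if r["status"] not in ("success", "pass", "skipped")
    ]

artifact = json.loads("""{"results": [
    {"unique_id": "model.shop.orders", "status": "success"},
    {"unique_id": "test.shop.not_null_orders_id", "status": "fail"}
]}""")
print(failed_dbt_nodes(artifact))  # [{'unique_id': 'test.shop.not_null_orders_id', 'status': 'fail'}]
```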
| Layer | Tool |
|---|---|
| Data Quality | Soda Core + Great Expectations |
| dbt Observability | Native run_results.json parser |
| Pipeline Metrics | OpenTelemetry + Prometheus |
| Dashboards | Grafana |
| Backend API | FastAPI + SQLAlchemy |
| Metadata Store | PostgreSQL |
| Orchestration | Airflow / Prefect REST API |
| Containerisation | Docker Compose |
| Alerting | Slack webhooks, Email (SMTP) |
- Docker + Docker Compose
- Python 3.10+
- A supported SQL warehouse (PostgreSQL, BigQuery, or Snowflake)
```shell
git clone https://github.com/willowvibe/ObservaKit.git
cd ObservaKit
cp .env.example .env
# Edit .env with your warehouse credentials and Airflow URL
docker-compose up -d
```

To see ObservaKit in action without hooking up your own database, generate 7 days of simulated history and inject data anomalies (such as schema drift and volume drops):

```shell
make demo
```

Once run, the dashboards immediately populate with simulated pipeline failures and data quality alerts.
Visit http://localhost:3000 (default: admin / admin)
Dashboards are auto-provisioned under the Data Observability folder.
Visit http://localhost:8000/docs for the interactive Swagger UI.
```shell
cp checks/templates/soda/no_nulls_on_pk.yml checks/my_project/orders.yml
# Edit the YAML to point to your table
```

Checks run every hour by default. Override the schedule in `config/kit.yml`.
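As a sketch only (the key names below are hypothetical; the real schema is whatever `config/kit.yml` ships with), a schedule override could look like:

```yaml
# Hypothetical kit.yml fragment -- key names are illustrative, not ObservaKit's actual schema
checks:
  schedule_cron: "0 */6 * * *"   # run checks every 6 hours instead of hourly
```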
When migrating from a legacy on-prem warehouse to a cloud lakehouse (e.g., Postgres to Snowflake), run ObservaKit in parallel against both systems to catch schema drift and verify volume parity.
- Connect ObservaKit to both source and destination.
- Catch unsupported data type mappings early.
- Verify that every row makes it across.

This turns ObservaKit into an automated audit layer for complex data migrations.
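The volume-parity step above can be sketched as a simple count comparison; the function name is illustrative, and in practice the counts would come from `SELECT count(*)` on each side of the migration:

```python
def volume_parity(source_rows: int, dest_rows: int) -> tuple[bool, int]:
    """Exact row-count parity check; delta is destination minus source."""
    return source_rows == dest_rows, dest_rows - source_rows

ok, delta = volume_parity(1_000_000, 999_998)
print(ok, delta)  # False -2: two rows never arrived at the destination
```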
Instantly identify silent failures, stale dashboards, and missing SLA targets. Future releases will include native integration for Cost Observability (e.g., Snowflake compute credits and BigQuery bytes billed).
```mermaid
flowchart TD
    subgraph Orchestration
        A[Airflow / Prefect]
    end
    subgraph Warehouse
        B[(PostgreSQL / BigQuery / Snowflake)]
    end
    subgraph ObservaKit Backend
        C[FastAPI Service]
        D[(Metadata Store - Postgres)]
        E[Scheduler - APScheduler]
        F[Schema Diff Engine]
        G[Volume Anomaly Detector]
        H[Freshness Poller]
    end
    subgraph Quality
        I[Soda Core / Great Expectations]
        J[Native dbt parser]
    end
    subgraph Observability Stack
        K[OpenTelemetry Collector]
        L[Prometheus]
        M[Grafana Dashboards]
    end
    subgraph Alerts
        N[Slack / Email / PagerDuty]
    end
    A -- REST API / OTel --> K
    B -- SQL queries --> H
    B -- SQL queries --> G
    B -- information_schema --> F
    B -- check execution --> I
    J -- JSON artifacts --> D
    I -- results --> C
    C --> D
    E --> C
    K --> L
    L --> M
    C -- Prometheus metrics --> L
    C -- alert trigger --> N
```
```
ObservaKit/
├── docker-compose.yml
├── .env.example
├── config/
│   ├── kit.yml
│   └── warehouses/
├── checks/
│   ├── templates/
│   └── examples/
├── backend/
│   ├── main.py
│   ├── models.py
│   ├── scheduler.py
│   └── routers/
├── landing-page/      <-- Vite/React GitHub Pages site
├── dbt_integration/   <-- Native parsing logic for dbt artifacts
├── connectors/
├── alerts/
├── otel/
├── prometheus/
├── grafana/
│   ├── dashboards/
│   └── provisioning/
├── tests/
└── docs/
```
Contributions welcome! Please read the guidelines before opening a PR.
- Add a new warehouse connector
- Add a Grafana dashboard for a new use case
- Write a quality check template for a common schema
- Improve documentation or quickstart clarity
MIT License – free to use, modify, and distribute.
Built by WillowVibe DataSynapse – AI-first data enablement for modern teams.