AI-first data engineering. Built in India. Deployed everywhere.
We build open-source infrastructure tools that give small data teams enterprise-grade
observability, tooling, and automation — without vendor lock-in.
WillowVibe DataSynapse is a data engineering & AI tooling studio focused on:
- Data Observability — monitoring pipeline health, data freshness, and schema drift so teams catch silent failures before dashboards go stale
- FinOps for Data — tracking Snowflake compute credits and BigQuery bytes billed, turning cloud cost chaos into actionable visibility
- AI-Augmented Pipelines — embedding AI at the right layer of the data stack without replacing what already works
- Open-Source First — every internal tool we build, we ship as OSS so the community benefits and we build in the open
We run on a solo-maintainer plus community-contributor model: lean by design, moving fast, building things that matter.
Self-hosted Data Observability & FinOps Starter Kit for small data teams
ObservaKit gives 1–5 person data teams the five core observability pillars (Freshness,
Volume, Quality, Schema Drift, and Pipeline Health) with a single `docker-compose up`.
No Monte Carlo. No Metaplane. No SaaS bill.
- ✅ Freshness Monitor — detects stale tables by tracking max(updated_at)
- ✅ Volume Anomaly — Z-score detection against 7-day rolling averages
- ✅ Quality Checks — Soda Core & Great Expectations templates, ready to use
- ✅ Schema Drift Detector — snapshots information_schema, diffs on every run
- ✅ Pipeline Health — Airflow/Prefect REST API + OpenTelemetry + Grafana
- ✅ FinOps Tracker — Snowflake credits & BigQuery bytes billed, natively
- ✅ Native dbt Integration — parses run_results.json directly, no extra packages
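
To make the Volume Anomaly idea concrete, here is a minimal sketch of Z-score detection against a 7-day window of daily row counts. It is an illustration only, not ObservaKit's actual code: the function names `volume_zscore` and `is_volume_anomaly`, the default threshold of 3.0, and the use of the sample standard deviation are all assumptions.

```python
import math
from statistics import mean, stdev


def volume_zscore(history, today):
    """Z-score of today's row count relative to a window of past daily counts.

    `history` is a list of daily row counts (hypothetical input shape).
    """
    if len(history) < 2:
        raise ValueError("need at least two historical counts")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation (an assumption)
    if sigma == 0:
        # A perfectly flat history: any deviation is infinitely surprising.
        return 0.0 if today == mu else math.copysign(math.inf, today - mu)
    return (today - mu) / sigma


def is_volume_anomaly(history, today, threshold=3.0, window=7):
    """Flag today's count if it sits more than `threshold` sigmas from the mean
    of the last `window` days. Both defaults are illustrative assumptions."""
    return abs(volume_zscore(history[-window:], today)) > threshold


# Example: a table that normally receives about 1,000 rows per day.
daily_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
print(is_volume_anomaly(daily_counts, 400))   # large drop in volume
print(is_volume_anomaly(daily_counts, 1008))  # within normal variation
```

A fixed threshold is the simplest choice; tables with strong weekday seasonality would likely need a per-weekday baseline instead.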
Stack: Python · FastAPI · SQLAlchemy · Alembic · Prometheus · Grafana · Docker Compose · dbt · Airflow / Prefect
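
The Schema Drift Detector in the feature list above works by snapshotting `information_schema` and diffing on every run. The diff step might look roughly like the sketch below. It assumes each snapshot has already been loaded into a plain `{column_name: data_type}` dictionary; the function name `diff_schema` and the result shape are hypothetical and are not taken from the ObservaKit codebase.

```python
def diff_schema(previous, current):
    """Compare two {column_name: data_type} snapshots of one table.

    Returns added columns, removed columns, and columns whose type changed.
    """
    added = {col: typ for col, typ in current.items() if col not in previous}
    removed = {col: typ for col, typ in previous.items() if col not in current}
    changed = {
        col: (previous[col], current[col])
        for col in previous.keys() & current.keys()
        if previous[col] != current[col]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Example snapshots from two consecutive runs (made-up table columns).
yesterday = {"id": "integer", "email": "text", "amount": "numeric"}
today = {"id": "bigint", "email": "text", "currency": "text"}

drift = diff_schema(yesterday, today)
print(drift["added"])    # columns present only in today's snapshot
print(drift["removed"])  # columns present only in yesterday's snapshot
print(drift["changed"])  # columns whose declared type differs
```

An empty result in all three keys means no drift was detected between the two runs.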
| Repo | Description | Language | Status |
|---|---|---|---|
| 🔭 ObservaKit | Self-hosted data observability & FinOps starter kit | Python | active |
| 🧰 toolscontainer | Multi-purpose Python utility scripts & automations | Python | maintained |
| 🕷️ scrapy-bot | Scrapy + Flask web scraping bot experiment | Python | archived |
| 💻 online-ide | Lightweight online Python execution environment | Python | experimental |
- Data Engineering: Python · dbt · Apache Airflow · Prefect · Apache Spark
- Warehouses: PostgreSQL · Snowflake · BigQuery · DuckDB
- Observability: Prometheus · Grafana · OpenTelemetry · Soda Core
- Backend: FastAPI · SQLAlchemy · Alembic · Pydantic
- Infra & DevOps: Docker · Docker Compose · Terraform · GitHub Actions
- AI / ML: LangChain · OpenAI APIs · Vector DBs (Qdrant / ChromaDB)
"Build what the ecosystem needs. Share what you build. Let the community make it better."
Every project we open-source follows three rules:
- Zero vendor lock-in — runs on infra you own and control
- Quickstart in under 10 minutes — if onboarding is painful, it won't get adopted
- Progressive complexity — adopt one layer at a time; no all-or-nothing commitment
We actively maintain what we ship. Issues get responses. PRs get reviewed. Roadmaps get published.
All our public repos welcome contributions. The best place to start:
- 🔭 ObservaKit → good first issues
- Add a new warehouse connector (Redshift, DuckDB, Delta Lake)
- Write a Grafana dashboard for a new observability use case
- Improve documentation or add a real-world example
Read CONTRIBUTING.md before opening a PR.
We are open to:
- Collaborations on data tooling, AI pipelines, or observability infra
- Consulting engagements — data platform audits, pipeline migrations, cost optimization
- Freelance / contract data engineering work for startups and scaleups
| Channel | Link |
|---|---|
| 🐙 GitHub | @willowvibe |
| 📋 ObservaKit Issues | Open an issue |
| 🔐 Security Reports | See SECURITY.md |
🌿 WillowVibe DataSynapse — Bengaluru, India · Building in the open since 2024 · Star ObservaKit ⭐