Skip to content

Latest commit

 

History

History
215 lines (155 loc) · 8.74 KB

File metadata and controls

215 lines (155 loc) · 8.74 KB
Area Code starter repo powered by Moose — Operational Data Warehouse with Streamlit

Area Code - Operational Data Warehouse Starter Application


⚠️ ALPHA RELEASE - This is an early alpha version under active development. We are actively testing on different machines and adding production deployment capabilities. Use at your own risk and expect breaking changes.

Area Code ODW is a production-ready starter repository with all the necessary building blocks for building operational data warehouses with real-time data processing, analytics, and visualization capabilities.

🚀 Quick Start

Get up and running in minutes with our automated setup:

# 1. Ensure Docker Desktop is running

# 2. Install dependencies & start development environment
pnpm odw:dev

# 3. Seed databases with sample data
pnpm odw:dev:seed

# 4. Open the data warehouse frontend
http://localhost:8501/

This will:

  • Install all Python dependencies and create virtual environment
  • Start the data warehouse service (Moose app) on port 4200
  • Start the Streamlit frontend on port 8501
  • Launch Kafdrop UI for message queue monitoring on port 9999
  • Open the dashboard in your browser automatically

🛠️ Available Scripts

# Development
pnpm odw:dev              # Start all services
pnpm odw:dev:clean        # Clean all services

# Individual services
pnpm --filter dw-frontend dev           # Frontend only
pnpm --filter data-warehouse dev        # Data Warehouse only
pnpm --filter kafdrop dev               # Kafdrop only

🏗️ Tech Stack

Category Technologies
Frontend Streamlit, Python 3.12+
Backend Moose, Python, FastAPI
Database ClickHouse (analytical), PostgreSQL (temporal)
Message Queue Redpanda (Kafka-compatible)
Workflow Engine Temporal
Caching Redis
Data Processing Kafka Python, ClickHouse Connect
Infrastructure Docker, Docker Compose
Build Tool Python setuptools, pip

📁 Project Structure

area-code/
├── odw/                      # Operational Data Warehouse
│   ├── services/            # Backend services
│   │   ├── data-warehouse/  # Main Moose data warehouse service
│   │   │   ├── app/
│   │   │   │   ├── apis/    # REST API endpoints for data consumption
│   │   │   │   ├── blobs/   # Blob data extraction workflows
│   │   │   │   ├── events/  # Events data extraction workflows
│   │   │   │   ├── ingest/  # Data models and stream transformations
│   │   │   │   ├── logs/    # Log data extraction workflows
│   │   │   │   └── views/   # Materialized views for analytics
│   │   │   ├── docs/        # Documentation and walkthrough guides
│   │   │   ├── setup.sh     # Setup and management script
│   │   │   ├── moose.config.toml # Moose framework configuration
│   │   │   └── requirements.txt # Python dependencies
│   │   └── connectors/      # External data source connectors
│   │       └── src/
│   │           ├── blob_connector.py    # Blob storage data connector
│   │           ├── events_connector.py  # Events data connector
│   │           ├── logs_connector.py    # Logs data connector
│   │           └── connector_factory.py # Connector factory pattern
│   └── apps/
│       └── dw-frontend/     # Streamlit web application
│           ├── pages/       # Frontend page components
│           ├── utils/       # Frontend utility functions
│           └── main.py      # Streamlit application entry point

🔧 Services Overview

Frontend (dw-frontend)

  • Modern Streamlit web application with real-time data visualization
  • Interactive dashboards for analytics and monitoring
  • Real-time status monitoring and queue management
  • Responsive design with custom styling and tooltips

Data Warehouse Service

Connectors Service

  • Modular data connector system for external data sources
  • Factory pattern for easy connector integration
  • Support for blob storage, events, and logs data sources
  • Extensible architecture for custom data connectors

Infrastructure Services

  • Redpanda: Kafka-compatible message queue for real-time data streaming
  • ClickHouse: High-performance analytical database
  • Temporal: Workflow engine for reliable data processing
  • Redis: Caching and session management
  • Kafdrop: Web UI for monitoring message queues

📊 Data Architecture

The application uses a modern data architecture:

  1. ClickHouse (Analytical Database)
  2. Redpanda (Message Queue)
  3. Temporal (Workflow Engine)
  4. Redis (Caching)

🚀 Production Deployment (Coming Soon)

🚧 UNDER DEVELOPMENT - Production deployment capabilities are actively being developed and tested. The current focus is on stability and compatibility across different machine configurations.

We're working on a production deployment strategy that will be available soon.

🐛 Troubleshooting

🔬 TESTING STATUS - We are actively testing on various machine configurations. Currently tested on Mac M3 Pro (18GB RAM), M4 Pro, and M4 Max. We're expanding testing to more configurations.

Common Issues

  1. Python Version: Ensure you're using Python 3.12+ (required for Moose)
  2. Docker: Make sure Docker Desktop is running before setup
  3. Memory Issues: ClickHouse and Redpanda require significant memory (4GB+ recommended)
  4. Port Conflicts: Ensure ports 4200, 8501, 9999, 18123, 19092 are available

Tested Configurations:

  • ✅ Mac M3 Pro (18GB RAM)
  • ✅ Mac M4 Pro
  • ✅ Mac M4 Max
  • 🔄 More configurations being tested

Reset Environment

# Navigate to data warehouse directory
cd odw/services/data-warehouse

# Clean and restart
./setup.sh reset

# Check status
./setup.sh status

📚 Resources

🦌 Moose Resources

🛠️ Other Technologies

🆘 Support

📋 ALPHA FEEDBACK - We welcome feedback and bug reports during this alpha phase. Please include your machine configuration when reporting issues.

🦌 Moose Support

🏗️ Area Code Demo Support

For issues and questions about this demo:

  1. Check the troubleshooting section above
  2. Open an issue on GitHub with your machine configuration
  3. Join our development discussions for alpha testing feedback

🦌 Built with Moose

This repository demonstrates Moose's capabilities for building production-ready operational data warehouses with:

  • Real-time analytics with ClickHouse
  • Event streaming with Redpanda
  • Reliable workflows with Temporal
  • Interactive dashboards with Streamlit

🚀 Get Started with Moose | 📚 Moose Documentation