ETL Pipeline Studio

A full-stack ETL/ELT pipeline management tool with a Django backend and a React frontend.

Features

  • Authentication: JWT-based authentication with login (no signup by default)
  • Data Sources: Manage API sources with various authentication methods
  • Streams: Define data streams with pagination and schema inference
  • Data Packages: Create and materialize data packages from streams
  • Data Models: Support for both Dimensional and Data Vault modeling
  • ClickHouse Integration: Create tables and load data from S3 into ClickHouse
    • Automatic table creation from model definitions
    • S3 data virtualization for efficient loading
    • Hash transformations for Data Vault business keys
    • Real-time loading progress tracking
  • Backend Storage: Configure S3 and ClickHouse storage backends
  • Task Queue: Celery-based asynchronous task execution for stream processing
  • Scheduled Execution: Celery Beat integration for scheduled stream runs
  • Run Tracking: Automatic tracking of stream execution history and status
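
The "hash transformations for Data Vault business keys" feature can be illustrated with a minimal sketch. This README does not say which hash algorithm or normalization rules the project uses, so SHA-256, the upper-casing and trimming, the `||` delimiter, and the function name `hash_business_key` are all assumptions made for illustration only.

```python
import hashlib


def hash_business_key(parts, delimiter="||"):
    """Return a hex SHA-256 digest for a (possibly composite) business key.

    Hypothetical helper: the algorithm and normalization are assumptions,
    not the project's confirmed behavior.
    """
    # Normalize each part so cosmetic differences (case, padding) do not
    # produce different hub keys; None is treated as an empty string.
    normalized = [
        "" if part is None else str(part).strip().upper()
        for part in parts
    ]
    # Join with a delimiter so ("AB", "C") and ("A", "BC") hash differently.
    payload = delimiter.join(normalized)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Normalizing before hashing means that `[" cust-001", "us"]` and `["CUST-001", "US "]` map to the same key, which is usually what a hub table needs.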

Architecture

  • Backend: Django + Django REST Framework + PostgreSQL
  • Frontend: React + TypeScript + Vite + Mantine UI
  • Task Queue: Celery + Redis for asynchronous task execution
  • Scheduler: Celery Beat for scheduled stream execution
  • Data Layer: TanStack Query (React Query) for caching and state management
  • Authentication: JWT tokens via djangorestframework-simplejwt
  • API Documentation: Swagger UI via drf-spectacular
  • Error Tracking: Sentry for both frontend and backend

For detailed frontend architecture, see frontend/ARCHITECTURE.md.

Quick Start

Prerequisites

  • Docker and Docker Compose
  • Node.js 18+ (for local frontend development)
  • Python 3.11+ (for local backend development)

Running with Docker

  1. Clone the repository:
git clone https://github.com/fricker-studios/etl.git
cd etl
  2. Start the services:
docker-compose up -d
  3. Run database migrations:
docker-compose exec api python manage.py migrate
  4. Set up periodic tasks for scheduled streams:
docker-compose exec api python manage.py setup_periodic_tasks
  5. Create a superuser:
docker-compose exec api python manage.py createsuperuser
  6. Access the application:

Local Development

Backend Setup

  1. Create and activate a virtual environment:
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up environment variables:
cp .env.example .env
# Edit .env with your configuration
  4. Run migrations:
python manage.py migrate
  5. Create a superuser:
python manage.py createsuperuser
  6. Start the development server:
python manage.py runserver

Frontend Setup

  1. Install dependencies:
cd frontend
npm install
  2. Set up environment variables:
cp .env.example .env
# Edit .env with your API URL
  3. Start the development server:
npm run dev

Environment Variables

Backend (.env)

SECRET_KEY=your-secret-key-here
DEBUG=True
DB_NAME=etl
DB_USER=postgres
DB_PASSWORD=postgres
DB_HOST=localhost  # or 'db' for Docker
DB_PORT=5432
ALLOWED_HOSTS=*
CORS_ALLOWED_ORIGINS=http://localhost:5173,http://localhost:3000
CELERY_BROKER_URL=redis://localhost:6379/0  # or 'redis://redis:6379/0' for Docker
CELERY_RESULT_BACKEND=redis://localhost:6379/0  # or 'redis://redis:6379/0' for Docker
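
As a rough illustration of how the backend variables above could be turned into Django-style settings, here is a small sketch. It is not taken from the project's `config/` package: the function names `env_list` and `load_settings`, the fallback defaults, and the choice to treat `SECRET_KEY` as required are assumptions.

```python
import os


def env_list(value):
    """Split a comma-separated value into a list, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(env=None):
    """Build a settings dict from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Required: raises KeyError if SECRET_KEY is missing.
        "SECRET_KEY": env["SECRET_KEY"],
        "DEBUG": env.get("DEBUG", "False").lower() in ("1", "true", "yes"),
        "ALLOWED_HOSTS": env_list(env.get("ALLOWED_HOSTS", "")),
        "CORS_ALLOWED_ORIGINS": env_list(env.get("CORS_ALLOWED_ORIGINS", "")),
        "DATABASES": {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env.get("DB_NAME", "etl"),
                "USER": env.get("DB_USER", "postgres"),
                "PASSWORD": env.get("DB_PASSWORD", ""),
                "HOST": env.get("DB_HOST", "localhost"),  # 'db' under Docker
                "PORT": env.get("DB_PORT", "5432"),
            }
        },
        "CELERY_BROKER_URL": env.get(
            "CELERY_BROKER_URL", "redis://localhost:6379/0"
        ),
        "CELERY_RESULT_BACKEND": env.get(
            "CELERY_RESULT_BACKEND", "redis://localhost:6379/0"
        ),
    }
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation, for example `load_settings({"SECRET_KEY": "x", "DB_HOST": "db"})`.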

Frontend (.env)

VITE_API_URL=http://localhost:8000/api

API Endpoints

  • POST /api/auth/login/ - Login and get JWT tokens
  • GET /api/auth/me/ - Get current user info
  • GET/POST /api/storage-backends/ - Manage storage backends
  • GET/POST /api/api-sources/ - Manage API sources
  • GET/POST /api/streams/ - Manage streams
  • GET/POST /api/packages/ - Manage data packages
  • GET/POST /api/models/ - Manage data models
  • POST /api/models/{id}/create_table/ - Create ClickHouse table for a model
  • POST /api/models/{id}/load_data/ - Load data from packages into model table
  • GET /api/models/{id}/loading_progress/ - Get data loading progress
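
The endpoints above can be exercised from a short script. The sketch below uses only the Python standard library and makes several assumptions this README does not confirm: that the login endpoint accepts `username` and `password` in a JSON body, that it returns an `access` token (the djangorestframework-simplejwt default), and that the token is sent with a `Bearer` prefix. The helper names are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api"  # same value as VITE_API_URL above


def build_login_request(username, password, api_url=API_URL):
    """Build the POST /api/auth/login/ request (assumed JSON field names)."""
    body = json.dumps({"username": username, "password": password})
    return urllib.request.Request(
        f"{api_url}/auth/login/",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_authed_request(path, access_token, api_url=API_URL, method="GET"):
    """Build a request that carries the JWT access token."""
    return urllib.request.Request(
        f"{api_url}/{path.strip('/')}/",
        # Assumption: "Bearer" is the configured auth header type.
        headers={"Authorization": f"Bearer {access_token}"},
        method=method,
    )


def send_json(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def main():
    # Requires a running backend and an existing user (see createsuperuser).
    tokens = send_json(build_login_request("admin", "change-me"))
    # Assumption: the login response contains an "access" field.
    print(send_json(build_authed_request("streams", tokens["access"])))
```

Call `main()` against a running backend to list streams. The request builders are kept separate from `send_json` so they can be inspected without network access.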

Project Structure

etl/
├── backend/
│   ├── authentication/         # JWT authentication
│   ├── core/                  # Core models, views, serializers
│   │   ├── models.py          # Django models
│   │   ├── views.py           # DRF ViewSets
│   │   ├── serializers.py     # API serializers
│   │   ├── urls.py            # API routes
│   │   ├── encryption.py      # Field encryption utilities
│   │   ├── s3_utils.py        # S3 integration
│   │   └── scheduler.py       # Background task scheduler
│   ├── config/                # Django settings
│   ├── manage.py
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── app/              # App shell and routing
│   │   ├── components/
│   │   │   └── common/       # Reusable UI components
│   │   ├── features/         # Feature-specific components
│   │   ├── hooks/            # React Query custom hooks
│   │   ├── lib/              # Configuration (QueryClient)
│   │   ├── pages/            # Page components
│   │   ├── store/            # Zustand stores (auth)
│   │   └── utils/            # Utilities and API client
│   ├── ARCHITECTURE.md       # Frontend architecture docs
│   └── package.json
└── docker-compose.yml

Development Commands

Backend

# Run migrations
python manage.py migrate

# Create migrations
python manage.py makemigrations

# Create superuser
python manage.py createsuperuser

# Set up periodic tasks for scheduled streams
python manage.py setup_periodic_tasks

# Execute a stream manually
python manage.py execute_stream <stream_id>

# Run tests
python manage.py test

# Start Celery worker (for local development)
celery -A config worker -l info

# Start Celery Beat scheduler (for local development)
celery -A config beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler

Frontend

# Start dev server
npm run dev

# Build for production
npm run build

# Run linter
npm run lint

# Format code
npm run prettier:write

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License.
