A full-stack ETL/ELT pipeline management tool with a Django backend and a React frontend.
- Authentication: JWT-based authentication with login (no signup by default)
- Data Sources: Manage API sources with various authentication methods
- Streams: Define data streams with pagination and schema inference
- Data Packages: Create and materialize data packages from streams
- Data Models: Support for both Dimensional and Data Vault modeling
- ClickHouse Integration: Create tables and load data from S3 into ClickHouse
  - Automatic table creation from model definitions
  - S3 data virtualization for efficient loading
  - Hash transformations for Data Vault business keys
  - Real-time loading progress tracking
- Backend Storage: Configure S3 and ClickHouse storage backends
- Task Queue: Celery-based asynchronous task execution for stream processing
- Scheduled Execution: Celery Beat integration for scheduled stream runs
- Run Tracking: Automatic tracking of stream execution history and status
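
The "hash transformations for Data Vault business keys" feature listed above can be illustrated with a small sketch. This is not taken from the project's code: the function name `hash_business_key`, the choice of SHA-256, the `||` delimiter, and the trim/upper-case normalization are all assumptions made for illustration, and the project may use a different algorithm or perform the hashing inside ClickHouse.

```python
import hashlib


def hash_business_key(*parts: object, delimiter: str = "||") -> str:
    """Return a deterministic hex digest for one or more business key parts.

    Hypothetical helper: algorithm, delimiter, and normalization are
    assumptions, not the project's documented behavior.
    """
    # Normalize each part so that cosmetic differences (case, padding,
    # int vs. str) do not produce different hash keys.
    normalized = [
        "" if part is None else str(part).strip().upper()
        for part in parts
    ]
    # Join with a delimiter so ("AB", "C") and ("A", "BC") stay distinct.
    joined = delimiter.join(normalized)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Example: a composite business key made of a customer code and a region.
customer_hub_key = hash_business_key(" acme-001 ", "eu")
```

Normalizing before hashing means the same business key loaded from two sources maps to the same hub row, which is the usual reason for hashing keys in a Data Vault model.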
- Backend: Django + Django REST Framework + PostgreSQL
- Frontend: React + TypeScript + Vite + Mantine UI
- Task Queue: Celery + Redis for asynchronous task execution
- Scheduler: Celery Beat for scheduled stream execution
- Data Layer: TanStack Query (React Query) for caching and state management
- Authentication: JWT tokens via djangorestframework-simplejwt
- API Documentation: Swagger UI via drf-spectacular
- Error Tracking: Sentry for both frontend and backend
For detailed frontend architecture, see frontend/ARCHITECTURE.md.
- Docker and Docker Compose
- Node.js 18+ (for local frontend development)
- Python 3.11+ (for local backend development)
- Clone the repository:
git clone https://github.com/fricker-studios/etl.git
cd etl
- Start the services:
docker-compose up -d
- Run database migrations:
docker-compose exec api python manage.py migrate
- Set up periodic tasks for scheduled streams:
docker-compose exec api python manage.py setup_periodic_tasks
- Create a superuser:
docker-compose exec api python manage.py createsuperuser
- Access the application:
  - Frontend: http://localhost:5173
  - Backend API: http://localhost:8000/api
  - Admin Panel: http://localhost:8000/admin
  - API Documentation: http://localhost:8000/api/docs
- Create and activate a virtual environment:
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
cp .env.example .env
# Edit .env with your configuration
- Run migrations:
python manage.py migrate
- Create a superuser:
python manage.py createsuperuser
- Start the development server:
python manage.py runserver

- Install dependencies:
cd frontend
npm install
- Set up environment variables:
cp .env.example .env
# Edit .env with your API URL
- Start the development server:
npm run dev

SECRET_KEY=your-secret-key-here
DEBUG=True
DB_NAME=etl
DB_USER=postgres
DB_PASSWORD=postgres
DB_HOST=localhost # or 'db' for Docker
DB_PORT=5432
ALLOWED_HOSTS=*
CORS_ALLOWED_ORIGINS=http://localhost:5173,http://localhost:3000
CELERY_BROKER_URL=redis://localhost:6379/0 # or 'redis://redis:6379/0' for Docker
CELERY_RESULT_BACKEND=redis://localhost:6379/0 # or 'redis://redis:6379/0' for Docker

VITE_API_URL=http://localhost:8000/api

- POST /api/auth/login/ - Log in and get JWT tokens
- GET /api/auth/me/ - Get current user info
- GET/POST /api/storage-backends/ - Manage storage backends
- GET/POST /api/api-sources/ - Manage API sources
- GET/POST /api/streams/ - Manage streams
- GET/POST /api/packages/ - Manage data packages
- GET/POST /api/models/ - Manage data models
- POST /api/models/{id}/create_table/ - Create ClickHouse table for a model
- POST /api/models/{id}/load_data/ - Load data from packages into model table
- GET /api/models/{id}/loading_progress/ - Get data loading progress

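As a rough sketch of how a script might call these endpoints, the snippet below builds a login request and an authenticated request using only the Python standard library. Several details are assumptions rather than documented behavior of this project: the `username`/`password` request fields, the `access` key in the login response, and the `Bearer` header prefix are the djangorestframework-simplejwt defaults, and the helper names are hypothetical. Check http://localhost:8000/api/docs for the actual schemas.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api"  # same base URL as VITE_API_URL


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST request for /api/auth/login/ (field names assumed)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/auth/login/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_authed_request(path: str, access_token: str) -> urllib.request.Request:
    """Build a GET request that carries the JWT access token."""
    return urllib.request.Request(
        f"{API_URL}/{path.strip('/')}/",
        # "Bearer" is the simplejwt default; the project may configure another prefix.
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def send(request: urllib.request.Request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage against a running stack (not executed here):
#   tokens = send(build_login_request("admin", "your-password"))
#   streams = send(build_authed_request("streams", tokens["access"]))
```

Keeping request construction separate from `send` makes it possible to inspect the URL and headers without a running server.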
etl/
├── backend/
│   ├── authentication/     # JWT authentication
│   ├── core/               # Core models, views, serializers
│   │   ├── models.py       # Django models
│   │   ├── views.py        # DRF ViewSets
│   │   ├── serializers.py  # API serializers
│   │   ├── urls.py         # API routes
│   │   ├── encryption.py   # Field encryption utilities
│   │   ├── s3_utils.py     # S3 integration
│   │   └── scheduler.py    # Background task scheduler
│   ├── config/             # Django settings
│   ├── manage.py
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── app/            # App shell and routing
│   │   ├── components/
│   │   │   └── common/     # Reusable UI components
│   │   ├── features/       # Feature-specific components
│   │   ├── hooks/          # React Query custom hooks
│   │   ├── lib/            # Configuration (QueryClient)
│   │   ├── pages/          # Page components
│   │   ├── store/          # Zustand stores (auth)
│   │   └── utils/          # Utilities and API client
│   ├── ARCHITECTURE.md     # Frontend architecture docs
│   └── package.json
└── docker-compose.yml

# Run migrations
python manage.py migrate
# Create migrations
python manage.py makemigrations
# Create superuser
python manage.py createsuperuser
# Set up periodic tasks for scheduled streams
python manage.py setup_periodic_tasks
# Execute a stream manually
python manage.py execute_stream <stream_id>
# Run tests
python manage.py test
# Start Celery worker (for local development)
celery -A config worker -l info
# Start Celery Beat scheduler (for local development)
celery -A config beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler

# Start dev server
npm run dev
# Build for production
npm run build
# Run linter
npm run lint
# Format code
npm run prettier:write

- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request

This project is licensed under the MIT License.