Production-ready ML systems demonstrating complete MLOps pipelines on Google Cloud Platform using Kubernetes, featuring both computer vision and tabular ML use cases.
- High-Performance Inference: NVIDIA Triton with ONNX-optimized YOLOv8 on CPU
- Auto-Scaling Infrastructure: GKE cluster provisioned via Terraform
- Monitoring: Prometheus + Grafana dashboards
- Data Drift Detection: Evidently AI integration for vision models
- Complete ML Lifecycle: Data ingestion → Training → Deployment → Monitoring → Retraining
- Workflow Orchestration: Apache Airflow for data pipelines and retraining automation
- Experiment Tracking: MLFlow with PostgreSQL backend and GCS artifact storage
- Model Serving: FastAPI service with automatic model loading from MLFlow registry
- Drift Monitoring: Evidently AI for data and concept drift detection
- Automated Retraining: Triggered by drift thresholds with model comparison
- Event Streaming: Apache Kafka for real-time event processing and email notifications
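The drift-triggered retraining with model comparison listed above can be pictured roughly as follows. This is a minimal sketch, not the project's Airflow DAG logic: the names `ModelMetrics`, `should_retrain`, and `pick_production_model`, the 0.3 drift threshold, and the use of accuracy as the comparison metric are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    """Evaluation result for one model version (hypothetical fields)."""
    version: str
    accuracy: float


def should_retrain(drift_share: float, threshold: float = 0.3) -> bool:
    """Return True when the share of drifted features exceeds the threshold.

    The 0.3 default is an assumed value, not one taken from the project.
    """
    return drift_share > threshold


def pick_production_model(current: ModelMetrics, candidate: ModelMetrics,
                          min_gain: float = 0.0) -> ModelMetrics:
    """Promote the candidate only if it beats the current model by more than min_gain."""
    if candidate.accuracy - current.accuracy > min_gain:
        return candidate
    return current
```

With these helpers, a retraining run would start only when `should_retrain(...)` is true, and the newly trained model would replace the production one only when `pick_production_model(...)` returns it.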
┌──────────────────────────────────────────────────────────┐
│                 ML Pipeline Architecture                 │
└──────────────────────────────────────────────────────────┘

┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Airflow    │─────▶│    MLFlow    │─────▶│  ML Service  │
│ Orchestration│      │Model Registry│      │   (FastAPI)  │
└──────┬───────┘      └──────┬───────┘      └──────┬───────┘
       │                     │                     │
       │ Triggers            │ Loads Model         │ Serves
       ▼                     ▼                     ▼
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Training   │      │  Production  │      │ Predictions  │
│   Pipeline   │      │   Model v1   │      │   + Drift    │
└──────────────┘      └──────────────┘      └──────┬───────┘
                                                   │
       ┌───────────────────────────────────────────┤
       │                                           │
       ▼                                           ▼
┌──────────────┐                            ┌──────────────┐
│  Evidently   │◀───── Drift Reports ───────│    Kafka     │
│      UI      │                            │    Events    │
└──────────────┘                            └──────────────┘
       │                                           │
       └─────────────────────┬─────────────────────┘
                             ▼
                      ┌──────────────┐
                      │   Grafana    │
                      │  Monitoring  │
                      └──────────────┘
Prerequisites: GKE cluster deployed (see Setup below)
# Train model and activate ML Service
./scripts/train-iris-model.sh
# Test predictions
kubectl port-forward svc/ml-service 8082:8080 &
curl -X POST http://localhost:8082/predict \
-H "Content-Type: application/json" \
-d '{"sepal_length":5.1,"sepal_width":3.5,"petal_length":1.4,"petal_width":0.2}'

Result: Model trained → Registered in MLFlow → Deployed to Production → Serving predictions

View in MLFlow UI: kubectl port-forward svc/mlflow 5000:5000 → http://localhost:5000
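The same prediction request can be sent from Python. The sketch below mirrors the curl call using only the standard library; it assumes the port-forward on local port 8082 is active, and it makes no assumption about the shape of the JSON the service returns. The helper names `build_predict_request` and `predict` are illustrative, not part of the project.

```python
import json
import urllib.request

# Feature names and port are taken from the curl example above.
DEFAULT_URL = "http://localhost:8082/predict"


def build_predict_request(features: dict, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl call."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, url: str = DEFAULT_URL, timeout: float = 10.0):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `predict({"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2})` would then return whatever JSON document the ML service responds with.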
Advanced Options:
- Local training: ml-pipeline/README.md
- Airflow orchestration: PIPELINE_ACTIVATION_GUIDE.md
- Troubleshooting: TROUBLESHOOTING.md
Quick Access: make port-forward-all then visit the URLs below. See DASHBOARD_QUICK_START.md for details.
| Dashboard | Port | Login | Purpose |
|---|---|---|---|
| Grafana | 3000 | admin / make get-grafana-password | Operations monitoring, system metrics, alerts |
| MLFlow | 5000 | - | Experiment tracking, model registry, artifacts |
| Airflow | 8081 | admin / admin | DAG orchestration, pipeline monitoring |
| Evidently | 8001 | - | Drift detection, data quality reports |
| Prometheus | 9090 | - | Raw metrics, PromQL queries |
Access: kubectl port-forward svc/<service-name> <port>:<target-port> or use make port-forward-all
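If make is not available, the individual port-forward commands can be assembled from the table. The sketch below is one way to do that in Python; the service names, target ports, and the monitoring namespace are copied from the per-service commands later in this README. Prometheus is left out because its service name is not stated here, and the `Forward` type and `port_forward_command` helper are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Forward(NamedTuple):
    """One kubectl port-forward target."""
    service: str
    local_port: int
    target_port: int
    namespace: Optional[str] = None


# Values copied from the port-forward commands in this README.
FORWARDS = {
    "grafana": Forward("monitoring-grafana", 3000, 80, "monitoring"),
    "mlflow": Forward("mlflow", 5000, 5000),
    "airflow": Forward("airflow-webserver", 8081, 8080),
    "evidently": Forward("evidently-ui", 8001, 8000),
}


def port_forward_command(name: str) -> List[str]:
    """Return the kubectl argument list for the named dashboard."""
    fwd = FORWARDS[name]
    cmd = ["kubectl", "port-forward"]
    if fwd.namespace:
        cmd += ["--namespace", fwd.namespace]
    cmd += [f"svc/{fwd.service}", f"{fwd.local_port}:{fwd.target_port}"]
    return cmd
```

The returned list can be passed to `subprocess.Popen` to run the forward in the background.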
├── backend/                    # FastAPI app (YOLOv8 pre/post-processing)
├── ml-service/                 # FastAPI app (Iris classification serving)
├── ml-pipeline/                # Complete tabular ML pipeline
│   ├── data/                   # Data generation and GCS utilities
│   ├── training/               # Training scripts with MLFlow
│   ├── deployment/             # Model promotion and deployment
│   └── monitoring/             # Drift analysis and monitoring
├── infrastructure/
│   ├── gcp/                    # Terraform (GKE, GCS, Artifact Registry)
│   └── kubernetes/             # Helm charts
│       ├── backend/            # YOLOv8 backend
│       ├── triton/             # Triton Inference Server
│       ├── ml-service/         # Iris ML service
│       ├── mlflow/             # MLFlow with PostgreSQL
│       ├── airflow/            # Airflow with DAGs
│       ├── kafka/              # Kafka + Zookeeper
│       ├── event-consumer/     # Email notification consumer
│       ├── monitoring/         # Prometheus + Grafana
│       └── evidently-ui/       # Evidently UI
├── event-consumer/             # Kafka consumer for email notifications
├── models/                     # YOLO model and conversion scripts
├── tests/
│   ├── performance/            # Load testing with Locust
│   ├── data_drift/             # YOLOv8 drift detection
│   └── ml-service/             # ML service integration tests
└── .github/workflows/          # CI/CD automation
This project includes two complete MLOps pipelines:
- Computer Vision Pipeline (YOLOv8 with Triton) - See sections below
- Tabular ML Pipeline (Iris with MLFlow + Airflow) - See ml-pipeline/README.md
- GCP account with billing enabled
- gcloud, terraform, kubectl, helm installed
- Python 3.10+
- Docker
# Enable APIs
gcloud services enable container.googleapis.com \
artifactregistry.googleapis.com \
storage-component.googleapis.com
# Create service account with roles:
# - Kubernetes Engine Admin
# - Storage Admin
# - Artifact Registry Admin
# Download JSON key as service-account-key.json

cp env.example env
# Edit env with your GCP project ID, region, bucket names:
# - gcs_bucket_name: For YOLO models
# - gcs_ml_data_bucket_name: For tabular ML data
source env

cd infrastructure/gcp
terraform init
terraform apply
cd ../..
# Configure kubectl
gcloud container clusters get-credentials $GKE_CLUSTER_NAME \
--zone=$GCP_ZONE --project=$GCP_PROJECT_ID

cd models
pip install -r requirements.txt
# Place your model.pt in models/yolov8n/1/
# Convert to ONNX
python convert_to_onnx.py
# Upload to GCS
gsutil -m cp -r yolov8n gs://$GCS_BUCKET_NAME/
cd ..

# Create GCS secret
kubectl create secret generic gcs-sa-key \
--from-file=gcp-credentials.json=service-account-key.json
# Deploy Triton
helm upgrade --install triton ./infrastructure/kubernetes/triton \
--set modelRepository.gcsBucket=$GCS_BUCKET_NAME \
--set gcsAuthSecret=gcs-sa-key
# Build & deploy backend
export IMAGE_TAG="latest"
export IMAGE_NAME="$GCP_REGION-docker.pkg.dev/$GCP_PROJECT_ID/$ARTIFACT_REGISTRY_REPO/$BACKEND_IMAGE_NAME:$IMAGE_TAG"
gcloud auth configure-docker $GCP_REGION-docker.pkg.dev
docker build -t $IMAGE_NAME ./backend
docker push $IMAGE_NAME
helm upgrade --install backend ./infrastructure/kubernetes/backend \
--set image.repository=$GCP_REGION-docker.pkg.dev/$GCP_PROJECT_ID/$ARTIFACT_REGISTRY_REPO/$BACKEND_IMAGE_NAME \
--set image.tag=$IMAGE_TAG \
--set tritonReleaseName=triton

# Deploy MLFlow
helm upgrade --install mlflow ./infrastructure/kubernetes/mlflow \
--set gcs.bucketName=$GCS_ML_DATA_BUCKET_NAME
# Deploy Airflow
helm upgrade --install airflow ./infrastructure/kubernetes/airflow \
--set gcs.bucketName=$GCS_ML_DATA_BUCKET_NAME \
--set mlflow.trackingUri=http://mlflow:5000
# Build & deploy ML service
export ML_SERVICE_IMAGE="$GCP_REGION-docker.pkg.dev/$GCP_PROJECT_ID/$ARTIFACT_REGISTRY_REPO/ml-service:$IMAGE_TAG"
docker build -t $ML_SERVICE_IMAGE ./ml-service
docker push $ML_SERVICE_IMAGE
helm upgrade --install ml-service ./infrastructure/kubernetes/ml-service \
--set image.repository=$GCP_REGION-docker.pkg.dev/$GCP_PROJECT_ID/$ARTIFACT_REGISTRY_REPO/ml-service \
--set image.tag=$IMAGE_TAG \
--set mlflow.trackingUri=http://mlflow:5000

# Deploy Prometheus + Grafana
cd infrastructure/kubernetes/monitoring
helm dependency build
cd ../../..
helm upgrade --install monitoring ./infrastructure/kubernetes/monitoring \
--namespace monitoring --create-namespace
# Deploy Evidently UI
helm upgrade --install evidently-ui ./infrastructure/kubernetes/evidently-ui \
--set ui.demoProjects=false \
--set persistence.enabled=true

# YOLOv8 Backend API
kubectl port-forward svc/backend 8080:8080 &
curl -X POST -F "file=@test.jpg" http://localhost:8080/invocations
# ML Service (Iris)
kubectl port-forward svc/ml-service 8082:8080 &
curl http://localhost:8082/docs # OpenAPI docs
# MLFlow UI
kubectl port-forward svc/mlflow 5000:5000
# Access at http://localhost:5000
# Airflow UI (default: admin/admin)
kubectl port-forward svc/airflow-webserver 8081:8080
# Access at http://localhost:8081
# Grafana (default: admin/prom-operator)
kubectl get secret --namespace monitoring monitoring-grafana \
-o jsonpath="{.data.admin-password}" | base64 --decode
kubectl port-forward --namespace monitoring svc/monitoring-grafana 3000:80
# Evidently UI
kubectl port-forward svc/evidently-ui 8001:8000

cd tests/performance
pip install -r requirements.txt
# Set test image (optional)
export TEST_IMAGE_PATH=/path/to/test.jpg
# Run load test
locust -f test_load.py --host http://localhost:8080 \
--users 5 --spawn-rate 2 --run-time 30s --headless

cd tests/data_drift
pip install -r requirements.txt
# Set test image (optional)
export TEST_IMAGE_PATH=/path/to/test.jpg
export BACKEND_URL=http://localhost:8080
python test_yolo_drift_real.py

cd tests/ml-service
pip install -r requirements.txt
# Port-forward ML service first
kubectl port-forward svc/ml-service 8082:8080 &
# Run tests
pytest test_predictions.py -v
pytest test_drift_detection.py -v

GitHub Actions automatically builds and deploys on push to main. Configure these secrets:
- GCP_PROJECT_ID, GCP_REGION, GCP_ZONE
- GKE_CLUSTER_NAME
- GCS_BUCKET_NAME (for YOLOv8 models)
- GCS_ML_DATA_BUCKET_NAME (for tabular ML data)
- ARTIFACT_REGISTRY_REPO, BACKEND_IMAGE_NAME
- GCP_KEY_FILE (entire service account JSON)
- SMTP_USERNAME, SMTP_PASSWORD, EMAIL_TO, EMAIL_FROM (for Kafka email notifications)
See .github/SETUP_SECRETS.md for detailed setup instructions.
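A workflow step could fail fast when one of these secrets is missing. The sketch below assumes the workflow exports each secret as an environment variable of the same name, which this README does not confirm; `missing_secrets` is a hypothetical helper, and the names come from the list above.

```python
import os

# Secret names as listed in this README.
REQUIRED_SECRETS = (
    "GCP_PROJECT_ID", "GCP_REGION", "GCP_ZONE", "GKE_CLUSTER_NAME",
    "GCS_BUCKET_NAME", "GCS_ML_DATA_BUCKET_NAME",
    "ARTIFACT_REGISTRY_REPO", "BACKEND_IMAGE_NAME", "GCP_KEY_FILE",
    "SMTP_USERNAME", "SMTP_PASSWORD", "EMAIL_TO", "EMAIL_FROM",
)


def missing_secrets(env=None):
    """Return the required names that are unset or empty in env.

    env defaults to os.environ; pass a dict to check other sources.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]
```

A deployment script might call `missing_secrets()` first and exit with an error that lists the returned names.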
The CI/CD pipeline automatically deploys:
- Triton Inference Server (YOLOv8)
- Backend (YOLOv8 FastAPI service)
- MLFlow (Experiment tracking)
- Airflow (Workflow orchestration)
- ML Service (Iris classification service)
- Kafka (Event streaming)
- Event Consumer (Email notifications)
- Evidently UI (Drift visualization)
- Monitoring (Prometheus + Grafana)
Prometheus Metrics: Backend exposes /metrics with:
- Request count, duration, errors
- Inference latency
- Triton connection errors
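To give a feel for what a scrape of /metrics returns, here is a standard-library sketch that renders request counts and latency totals in the Prometheus text exposition format. It is an illustration only: the backend most likely uses a metrics library, and the metric names `backend_requests_total` and `backend_inference_seconds` as well as the `RequestMetrics` class are assumptions, not the names the service actually exports.

```python
from collections import Counter


class RequestMetrics:
    """In-memory request counters rendered as Prometheus text format."""

    def __init__(self):
        self.requests = Counter()  # request count keyed by HTTP status code
        self.latency_sum = 0.0     # total observed latency in seconds
        self.latency_count = 0     # number of latency observations

    def observe(self, status: int, seconds: float) -> None:
        """Record one finished request."""
        self.requests[status] += 1
        self.latency_sum += seconds
        self.latency_count += 1

    def render(self) -> str:
        """Return the metrics as exposition-format text."""
        lines = ["# TYPE backend_requests_total counter"]
        for status in sorted(self.requests):
            lines.append(
                f'backend_requests_total{{status="{status}"}} {self.requests[status]}'
            )
        lines += [
            "# TYPE backend_inference_seconds summary",
            f"backend_inference_seconds_sum {self.latency_sum}",
            f"backend_inference_seconds_count {self.latency_count}",
        ]
        return "\n".join(lines) + "\n"
```

Errors show up here as requests with a non-2xx status label, so an error rate can be derived from the same counter.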
Grafana Dashboards: Pre-configured dashboard in infrastructure/kubernetes/monitoring/mlops-dashboard.json
Evidently Data Drift:
- Backend auto-collects prediction features
- Reports generated every 50 predictions
- View in Evidently UI
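The collect-then-report behaviour described above can be sketched as a bounded buffer that signals when a report is due. The defaults of 50 and 1000 match the EVIDENTLY_BATCH_SIZE and EVIDENTLY_MAX_SAMPLES settings documented in this README; the `DriftBuffer` class itself is hypothetical, and the Evidently report generation is deliberately left out.

```python
from collections import deque


class DriftBuffer:
    """Keep the most recent samples and signal when a report is due."""

    def __init__(self, batch_size: int = 50, max_samples: int = 1000):
        self.batch_size = batch_size
        # A deque with maxlen drops the oldest sample once the cap is reached.
        self.samples = deque(maxlen=max_samples)
        self._since_report = 0

    def add(self, features: dict) -> bool:
        """Store one prediction's features; return True when a report should run."""
        self.samples.append(features)
        self._since_report += 1
        if self._since_report >= self.batch_size:
            self._since_report = 0
            return True
        return False
```

A serving endpoint would call `add(...)` after each prediction and build a drift report from `samples` whenever it returns True.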
Kafka powers real-time events for predictions, drift alerts, DAG status, and training notifications.
To deploy, test, and extend the Kafka stack, follow:
- Full architecture: infrastructure/kubernetes/kafka/KAFKA_ARCHITECTURE.md
- Quick start & testing: infrastructure/kubernetes/kafka/QUICK_START.md
- Email / consumer configuration: event-consumer/chart README
- TRITON_URL: Triton server endpoint (default: triton:8000)
- EVIDENTLY_WORKSPACE: Workspace path (default: /workspace)
- EVIDENTLY_PROJECT_ID: Project ID for drift reports
- EVIDENTLY_BATCH_SIZE: Report frequency (default: 50)
- EVIDENTLY_MAX_SAMPLES: Max samples in memory (default: 1000)
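One way a service might read these variables, with the documented defaults applied, is sketched below. `BackendConfig` and `load_config` are hypothetical names, not the backend's actual code; only the variable names and default values come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BackendConfig:
    """Settings read from the environment (hypothetical container type)."""
    triton_url: str
    evidently_workspace: str
    evidently_project_id: Optional[str]
    evidently_batch_size: int
    evidently_max_samples: int


def load_config(env: Optional[Mapping[str, str]] = None) -> BackendConfig:
    """Build the config from env, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return BackendConfig(
        triton_url=env.get("TRITON_URL", "triton:8000"),
        evidently_workspace=env.get("EVIDENTLY_WORKSPACE", "/workspace"),
        # No default is documented for the project ID, so it may be None.
        evidently_project_id=env.get("EVIDENTLY_PROJECT_ID"),
        evidently_batch_size=int(env.get("EVIDENTLY_BATCH_SIZE", "50")),
        evidently_max_samples=int(env.get("EVIDENTLY_MAX_SAMPLES", "1000")),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test.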
Edit models/yolov8n/config.pbtxt to customize:
- Instance count (parallelism)
- Max batch size
- Dynamic batching settings
See TROUBLESHOOTING.md for a comprehensive debugging guide.
Common Issues:
- Service not running: make check-dashboards → check pod logs
- Port conflicts: make stop-port-forward then restart
- Auth issues: gcloud container clusters get-credentials
- Model not loading: Verify MLFlow connectivity and model registry
- ml-pipeline/README.md - Complete ML pipeline guide
- DASHBOARD_QUICK_START.md - Dashboard access & usage
- PIPELINE_ACTIVATION_GUIDE.md - Airflow setup
- infrastructure/kubernetes/kafka/KAFKA_ARCHITECTURE.md - Event streaming
- CPU-optimized: Uses ONNX Runtime on CPU nodes (e2-standard-4)
- Two GCS Buckets: Separate buckets for YOLO models and ML data
- Terraform state: Should be stored remotely (S3, GCS) for production
- Test images: Set TEST_IMAGE_PATH env var or tests will use dummy images
- Backend Port: Backend service runs on port 8080 internally (use kubectl port-forward svc/backend 8080:8080)
- ML Service requires model: Train and register an Iris model in MLFlow first (see ml-pipeline/README.md)
- GPU support: Not configured by default, see models/config.pbtxt and Terraform to enable
- Airflow DAGs: Located in infrastructure/kubernetes/airflow/dags/
- Model Versioning: All models tracked in MLFlow registry with staging/production stages
MIT