Curated by Fourth Industrial Systems (4th.is), this guide highlights open‑source tools and patterns for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, and Analytics designed to run natively on Kubernetes and Docker. We work across languages — Python, R, Scala, Java, C#, Go, Julia, C++ — with practical emphasis on Kubeflow, Seldon Core, Pachyderm, Banzai Pipeline, H2O, TensorFlow, CNTK, XGBoost, MXNet, PyTorch, ONNX, Argo, Airflow, Apache Beam, Apache Spark, Intel BigDL, Rook, and Ambassador.
“The wind and the waves are always on the side of the ablest navigator.” — Edward Gibbon
Across industry, Kubernetes has become the standard for orchestrating distributed systems — whether on‑prem, in a single cloud, or spanning many. In contrast, many ML and data workflows still begin on laptops or ad‑hoc notebook servers. This repository shows how to elevate those experiments into reliable, scalable, reproducible, and portable Kubernetes deployments.
- Elastic scale for CPU/GPU resources with automated orchestration.
- Portability: the same workloads run across all major clouds and on‑prem.
- Ecosystem momentum: broad vendor and end‑user adoption, reflected in the CNCF membership.
- Self‑healing: immutable containers plus controllers enable resilient apps.
- Containerization: package apps as small, efficient images for rapid scale‑out.
- Immutability: predictable rollouts, easy rollbacks, and reproducibility.
- Persistent storage: plan PVCs/CSI drivers early; projects like Rook are popular.
- Service connectivity: ephemeral pods change the way services talk; a service mesh helps (see http://layer5.io/service-meshes/).
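The persistent-storage point above can be made concrete with a small sketch. It builds a PersistentVolumeClaim manifest as a plain Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The claim name `training-data` and the storage class `rook-ceph-block` are placeholders chosen for illustration; substitute whatever storage class your cluster actually exposes.

```python
import json


def make_pvc(name: str, size: str, storage_class: str) -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: mountable read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # Assumption: a storage class with this name exists in the cluster.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Placeholder values; pipe the output to `kubectl apply -f -`.
    pvc = make_pvc("training-data", "50Gi", "rook-ceph-block")
    print(json.dumps(pvc, indent=2))
```

Generating manifests from code keeps sizes and storage classes in one place, which helps when the same workload has to run on several clusters.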
We focus on AI/ML/Data Science OSS that runs well on horizontally scalable Kubernetes clusters. For a broader view of orchestration and operations, see Awesome Machine Learning Operations.
You may also want domain‑specific “awesome” lists:
- awesome-kubernetes
- Awesome Helm
- Awesome Operators
- Awesome Docker
- container-security-awesome
- Awesome Linux Containers
- awesome-AIOps
- Awesome Julia
- Awesome R
- Awesome Bioinformatics
- Awesome Recurrent Neural Networks
- Awesome Reinforcement Learning
- Awesome Artificial Intelligence
- Awesome Machine Learning
- Awesome StarCraft AI
- Awesome Quantum Machine Learning
- Awesome AI
- Awesome Feature Engineering
- Awesome WindowsML ONNX Models
- Awesome-ONNX-Models
- Awesome TensorFlow
- Awesome Blockchain AI
- Awesome Deep Learning
- Awesome Deep Learning Resources
- awesome-nlp
- Awesome-Pytorch-list
- awesome-ai-services
- awesome_list_ai_bot_programming
- ML & DL Tutorials
- ML on Source Code
- ML Interpretability
- Interpretable ML
- AutoML Papers
- Awesome H2O
- Awesome MXNet
- Awesome Bots
- Awesome ChatOps
- Awesome Apache Airflow
- Awesome Big Data
- Awesome-BigData
- awesome-datasets
- Awesome Analytics
- Awesome Data Science
- Awesome Pipeline
- Awesome Nextflow
- awesome-etl
- Awesome Business Intelligence
Open‑source projects are maintained by people and teams of all sizes. If you find value here, please star upstream repos, file issues/PRs, and thank maintainers. If something’s missing, open a discussion and we’ll add it.
Kubernetes translates roughly to “helmsman” and draws design lineage from Google’s Borg. The internal codename Project Seven nods to Seven of Nine from Star Trek; the logo’s seven spokes reference that origin. More background: https://en.wikipedia.org/wiki/Kubernetes.
“The duties of the ruler are like those of the helmsman of a great ship…” — Han Fei
“If you want to build a ship… teach them to yearn for the vast and endless sea.” — Antoine de Saint‑Exupéry
http://kubeflow.org/ — Cloud‑native ML platform.
- Training: TFJob controller for CPU/GPU scaling (tf-operator).
- Serving: TensorFlow Serving and integrations with Seldon Core.
- Multi‑framework: operators for PyTorch, MXNet, Chainer, with ingress via Ambassador and pipelines with Pachyderm.
- Extras: Kubeflow Labs (Azure), H2O + Kubeflow.
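As a rough illustration of what the TFJob controller consumes, the sketch below assembles a worker-only TFJob manifest in Python. It assumes the `kubeflow.org/v1` API version (older tf-operator releases used alpha and beta versions, so check your install) and follows the operator's convention that the training container is named `tensorflow`. The job name and image in the usage section are hypothetical.

```python
import json


def make_tfjob(name: str, image: str, workers: int, gpus_per_worker: int = 0) -> dict:
    """Return a TFJob manifest with `workers` worker replicas."""
    # The operator looks for a container named "tensorflow".
    container = {"name": "tensorflow", "image": image}
    if gpus_per_worker:
        # GPUs are requested through the NVIDIA device plugin resource name.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus_per_worker}}
    return {
        "apiVersion": "kubeflow.org/v1",  # assumption: v1 CRD is installed
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, for illustration only.
    job = make_tfjob("mnist-train", "example.com/mnist:latest", workers=2, gpus_per_worker=1)
    print(json.dumps(job, indent=2))
```

Scaling the job is then a matter of changing `workers` or `gpus_per_worker` and re-submitting the manifest.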
https://www.seldon.io/ — Kubernetes‑native model serving: https://github.com/SeldonIO/seldon-core.
http://pachyderm.io/ — Versioned data pipelines for production ML: https://github.com/pachyderm/pachyderm.
Multi‑framework deep learning on Kubernetes (TensorFlow, Caffe, PyTorch).
Docs: https://developer.ibm.com/patterns/deploy-and-use-a-multi-framework-deep-learning-platform-on-kubernetes/
Code: https://github.com/IBM/FfDL
Platform for building, training, and monitoring large‑scale DL apps.
https://polyaxon.com/ • https://github.com/polyaxon/polyaxon
Big Data Science on Kubernetes.
https://github.com/datalayer/datalayer • https://datalayer.io • https://docs.datalayer.io
Accelerators for building ML containers and K8s objects.
https://github.com/IntelAI/mlt
“Impossible is a word humans use far too often.” — Seven of Nine
Real‑time enterprise AI platform with K8s quickstart:
https://github.com/PipelineAI/pipeline • https://pipeline.ai
- Dask scales Python for analytics: https://github.com/dask/dask • http://dask.pydata.org/en/latest/
  - Examples: https://github.com/dask/dask-examples • Tutorial: https://github.com/dask/dask-tutorial
- Dask‑Kubernetes: https://github.com/dask/dask-kubernetes • Docs: https://dask-kubernetes.readthedocs.io/en/latest/
  - Helm: https://github.com/dask/helm-chart • Docker: https://github.com/dask/dask-docker
- Dask‑ML: https://github.com/dask/dask-ml • http://ml.dask.org/
- Dask‑XGBoost: https://github.com/dask/dask-xgboost
Kafka Helm charts (Landoop): https://github.com/Landoop/kafka-helm-charts • Connectors: https://github.com/Landoop/stream-reactor
End‑to‑end sample stack (K8s, Spark/Flink/Beam, Kafka, etc.):
https://github.com/Chabane/bigdata-playground
From commit to scale on Kubernetes (CI/CD, logging, monitoring, autoscaling):
https://github.com/banzaicloud/pipeline
Container‑native workflows; cloud‑agnostic; runs on any Kubernetes cluster:
https://argoproj.github.io/ • https://github.com/argoproj/argo
Events: https://github.com/argoproj/argo-events
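To give a feel for a container-native workflow, here is a minimal sketch that turns a dependency map into an Argo `Workflow` manifest with a DAG template. The `argoproj.io/v1alpha1` API version and the `dag.tasks[].dependencies` layout follow the public Argo examples; the step names, the `run-step` template, and the `alpine` image are illustrative assumptions, and each step only echoes its own name.

```python
import json


def make_dag_workflow(prefix: str, image: str, steps: dict) -> dict:
    """Build an Argo Workflow from `steps`: task name -> list of upstream task names."""
    unknown = {dep for deps in steps.values() for dep in deps} - set(steps)
    if unknown:
        raise ValueError(f"dependencies on undefined steps: {sorted(unknown)}")

    tasks = []
    for name, deps in steps.items():
        task = {
            "name": name,
            "template": "run-step",
            "arguments": {"parameters": [{"name": "step", "value": name}]},
        }
        if deps:
            task["dependencies"] = list(deps)
        tasks.append(task)

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName lets the API server append a unique suffix per run.
        "metadata": {"generateName": prefix},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {"name": "main", "dag": {"tasks": tasks}},
                {
                    # Hypothetical step template: echoes the step name.
                    "name": "run-step",
                    "inputs": {"parameters": [{"name": "step"}]},
                    "container": {
                        "image": image,
                        "command": ["echo"],
                        "args": ["{{inputs.parameters.step}}"],
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    steps = {"extract": [], "train": ["extract"], "evaluate": ["train"]}
    print(json.dumps(make_dag_workflow("ml-pipeline-", "alpine:3.19", steps), indent=2))
```

Because the manifest uses `generateName`, submit it with `kubectl create -f -` rather than `kubectl apply`.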
Author, schedule, and monitor DAGs for ETL/ML: https://airflow.apache.org/
Best practices: https://gtoonstra.github.io/etl-with-airflow/
K8s tools: https://github.com/mumoshu/kube-airflow • Operator: https://github.com/GoogleCloudPlatform/airflow-operator
- Beam Operator: https://github.com/aleksdjuricin/beam-operator
- Cron‑scheduled Beam Jobs: https://github.com/sanderploegsma/beam-scheduling-kubernetes
- Google Cloud Dataflow Templates: https://github.com/GoogleCloudPlatform/DataflowTemplates
Cloud‑native storage orchestration: https://rook.io/ • https://github.com/rook/rook
Container‑attached block storage (Go), with SLAs, tiering, and multi‑AZ replica policies:
https://www.openebs.io/ • https://github.com/openebs/openebs
Maya orchestration: https://github.com/openebs/maya • Helm: https://github.com/openebs/charts
“Only those who brave its dangers comprehend its mystery.” — Longfellow
T.S. Eliot, The Waste Land (for perspective).
Note: Native Kubernetes support arrived as an experimental scheduler backend in Spark 2.3 and became generally available in Spark 3.1. Always check your target version's capabilities before relying on a feature.
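For orientation, the sketch below assembles the kind of `spark-submit` invocation that native Kubernetes mode expects: a `k8s://` master URL, cluster deploy mode, and a container image passed through `spark.kubernetes.container.image`. The flags mirror the upstream "Running on Kubernetes" guide; the API server address, image name, and jar path in the usage section are placeholders, not values from this repository.

```python
import shlex


def spark_submit_args(
    api_server: str,
    image: str,
    app_jar: str,
    main_class: str,
    executors: int = 2,
    namespace: str = "default",
) -> list:
    """Return the argv list for submitting a Spark app to a Kubernetes cluster."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        # A local:// URI points at a path inside the container image.
        app_jar,
    ]


if __name__ == "__main__":
    # Placeholder endpoint, image, and jar path.
    args = spark_submit_args(
        "https://203.0.113.10:6443",
        "example.com/spark:latest",
        "local:///opt/spark/examples/jars/spark-examples.jar",
        "org.apache.spark.examples.SparkPi",
        executors=5,
    )
    print(shlex.join(args))
```

Building the command as a list, rather than a single string, makes it safe to hand to `subprocess.run` without shell quoting issues.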
- Spark Operator: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
- Spark on PKS (multi‑cloud): https://github.com/SnappyDataInc/spark-on-k8s
- Sparknetes: https://github.com/hypnosapos/sparknetes
- HDFS on K8s (Helm charts): https://github.com/apache-spark-on-k8s/kubernetes-HDFS
- Stable Helm chart (Spark): https://github.com/helm/charts/tree/master/stable/spark
- Helm chart (Spark Operator): https://github.com/helm/charts/tree/master/incubator/sparkoperator
- Kubernetes examples (Spark): https://github.com/kubernetes/examples/tree/master/staging/spark (may be out of date)
- BigDL: https://bigdl-project.github.io/ • https://github.com/intel-analytics/BigDL
- Analytics Zoo: https://analytics-zoo.github.io/ • https://github.com/intel-analytics/analytics-zoo
- Rad Analytics Spark Operator: https://github.com/radanalyticsio/spark-operator
- OpenShift Spark Images: https://github.com/radanalyticsio/openshift-spark
- SparkPi (Vert.x) tutorial: https://github.com/radanalyticsio/tutorial-sparkpi-java-vertx
- EFK (Elastic/Fluentd/Kibana) Helm: https://github.com/cdwv/efk-stack-helm
- Draft (Azure): https://draft.sh/ • https://github.com/Azure/draft
  - Pack repo plugin: https://github.com/draftcreate/draft-pack-repo
- Brigade (event‑driven pipelines): https://brigade.sh/ • https://github.com/Azure/brigade
  - Dashboard: https://github.com/Azure/kashti
  - Terminal UI: https://github.com/slok/brigadeterm
  - Prometheus exporter: https://github.com/slok/brigade-exporter
  - Gateways: Bitbucket, GitLab, K8s events, Event Grid, Cron, Trello
  - Build your own gateway: https://github.com/technosophos/draft-brigade
- ksonnet (historical): https://ksonnet.io/
The podder‑ai ecosystem offers related components:
- Kubeb (CLI for building/deploying to K8s): https://github.com/podder-ai/kubeb
- pipeline‑framework (Airflow‑based scheduling/monitoring): https://github.com/podder-ai/pipeline-framework
- pipeline‑generator and sample repos:
  - https://github.com/podder-ai/pipeline-generator
  - https://github.com/podder-ai/pipeline-framework-sample
  - https://github.com/podder-ai/poc-base-sample
  - https://github.com/podder-ai/poc-base
Fourth Industrial Systems builds scalable, ethical AI solutions and agentic workflows that move seamlessly from prototype to production on Kubernetes.
Contact: freeman@4th.is • Learn: learn.4th.is • News: news.4th.is
Trademarks: Kubernetes®, Apache®, NVIDIA®, and other names are the property of their respective owners; references are for identification only and imply no endorsement.