diff --git a/README.md b/README.md
index 73087a8..5661ff7 100644
--- a/README.md
+++ b/README.md
@@ -1,699 +1,511 @@
-# Awesome MLOps [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) [![Made With Love](https://img.shields.io/badge/Made%20With-Love-orange.svg)](https://github.com/chetanraj/awesome-github-badges)
-
-![MLOps. You Desing It. Your Train It. You Run It.](awesome-mlops-intro.png)
-
-*An awesome list of references for MLOps - Machine Learning Operations :point_right: [ml-ops.org](https://ml-ops.org/)*
-
-[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/B0B416E7UI)
-
-
-[Linkedin Dr. Larysa Visengeriyeva](https://www.linkedin.com/in/larysavisenger/)
-
-
-
-
-# Table of Contents
-| | |
-| -------------------------------- | -------------------------------- |
-| [MLOps Core](#core-mlops) | [MLOps Communities](#mlops-communities) |
-| [MLOps Books](#mlops-books) | [MLOps Articles](#mlops-articles) |
-| [MLOps Workflow Management](#wfl-management)| [MLOps: Feature Stores](#feature-stores) |
-|[MLOps: Data Engineering (DataOps)](#dataops) | [MLOps: Model Deployment and Serving](#deployment) |
-| [MLOps: Testing, Monitoring and Maintenance](#testing-monintoring)| [MLOps: Infrastructure](#mlops-infra)|
-|[MLOps Papers](#mlops-papers) | [Talks About MLOps](#talks-about-mlops) |
-| [Existing ML Systems](#existing-ml-systems) | [Machine Learning](#machine-learning)|
-| [Software Engineering](#software-engineering) | [Product Management for ML/AI](#product-management-for-mlai) |
-| [The Economics of ML/AI](#the-economics-of-mlai) | [Model Governance, Ethics, Responsible AI](#ml-governance) |
-| [MLOps: People & Processes](#teams)|[Newsletters About MLOps, Machine Learning, Data Science and Co.](#newsletters)|
-
-
-
-# MLOps Core
-
-Click to expand!
-
-1. [Machine Learning Operations: You Design It, You Train It, You Run It!](https://ml-ops.org/)
-1. [MLOps SIG Specification](https://github.com/tdcox/mlops-roadmap/blob/master/MLOpsRoadmap2020.md)
-1. [ML in Production](http://mlinproduction.com/)
-1. [Awesome production machine learning: State of MLOps Tools and Frameworks](https://github.com/EthicalML/awesome-production-machine-learning)
-1. [Udemy “Deployment of ML Models”](https://www.udemy.com/course/deployment-of-machine-learning-models/)
-1. [Full Stack Deep Learning](https://course.fullstackdeeplearning.com/)
-1. [Engineering best practices for Machine Learning](https://se-ml.github.io/practices/)
-1. [:rocket: Putting ML in Production](https://madewithml.com/courses/putting-ml-in-production/)
-1. [Stanford MLSys Seminar Series](https://mlsys.stanford.edu/)
-1. [IBM ML Operationalization Starter Kit](https://github.com/ibm-cloud-architecture/refarch-ml-ops)
-1. [Productize ML. A self-study guide for Developers and Product Managers building Machine Learning products.](https://productizeml.gitbook.io/productize-ml/)
-1. [MLOps (Machine Learning Operations) Fundamentals on GCP](https://www.coursera.org/learn/mlops-fundamentals)
-1. [ML full Stack preparation](https://www.confetti.ai/)
-1. [MLOps Guide: Theory and Implementation](https://mlops-guide.github.io/)
-1. [Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning.](https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf)
-1. [MLOps maturity assessment](https://github.com/marvelousmlops/mlops_maturity_assessment)
-
-
-
-
-# MLOps Communities
-
-Click to expand!
-
-1. [MLOps.community](https://mlops.community/)
-1. [CDF Special Interest Group - MLOps](https://github.com/cdfoundation/sig-mlops)
-1. [RsqrdAI - Robust and Responsible AI](https://www.rsqrdai.org)
-1. [DataTalks.Club](https://datatalks.club/)
-1. [Synthetic Data Community](https://syntheticdata.community/)
-1. [MLOps World Community](https://www.mlopsworld.com)
-1. [Marvelous MLOps](https://www.linkedin.com/company/marvelous-mlops)
-
-
-
-# MLOps Courses
-
-1. [MLOps Zoomcamp (free)](https://github.com/DataTalksClub/mlops-zoomcamp)
-1. [Coursera's Machine Learning Engineering for Production (MLOps) Specialization](https://www.coursera.org/specializations/machine-learning-engineering-for-production-mlops)
-1. [Udacity Machine Learning DevOps Engineer](https://www.udacity.com/course/machine-learning-dev-ops-engineer-nanodegree--nd0821)
-1. [Made with ML](https://madewithml.com/#course)
-1. [Udacity LLMOps: Building Real-World Applications With Large Language Models](https://www.udacity.com/course/building-real-world-applications-with-large-language-models--cd13455)
-
-
-
-# MLOps Books
-
-
-Click to expand!
-
-1. [“Machine Learning Engineering” by Andriy Burkov, 2020](http://www.mlebook.com/wiki/doku.php?id=start)
-1. ["ML Ops: Operationalizing Data Science" by David Sweenor, Steven Hillion, Dan Rope, Dev Kannabiran, Thomas Hill, Michael O'Connell](https://learning.oreilly.com/library/view/ml-ops-operationalizing/9781492074663/)
-1. ["Building Machine Learning Powered Applications" by Emmanuel Ameisen](https://learning.oreilly.com/library/view/building-machine-learning/9781492045106/)
-1. ["Building Machine Learning Pipelines" by Hannes Hapke, Catherine Nelson, 2020, O’Reilly](https://learning.oreilly.com/library/view/building-machine-learning/9781492053187/)
-1. ["Managing Data Science" by Kirill Dubovikov](https://www.packtpub.com/eu/data/managing-data-science)
-1. ["Accelerated DevOps with AI, ML & RPA: Non-Programmer's Guide to AIOPS & MLOPS" by Stephen Fleming](https://www.amazon.com/Accelerated-DevOps-AI-RPA-Non-Programmers-ebook/dp/B07ZMJCJRS)
-1. ["Evaluating Machine Learning Models" by Alice Zheng](https://learning.oreilly.com/library/view/evaluating-machine-learning/9781492048756/)
-1. [Agile AI. 2020. By Carlo Appugliese, Paco Nathan, William S. Roberts. O'Reilly Media, Inc.](https://learning.oreilly.com/library/view/agile-ai/9781492074984/)
-1. ["Machine Learning Logistics". 2017. By T. Dunning et al. O'Reilly Media Inc.](https://mapr.com/ebook/machine-learning-logistics/)
-1. ["Machine Learning Design Patterns" by Valliappa Lakshmanan, Sara Robinson, Michael Munn. O'Reilly 2020](https://learning.oreilly.com/library/view/machine-learning-design/9781098115777/)
-1. ["Serving Machine Learning Models: A Guide to Architecture, Stream Processing Engines, and Frameworks" by Boris Lublinsky, O'Reilly Media, Inc. 2017](https://www.lightbend.com/ebooks/machine-learning-guide-architecture-stream-processing-frameworks-oreilly)
-1. ["Kubeflow for Machine Learning" by Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, Boris Lublinsky](https://learning.oreilly.com/library/view/kubeflow-for-machine/9781492050117/)
-1. ["Clean Machine Learning Code" by Moussa Taifi. Leanpub. 2020](https://leanpub.com/cleanmachinelearningcode)
-1. [E-Book "Practical MLOps. How to Get Ready for Production Models"](https://valohai.com/mlops-ebook/)
-1. ["Introducing MLOps" by Mark Treveil, et al. O'Reilly Media, Inc. 2020](https://learning.oreilly.com/library/view/introducing-mlops/9781492083283/)
-1. ["Machine Learning for Data Streams with Practical Examples in MOA", Bifet, Albert and Gavald\`a, Ricard and Holmes, Geoff and Pfahringer, Bernhard, MIT Press, 2018](https://moa.cms.waikato.ac.nz/book/)
-1. ["Machine Learning Product Manual" by Laszlo Sragner, Chris Kelly](https://machinelearningproductmanual.com/)
-1. ["Data Science Bootstrap Notes" by Eric J. Ma](https://ericmjl.github.io/data-science-bootstrap-notes/)
-1. ["Data Teams" by Jesse Anderson, 2020](https://www.datateams.io/)
-1. ["Data Science on AWS" by Chris Fregly, Antje Barth, 2021](https://learning.oreilly.com/library/view/data-science-on/9781492079385/)
-1. [“Engineering MLOps” by Emmanuel Raj, 2021](https://www.packtpub.com/product/engineering-mlops/9781800562882)
-1. [Machine Learning Engineering in Action](https://www.manning.com/books/machine-learning-engineering-in-action)
-1. [Practical MLOps](https://learning.oreilly.com/library/view/practical-mlops/9781098103002/)
-1. ["Effective Data Science Infrastructure" by Ville Tuulos, 2021](https://www.manning.com/books/effective-data-science-infrastructure)
-1. [AI and Machine Learning for On-Device Development, 2021, By Laurence Moroney. O'Reilly](https://learning.oreilly.com/library/view/ai-and-machine/9781098101732/)
-1. [Designing Machine Learning Systems ,2022 by Chip Huyen , O'Reilly ](https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/)
-1. [Reliable Machine Learning. 2022. By Cathy Chen, Niall Richard Murphy, Kranti Parisa, D. Sculley, Todd Underwood. O'Reilly](https://learning.oreilly.com/library/view/reliable-machine-learning/9781098106218/)
-1. [MLOps Lifecycle Toolkit. 2023. By Dayne Sorvisto. Apress](https://link.springer.com/book/10.1007/978-1-4842-9642-4)
-1. [Implementing MLOps in the Enterprise. 2023. By Yaron Haviv, Noah Gift. O'Reilly](https://www.oreilly.com/library/view/implementing-mlops-in/9781098136574/)
-
-
-
-
-# MLOps Articles
-
-
-Click to expand!
-
-1. [Continuous Delivery for Machine Learning (by Thoughtworks)](https://martinfowler.com/articles/cd4ml.html)
-1. [What is MLOps? NVIDIA Blog](https://blogs.nvidia.com/blog/2020/09/03/what-is-mlops/)
-1. [MLSpec: A project to standardize the intercomponent schemas for a multi-stage ML Pipeline.](https://github.com/visenger/MLSpec)
-1. [The 2021 State of Enterprise Machine Learning](https://info.algorithmia.com/tt-state-of-ml-2021) | State of Enterprise ML 2020: [PDF](https://info.algorithmia.com/hubfs/2019/Whitepapers/The-State-of-Enterprise-ML-2020/Algorithmia_2020_State_of_Enterprise_ML.pdf) and [Interactive](https://algorithmia.com/state-of-ml)
-1. [Organizing machine learning projects: project management guidelines.](https://www.jeremyjordan.me/ml-projects-guide/)
-1. [Rules for ML Project (Best practices)](http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf)
-1. [ML Pipeline Template](https://www.agilestacks.com/tutorials/ml-pipelines)
-1. [Data Science Project Structure](https://drivendata.github.io/cookiecutter-data-science/#directory-structure)
-1. [Reproducible ML](https://github.com/cmawer/reproducible-model)
-1. [ML project template facilitating both research and production phases.](https://github.com/visenger/ml-project-template)
-1. [Machine learning requires a fundamentally different deployment approach. As organizations embrace machine learning, the need for new deployment tools and strategies grows.](https://www.oreilly.com/radar/machine-learning-requires-a-fundamentally-different-deployment-approach/)
-1. [Introducting Flyte: A Cloud Native Machine Learning and Data Processing Platform](https://eng.lyft.com/introducing-flyte-cloud-native-machine-learning-and-data-processing-platform-fb2bb3046a59)
-1. [Why is DevOps for Machine Learning so Different?](https://hackernoon.com/why-is-devops-for-machine-learning-so-different-384z32f1)
-1. [Lessons learned turning machine learning models into real products and services – O’Reilly](https://www.oreilly.com/radar/lessons-learned-turning-machine-learning-models-into-real-products-and-services/)
-1. [MLOps: Model management, deployment and monitoring with Azure Machine Learning](https://docs.microsoft.com/en-gb/azure/machine-learning/concept-model-management-and-deployment)
-1. [Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store](https://towardsdatascience.com/guide-to-file-formats-for-machine-learning-columnar-training-inferencing-and-the-feature-store-2e0c3d18d4f9)
-1. [Architecting a Machine Learning Pipeline How to build scalable Machine Learning systems](https://towardsdatascience.com/architecting-a-machine-learning-pipeline-a847f094d1c7)
-1. [Why Machine Learning Models Degrade In Production](https://towardsdatascience.com/why-machine-learning-models-degrade-in-production-d0f2108e9214)
-1. [Concept Drift and Model Decay in Machine Learning](http://xplordat.com/2019/04/25/concept-drift-and-model-decay-in-machine-learning/?source=post_page---------------------------)
-1. [Machine Learning in Production: Why You Should Care About Data and Concept Drift](https://towardsdatascience.com/machine-learning-in-production-why-you-should-care-about-data-and-concept-drift-d96d0bc907fb)
-1. [Bringing ML to Production](https://www.slideshare.net/mikiobraun/bringing-ml-to-production-what-is-missing-amld-2020)
-1. [A Tour of End-to-End Machine Learning Platforms](https://databaseline.tech/a-tour-of-end-to-end-ml-platforms/)
-1. [MLOps: Continuous delivery and automation pipelines in machine learning](https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning)
-1. [AI meets operations](https://www.oreilly.com/radar/ai-meets-operations/)
-1. [What would machine learning look like if you mixed in DevOps? Wonder no more, we lift the lid on MLOps](https://www.theregister.co.uk/2020/03/07/devops_machine_learning_mlops/)
-1. [Forbes: The Emergence Of ML Ops](https://www.forbes.com/sites/cognitiveworld/2020/03/08/the-emergence-of-ml-ops/#72f04ed04698)
-1. [Cognilytica Report "ML Model Management and Operations 2020 (MLOps)"](https://www.cognilytica.com/2020/03/03/ml-model-management-and-operations-2020-mlops/)
-1. [Introducing Cloud AI Platform Pipelines](https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines)
-1. [A Guide to Production Level Deep Learning ](https://github.com/alirezadir/Production-Level-Deep-Learning/blob/master/README.md)
-1. [The 5 Components Towards Building Production-Ready Machine Learning Systems](https://medium.com/cracking-the-data-science-interview/the-5-components-towards-building-production-ready-machine-learning-system-a4d5237ec04e)
-1. [Deep Learning in Production (references about deploying deep learning-based models in production)](https://github.com/ahkarami/Deep-Learning-in-Production)
-1. [Machine Learning Experiment Tracking](https://towardsdatascience.com/machine-learning-experiment-tracking-93b796e501b0)
-1. [The Team Data Science Process (TDSP)](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview)
-1. [MLOps Solutions (Azure based)](https://github.com/visenger/MLOps)
-1. [Monitoring ML pipelines](https://intothedepthsofdataengineering.wordpress.com/2020/02/13/monitoring-ml-pipelines/)
-1. [Deployment & Explainability of Machine Learning COVID-19 Solutions at Scale with Seldon Core and Alibi](https://github.com/axsaucedo/seldon-core/tree/corona_research_exploration/examples/models/research_paper_classification)
-1. [Demystifying AI Infrastructure](https://www.intel.com/content/www/us/en/intel-capital/news/story.html?id=a0F1I00000BNTXPUA5#/type=All/page=0/term=/tags=)
-1. [Organizing machine learning projects: project management guidelines.](https://www.jeremyjordan.me/ml-projects-guide/)
-1. [The Checklist for Machine Learning Projects (from Aurélien Géron,"Hands-On Machine Learning with Scikit-Learn and TensorFlow")](https://github.com/visenger/handson-ml/blob/master/ml-project-checklist.md)
-1. [Data Project Checklist by Jeremy Howard](https://www.fast.ai/2020/01/07/data-questionnaire/)
-1. [MLOps: not as Boring as it Sounds](https://itnext.io/mlops-not-as-boring-as-it-sounds-eaebe73e3533)
-1. [10 Steps to Making Machine Learning Operational. Cloudera White Paper](https://www.cloudera.com/content/dam/www/marketing/resources/whitepapers/10-steps-to-making-ml-operational.pdf)
-1. [MLOps is Not Enough. The Need for an End-to-End Data Science Lifecycle Process.](https://techcommunity.microsoft.com/t5/azure-ai/mlops-is-not-enough/ba-p/1386789)
-1. [Data Science Lifecycle Repository Template](https://github.com/dslp/dslp-repo-template)
-1. [Template: code and pipeline definition for a machine learning project demonstrating how to automate an end to end ML/AI workflow. ](https://github.com/aronchick/MLOps-pipeline)
-1. [Nitpicking Machine Learning Technical Debt](https://matthewmcateer.me/blog/machine-learning-technical-debt/)
-1. [The Best Tools, Libraries, Frameworks and Methodologies that Machine Learning Teams Actually Use – Things We Learned from 41 ML Startups](https://neptune.ai/blog/tools-libraries-frameworks-methodologies-ml-startups-roundup)
-1. [Software Engineering for AI/ML - An Annotated Bibliography](https://github.com/ckaestne/seaibib)
-1. [Intelligent System. Machine Learning in Practice](https://intelligentsystem.io/)
-1. [CMU 17-445/645: Software Engineering for AI-Enabled Systems (SE4AI)](https://github.com/ckaestne/seai/)
-1. [Machine Learning is Requirements Engineering](https://link.medium.com/l7akzjR826)
-1. [Machine Learning Reproducibility Checklist](https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf)
-1. [Machine Learning Ops. A collection of resources on how to facilitate Machine Learning Ops with GitHub.](http://mlops-github.com/)
-1. [Task Cheatsheet for Almost Every Machine Learning Project A checklist of tasks for building End-to-End ML projects](https://towardsdatascience.com/task-cheatsheet-for-almost-every-machine-learning-project-d0946861c6d0)
-1. [Web services vs. streaming for real-time machine learning endpoints](https://towardsdatascience.com/web-services-vs-streaming-for-real-time-machine-learning-endpoints-c08054e2b18e)
-1. [How PyTorch Lightning became the first ML framework to run continuous integration on TPUs](https://medium.com/pytorch/how-pytorch-lightning-became-the-first-ml-framework-to-runs-continuous-integration-on-tpus-a47a882b2c95)
-1. [The ultimate guide to building maintainable Machine Learning pipelines using DVC](https://towardsdatascience.com/the-ultimate-guide-to-building-maintainable-machine-learning-pipelines-using-dvc-a976907b2a1b)
-1. [Continuous Machine Learning (CML) is CI/CD for Machine Learning Projects (DVC)](https://cml.dev/)
-1. [What I learned from looking at 200 machine learning tools](https://huyenchip.com/2020/06/22/mlops.html) | Update: [MLOps Tooling Landscape v2 (+84 new tools) - Dec '20](https://docs.google.com/spreadsheets/d/10pPQYmyNnYb6zshOKxBjJ704E0XUj2vJ9HCDfoZxAoA/edit#gid=1651929178)
-1. [Big Data & AI Landscape](http://mattturck.com/wp-content/uploads/2018/07/Matt_Turck_FirstMark_Big_Data_Landscape_2018_Final.png)
-1. [Deploying Machine Learning Models as Data, not Code — A better match?](https://towardsdatascience.com/deploying-machine-learning-models-as-data-not-code-omega-ml-8825a0ae530a)
-1. [“Thou shalt always scale” — 10 commandments of MLOps](https://towardsdatascience.com/mlops-thou-shalt-always-scale-10-commandments-of-mlops-152c11e711a5)
-1. [Three Risks in Building Machine Learning Systems](https://insights.sei.cmu.edu/sei_blog/2020/05/three-risks-in-building-machine-learning-systems.html)
-1. [Blog about ML in production (by maiot.io)](https://blog.maiot.io/)
-1. Back to the Machine Learning fundamentals: How to write code for Model deployment. [Part 1](https://medium.com/@ivannardini/back-to-the-machine-learning-fundamentals-how-to-write-code-for-model-deployment-part-1-3-4b05deda1cd1), [Part 2](https://medium.com/@ivannardini/back-to-the-machine-learning-fundamentals-how-to-write-code-for-model-deployment-part-2-3-9632d5a43f98), [Part 3](https://medium.com/@ivannardini/back-to-the-machine-learning-fundamentals-how-to-write-code-for-model-deployment-part-3-3-fb85102bebb2)
-1. [MLOps: Machine Learning as an Engineering Discipline](https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f)
-1. [ML Engineering on Google Cloud Platform (hands-on labs and code samples)](https://github.com/GoogleCloudPlatform/mlops-on-gcp)
-1. [Deep Reinforcement Learning in Production. The use of Reinforcement Learning to Personalize User Experience at Zynga](https://towardsdatascience.com/deep-reinforcement-learning-in-production-7e1e63471e2)
-1. [What is Data Observability?](https://towardsdatascience.com/what-is-data-observability-40b337971e3e)
-1. [A Practical Guide to Maintaining Machine Learning in Production](https://eugeneyan.com/writing/practical-guide-to-maintaining-machine-learning/)
-1. Continuous Machine Learning. [Part 1](https://mribeirodantas.xyz/blog/index.php/2020/08/10/continuous-machine-learning/), [Part 2](https://mribeirodantas.xyz/blog/index.php/2020/08/18/continuous-machine-learning-part-ii/). Part 3 is coming soon.
-1. [The Agile approach in data science explained by an ML expert](https://www.iunera.com/kraken/big-data-science-strategy/the-agile-approach-in-data-science-explained-by-an-ml-expert/)
-1. [Here is what you need to look for in a model server to build ML-powered services](https://anyscale.com/blog/heres-what-you-need-to-look-for-in-a-model-server-to-build-ml-powered-services/)
-1. [The problem with AI developer tools for enterprises (and what IKEA has to do with it)](https://towardsdatascience.com/the-problem-with-ai-developer-tools-for-enterprises-and-what-ikea-has-to-do-with-it-b26277841661)
-1. [Streaming Machine Learning with Tiered Storage](https://www.confluent.io/blog/streaming-machine-learning-with-tiered-storage/)
-1. [Best practices for performance and cost optimization for machine learning (Google Cloud)](https://cloud.google.com/solutions/machine-learning/best-practices-for-ml-performance-cost)
-1. [Lean Data and Machine Learning Operations](https://databaseline.tech/lean-dml-operations/)
-1. [A Brief Guide to Running ML Systems in Production Best Practices for Site Reliability Engineers](https://www.oreilly.com/content/a-brief-guide-to-running-ml-systems-in-production/)
-1. [AI engineering practices in the wild - SIG | Getting software right for a healthier digital world](https://www.softwareimprovementgroup.com/resources/ai-engineering-practices-in-the-wild/)
-1. [SE-ML | The 2020 State of Engineering Practices for Machine Learning](https://se-ml.github.io/report2020)
-1. [Awesome Software Engineering for Machine Learning (GitHub repository)](https://github.com/SE-ML/awesome-seml)
-1. [Sampling isn’t enough, profile your ML data instead](https://towardsdatascience.com/sampling-isnt-enough-profile-your-ml-data-instead-6a28fcfb2bd4?source=friends_link&sk=5af46143562d348b182c449265ed54fb)
-1. [Reproducibility in ML: why it matters and how to achieve it](https://determined.ai/blog/reproducibility-in-ml/)
-1. [12 Factors of reproducible Machine Learning in production](https://blog.maiot.io/12-factors-of-ml-in-production/)
-1. [MLOps: More Than Automation](https://devops.com/mlop-more-than-automation/)
-1. [Lean Data Science](https://locallyoptimistic.com/post/lean-data-science/)
-1. [Engineering Skills for Data Scientists](https://mark.douthwaite.io/tag/engineering-skills-for-data-scientists/)
-1. [DAGsHub Blog. Read about data science and machine learning workflows, MLOps, and open source data science](https://dagshub.com/blog/)
-1. [Data Science Project Flow for Startups](https://towardsdatascience.com/data-science-project-flow-for-startups-282a93d4508d)
-1. [Data Science Engineering at Shopify](https://shopify.engineering/topics/data-science-engineering)
-1. [Building state-of-the-art machine learning technology with efficient execution for the crypto economy](https://blog.coinbase.com/building-state-of-the-art-machine-learning-technology-with-efficient-execution-for-the-crypto-ad10896a48a)
-1. [Completing the Machine Learning Loop](https://jimmymwhitaker.medium.com/completing-the-machine-learning-loop-e03c784eaab4)
-1. [Deploying Machine Learning Models: A Checklist](https://twolodzko.github.io/ml-checklist)
-1. [Global MLOps and ML tools landscape (by MLReef)](https://about.mlreef.com/blog/global-mlops-and-ml-tools-landscape)
-1. [Why all Data Science teams need to get serious about MLOps](https://towardsdatascience.com/why-data-science-teams-needs-to-get-serious-about-mlops-56c98e255e20)
-1. [MLOps Values (by Bart Grasza)](https://gist.github.com/bartgras/4ab9c716167b5d9aee6a222f7301ac60)
-1. [Machine Learning Systems Design (by Chip Huyen)](https://huyenchip.com/machine-learning-systems-design/toc.html)
-1. [Designing an ML system (Stanford | CS 329 | Chip Huyen)](https://docs.google.com/presentation/d/13a5B2HeK9Id59zy3oNJDv5_ksDvzbGmNLx4zumkimZM/edit?usp=sharing)
-1. [How COVID-19 Has Infected AI Models (about the data drift or model drift concept)](https://www.dominodatalab.com/blog/how-covid-19-has-infected-ai-models/)
-1. [Microkernel Architecture for Machine Learning Library. An Example of Microkernel Architecture with Python Metaclass](https://towardsdatascience.com/microkernel-architecture-for-machine-learning-library-c04b797e0d5f)
-1. [Machine Learning in production: the Booking.com approach](https://booking.ai/https-booking-ai-machine-learning-production-3ee8fe943c70)
-1. [What I Learned From Attending TWIMLcon 2021 (by James Le)](https://jameskle.com/writes/twiml2021)
-1. [Designing ML Orchestration Systems for Startups. A case study in building a lightweight production-grade ML orchestration system](https://towardsdatascience.com/designing-ml-orchestration-systems-for-startups-202e527d7897)
-1. [Towards MLOps: Technical capabilities of a Machine Learning platform | Prosus AI Tech Blog](https://medium.com/prosus-ai-tech-blog/towards-mlops-technical-capabilities-of-a-machine-learning-platform-61f504e3e281)
-1. [Get started with MLOps A comprehensive MLOps tutorial with open source tools](https://towardsdatascience.com/get-started-with-mlops-fd7062cab018)
-1. [From DevOps to MLOPS: Integrate Machine Learning Models using Jenkins and Docker](https://towardsdatascience.com/from-devops-to-mlops-integrate-machine-learning-models-using-jenkins-and-docker-79034dbedf1)
-1. [Example code for a basic ML Platform based on Pulumi, FastAPI, DVC, MLFlow and more](https://github.com/aporia-ai/mlplatform-workshop)
-1. [Software Engineering for Machine Learning: Characterizing and Detecting Mismatch in Machine-Learning Systems](https://insights.sei.cmu.edu/blog/software-engineering-for-machine-learning-characterizing-and-detecting-mismatch-in-machine-learning-systems/)
-1. [TWIML Solutions Guide](https://twimlai.com/solutions/introducing-twiml-ml-ai-solutions-guide/)
-1. [How Well Do You Leverage Machine Learning at Scale? Six Questions to Ask](https://medium.com/cognizantai/how-well-do-you-leverage-machine-learning-at-scale-six-questions-to-ask-7e6acda15ea5)
-1. [Getting started with MLOps: Selecting the right capabilities for your use case](https://cloud.google.com/blog/products/ai-machine-learning/select-the-right-mlops-capabilities-for-your-ml-use-case)
-1. [The Latest Work from the SEI: Artificial Intelligence, DevSecOps, and Security Incident Response](https://insights.sei.cmu.edu/blog/the-latest-work-from-the-sei-artificial-intelligence-devsecops-and-security-incident-response/)
-1. [MLOps: The Ultimate Guide. A handbook on MLOps and how to think about it](https://towardsdatascience.com/mlops-the-ultimate-guide-9d902c752fd1)
-1. [Enterprise Readiness of Cloud MLOps](https://gigaom.com/report/enterprise-readiness-of-cloud-mlops/)
-1. [Should I Train a Model for Each Customer or Use One Model for All of My Customers?](https://towardsdatascience.com/should-i-train-a-model-for-each-customer-or-use-one-model-for-all-of-my-customers-f9e8734d991)
-1. [MLOps-Basics (GitHub repo)](https://github.com/graviraja/MLOps-Basics) by [raviraja](https://github.com/graviraja)
-1. [Another tool won’t fix your MLOps problems](https://dshersh.medium.com/too-many-mlops-tools-c590430ba81b)
-1. [Best MLOps Tools: What to Look for and How to Evaluate Them (by NimbleBox.ai)](https://nimblebox.ai/blog/mlops-tools)
-1. [MLOps vs. DevOps: A Detailed Comparison (by NimbleBox.ai)](https://nimblebox.ai/blog/mlops-vs-devops)
-1. [A Guide To Setting Up Your MLOps Team (by NimbleBox.ai)](https://nimblebox.ai/blog/mlops-team-structure)
-
-
-
-
-
-# MLOps: Workflow Management
-
-1. [Open-source Workflow Management Tools: A Survey by Ploomber](https://ploomber.io/posts/survey/)
-1. [How to Compare ML Experiment Tracking Tools to Fit Your Data Science Workflow (by dagshub)](https://dagshub.com/blog/how-to-compare-ml-experiment-tracking-tools-to-fit-your-data-science-workflow/)
-1. [15 Best Tools for Tracking Machine Learning Experiments](https://medium.com/neptune-ai/15-best-tools-for-tracking-machine-learning-experiments-64c6eff16808)
-
-
-# MLOps: Feature Stores
-
-
-Click to expand!
-
-1. [Feature Stores for Machine Learning Medium Blog](https://medium.com/data-for-ai)
-1. [MLOps with a Feature Store](https://www.logicalclocks.com/blog/mlops-with-a-feature-store)
-1. [Feature Stores for ML](http://featurestore.org/)
-1. [Hopsworks: Data-Intensive AI with a Feature Store](https://github.com/logicalclocks/hopsworks)
-1. [Feast: An open-source Feature Store for Machine Learning](https://github.com/feast-dev/feast)
-1. [What is a Feature Store?](https://www.tecton.ai/blog/what-is-a-feature-store/)
-1. [ML Feature Stores: A Casual Tour](https://medium.com/@farmi/ml-feature-stores-a-casual-tour-fc45a25b446a)
-1. [Comprehensive List of Feature Store Architectures for Data Scientists and Big Data Professionals](https://hackernoon.com/the-essential-architectures-for-every-data-scientist-and-big-data-engineer-f21u3e5c)
-1. [ML Engineer Guide: Feature Store vs Data Warehouse (vendor blog)](https://www.logicalclocks.com/blog/feature-store-vs-data-warehouse)
-1. [Building a Gigascale ML Feature Store with Redis, Binary Serialization, String Hashing, and Compression (DoorDash blog)](https://doordash.engineering/2020/11/19/building-a-gigascale-ml-feature-store-with-redis/)
-1. [Feature Stores: Variety of benefits for Enterprise AI.](https://insidebigdata.com/2020/12/29/how-feature-stores-will-revolutionize-enterprise-ai/)
-1. [Feature Store as a Foundation for Machine Learning](https://towardsdatascience.com/feature-store-as-a-foundation-for-machine-learning-d010fc6eb2f3)
-1. [ML Feature Serving Infrastructure at Lyft](https://eng.lyft.com/ml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a)
-1. [Feature Stores for Self-Service Machine Learning](https://www.ethanrosenthal.com/2021/02/03/feature-stores-self-service/)
-1. [The Architecture Used at LinkedIn to Improve Feature Management in Machine Learning Models.](https://jrodthoughts.medium.com/the-architecture-used-at-linkedin-to-improve-feature-management-in-machine-learning-models-c7bd6ae54db)
-1. [Is There a Feature Store Over the Rainbow? How to select the right feature store for your use case](https://towardsdatascience.com/is-there-a-feature-store-over-the-rainbow-291cab94e8a5)
-
-
-
-# MLOps: Data Engineering (DataOps)
-
-
-Click to expand! - -1. [The state of data quality in 2020 – O’Reilly](https://www.oreilly.com/radar/the-state-of-data-quality-in-2020/) -1. [Why We Need DevOps for ML Data](https://tecton.ai/blog/devops-ml-data/) -1. [Data Preparation for Machine Learning (7-Day Mini-Course)](https://machinelearningmastery.com/data-preparation-for-machine-learning-7-day-mini-course/) -1. [Best practices in data cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data.](https://www.researchgate.net/publication/266714997_Best_practices_in_data_cleaning_A_Complete_Guide_to_Everything_You_Need_to_Do_Before_and_After_Collecting_Your_Data) -1. [17 Strategies for Dealing with Data, Big Data, and Even Bigger Data](https://towardsdatascience.com/17-strategies-for-dealing-with-data-big-data-and-even-bigger-data-283426c7d260) -1. [DataOps Data Architecture](https://blog.datakitchen.io/blog/dataops-data-architecture) -1. [Data Orchestration — A Primer](https://medium.com/memory-leak/data-orchestration-a-primer-56f3ddbb1700) -1. [4 Data Trends to Watch in 2020](https://medium.com/memory-leak/4-data-trends-to-watch-in-2020-491707902c09) -1. [CSE 291D / 234: Data Systems for Machine Learning](http://cseweb.ucsd.edu/classes/fa20/cse291-d/index.html) -1. [A complete picture of the modern data engineering landscape](https://github.com/datastacktv/data-engineer-roadmap) -1. [Continuous Integration for your data with GitHub Actions and Great Expectations. One step closer to CI/CD for your data pipelines](https://greatexpectations.io/blog/github-actions/) -1. [Emerging Architectures for Modern Data Infrastructure](https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure/) -1. [Awesome Data Engineering. Learning path and resources to become a data engineer](https://awesomedataengineering.com/) -1. 
Data Quality at Airbnb [Part 1](https://medium.com/airbnb-engineering/data-quality-at-airbnb-e582465f3ef7) | [Part 2](https://medium.com/airbnb-engineering/data-quality-at-airbnb-870d03080469) -1. [DataHub: Popular metadata architectures explained](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained) -1. [Financial Times Data Platform: From zero to hero. An in-depth walkthrough of the evolution of our Data Platform](https://medium.com/ft-product-technology/financial-times-data-platform-from-zero-to-hero-143156bffb1d) -1. [Alki, or how we learned to stop worrying and love cold metadata (Dropbox)](https://dropbox.tech/infrastructure/alki--or-how-we-learned-to-stop-worrying-and-love-cold-metadata) -1. [A Beginner's Guide to Clean Data. Practical advice to spot and avoid data quality problems (by Benjamin Greve)](https://b-greve.gitbook.io/beginners-guide-to-clean-data/) -1. [ML Lake: Building Salesforce’s Data Platform for Machine Learning](https://engineering.salesforce.com/ml-lake-building-salesforces-data-platform-for-machine-learning-228c30e21f16) -1. [Data Catalog 3.0: Modern Metadata for the Modern Data Stack](https://towardsdatascience.com/data-catalog-3-0-modern-metadata-for-the-modern-data-stack-ec621f593dcf) -1. [Metadata Management Systems](https://gradientflow.com/the-growing-importance-of-metadata-management-systems/) -1. [Essential resources for data engineers (a curated recommended read and watch list for scalable data processing)](https://www.scling.com/reading-list/) -1. [Comprehensive and Comprehensible Data Catalogs: The What, Who, Where, When, Why, and How of Metadata Management (Paper)](https://arxiv.org/pdf/2103.07532.pdf) -1. [What I Learned From Attending DataOps Unleashed 2021 (by James Le)](https://jameskle.com/writes/dataops-unleashed2021) -1. [Uber's Journey Toward Better Data Culture From First Principles](https://ubr.to/3lo9GU8) -1. 
[Cerberus - lightweight and extensible data validation library for Python](https://docs.python-cerberus.org/en/stable/) -1. [Design a data mesh architecture using AWS Lake Formation and AWS Glue. AWS Big Data Blog](https://aws.amazon.com/blogs/big-data/design-a-data-mesh-architecture-using-aws-lake-formation-and-aws-glue/) -1. [Data Management Challenges in Production Machine Learning (slides)](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46178.pdf) -1. [The Missing Piece of Data Discovery and Observability Platforms: Open Standard for Metadata](https://towardsdatascience.com/the-missing-piece-of-data-discovery-and-observability-platforms-open-standard-for-metadata-37dac2d0503) -1. [Automating Data Protection at Scale](https://medium.com/airbnb-engineering/automating-data-protection-at-scale-part-1-c74909328e08) -1. [A curated list of awesome pipeline toolkits](https://github.com/pditommaso/awesome-pipeline) -1. [Data Mesh Architecture](https://www.datamesh-architecture.com/) -1. [The Essential Guide to Data Exploration in Machine Learning (by NimbleBox.ai)](https://nimblebox.ai/blog/data-exploration) -1. [Finding millions of label errors with Cleanlab](https://datacentricai.org/blog/finding-millions-of-label-errors-with-cleanlab/) -
- - - - -# MLOps: Model Deployment and Serving -
-Click to expand! - -1. [AI Infrastructure for Everyone: DeterminedAI](https://determined.ai/) -1. [Deploying R Models with MLflow and Docker](https://mdneuzerling.com/post/deploying-r-models-with-mlflow-and-docker/) -1. [What Does it Mean to Deploy a Machine Learning Model?](https://mlinproduction.com/what-does-it-mean-to-deploy-a-machine-learning-model-deployment-series-01/) -1. [Software Interfaces for Machine Learning Deployment](https://mlinproduction.com/software-interfaces-for-machine-learning-deployment-deployment-series-02/) -1. [Batch Inference for Machine Learning Deployment](https://mlinproduction.com/batch-inference-for-machine-learning-deployment-deployment-series-03/) -1. [AWS Cost Optimization for ML Infrastructure - EC2 spend](https://blog.floydhub.com/aws-cost-optimization-for-ml-infra-ec2/) -1. [CI/CD for Machine Learning & AI](https://blog.paperspace.com/ci-cd-for-machine-learning-ai/) -1. [Itaú Unibanco: How we built a CI/CD Pipeline for machine learning with ***online training*** in Kubeflow](https://cloud.google.com/blog/products/ai-machine-learning/itau-unibanco-how-we-built-a-cicd-pipeline-for-machine-learning-with-online-training-in-kubeflow) -1. [101 For Serving ML Models](https://pakodas.substack.com/p/101-for-serving-ml-models-10217c9f0764) -1. [Deploying Machine Learning models to production — **Inference service architecture patterns**](https://medium.com/data-for-ai/deploying-machine-learning-models-to-production-inference-service-architecture-patterns-bc8051f70080) -1. [Serverless ML: Deploying Lightweight Models at Scale](https://mark.douthwaite.io/serverless-machine-learning/) -1. ML Model Rollout To Production. [Part 1](https://www.superwise.ai/resources-old/safely-rolling-out-ml-models-to-production) | [Part 2](https://www.superwise.ai/blog/part-ii-safely-rolling-out-models-to-production) -1. 
[Deploying Python ML Models with Flask, Docker and Kubernetes](https://alexioannides.com/2019/01/10/deploying-python-ml-models-with-flask-docker-and-kubernetes/) -1. [Deploying Python ML Models with Bodywork](https://alexioannides.com/2020/12/01/deploying-ml-models-with-bodywork/) -1. [Framework for a successful Continuous Training Strategy. When should the model be retrained? What data should be used? What should be retrained? A data-driven approach](https://towardsdatascience.com/framework-for-a-successful-continuous-training-strategy-8c83d17bb9dc) -1. [Efficient Machine Learning Inference. The benefits of multi-model serving where latency matters](https://www.oreilly.com/content/efficient-machine-learning-inference/) -1. [Deploying Hugging Face ML Models in the Cloud with Infrastructure as Code](https://www.pulumi.com/blog/mlops-the-ai-challenge-is-cloud-not-code/) -
- - - -# MLOps: Testing, Monitoring and Maintenance -
-Click to expand! - -1. [Building dashboards for operational visibility (AWS)](https://aws.amazon.com/builders-library/building-dashboards-for-operational-visibility/) -1. [Monitoring Machine Learning Models in Production](https://christophergs.com/machine%20learning/2020/03/14/how-to-monitor-machine-learning-models/) -1. [Effective testing for machine learning systems](https://www.jeremyjordan.me/testing-ml/) -1. [Unit Testing Data: What is it and how do you do it?](https://winderresearch.com/unit-testing-data-what-is-it-and-how-do-you-do-it/) -1. [How to Test Machine Learning Code and Systems](https://eugeneyan.com/writing/testing-ml/) ([Accompanying code](https://github.com/eugeneyan/testing-ml)) -1. [Wu, T., Dong, Y., Dong, Z., Singa, A., Chen, X. and Zhang, Y., 2020. Testing Artificial Intelligence System Towards Safety and Robustness: State of the Art. IAENG International Journal of Computer Science, 47(3).](http://www.iaeng.org/IJCS/issues_v47/issue_3/IJCS_47_3_13.pdf) -1. [Multi-Armed Bandits and the Stitch Fix Experimentation Platform](https://multithreaded.stitchfix.com/blog/2020/08/05/bandits/) -1. [A/B Testing Machine Learning Models](https://mlinproduction.com/ab-test-ml-models-deployment-series-08/) -1. [Data validation for machine learning. Polyzotis, N., Zinkevich, M., Roy, S., Breck, E. and Whang, S., 2019. Proceedings of Machine Learning and Systems](https://mlsys.org/Conferences/2019/doc/2019/167.pdf) -1. [Testing machine learning based systems: a systematic mapping](https://link.springer.com/content/pdf/10.1007/s10664-020-09881-0.pdf) -1. [Explainable Monitoring: Stop flying blind and monitor your AI](https://blog.fiddler.ai/2020/04/explainable-monitoring-stop-flying-blind-and-monitor-your-ai/) -1. [WhyLogs: Embrace Data Logging Across Your ML Systems](https://medium.com/whylabs/whylogs-embrace-data-logging-a9449cd121d) -1. [Evidently AI. Insights on doing machine learning in production. (Vendor blog.)](https://evidentlyai.com/blog) -1. 
[The definitive guide to comprehensively monitoring your AI](https://www.monalabs.io/mona-blog/definitiveguidetomonitorai) -1. [Introduction to Unit Testing for Machine Learning](https://themlrebellion.com/blog/Introduction-To-Unit-Testing-Machine-Learning/) -1. [Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance](https://towardsdatascience.com/production-machine-learning-monitoring-outliers-drift-explainers-statistical-performance-d9b1d02ac158) -1. Test-Driven Development in MLOps [Part 1](https://medium.com/mlops-community/test-driven-development-in-mlops-part-1-8894575f4dec) -1. [Domain-Specific Machine Learning Monitoring](https://medium.com/mlops-community/domain-specific-machine-learning-monitoring-88bc0dd8a212) -1. [Introducing ML Model Performance Management (Blog by fiddler)](https://blog.fiddler.ai/2021/03/introducing-ml-model-performance-management/) -1. [What is ML Observability? (Arize AI)](https://arize.com/what-is-ml-observability/) -1. [Beyond Monitoring: The Rise of Observability (Arize AI & Monte Carlo Data)](https://arize.com/beyond-monitoring-the-rise-of-observability/) -1. [Model Failure Modes (Arize AI)](https://arize.com/ml-model-failure-modes/) -1. [Quick Start to Data Quality Monitoring for ML (Arize AI)](https://arize.com/data-quality-monitoring/) -1. [Playbook to Monitoring Model Performance in Production (Arize AI)](https://arize.com/monitor-your-model-in-production/) -1. [Robust ML by Property Based Domain Coverage Testing (Blog by Efemarai)](https://towardsdatascience.com/why-dont-we-test-machine-learning-as-we-test-software-43f5720903d) -1. [Monitoring and explainability of models in production](https://arxiv.org/pdf/2007.06299.pdf) -1. [Beyond Monitoring: The Rise of Observability](https://aparnadhinak.medium.com/beyond-monitoring-the-rise-of-observability-c53bdc1d2e0b) -1. [ML Model Monitoring – 9 Tips From the Trenches. 
(by Nubank)](https://building.nubank.com.br/ml-model-monitoring-9-tips-from-the-trenches/) -1. [Model health assurance at LinkedIn. By LinkedIn Engineering](https://engineering.linkedin.com/blog/2021/model-health-assurance-at-linkedin) -1. [How to Trust Your Deep Learning Code](https://krokotsch.eu/cleancode/2020/08/11/Unit-Tests-for-Deep-Learning.html) ([Accompanying code](https://github.com/tilman151/unittest_dl)) -1. [Estimating Performance of Regression Models Without Ground Truth](https://bit.ly/medium-estimating-performance-regression) (using [NannyML](https://bit.ly/ml-ops-nannyml)) -1. [How Hyperparameter Tuning in Machine Learning Works (by NimbleBox.ai)](https://nimblebox.ai/blog/hyperparameter-tuning-machine-learning) -
- - -# MLOps: Infrastructure & Tooling -
-Click to expand! - -1. [MLOps Infrastructure Stack Canvas](https://miro.com/app/board/o9J_lfoc4Hg=/) -1. [Rise of the Canonical Stack in Machine Learning. How a Dominant New Software Stack Will Unlock the Next Generation of Cutting Edge AI Apps](https://towardsdatascience.com/rise-of-the-canonical-stack-in-machine-learning-724e7d2faa75) -1. [AI Infrastructure Alliance. Building the canonical stack for AI/ML](https://ai-infrastructure.org/) -1. [Linux Foundation AI Foundation](https://wiki.lfai.foundation/) -1. ML Infrastructure Tools for Production | [Part 1 — Production ML — The Final Stage of the Model Workflow](https://towardsdatascience.com/ml-infrastructure-tools-for-production-1b1871eecafb) | [Part 2 — Model Deployment and Serving](https://towardsdatascience.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving-fcfc75c4a362) -1. [The MLOps Stack Template (by valohai)](https://valohai.com/blog/the-mlops-stack/) -1. [Navigating the MLOps tooling landscape](https://ljvmiranda921.github.io/notebook/2021/05/10/navigating-the-mlops-landscape/) -1. [MLOps.toys curated list of MLOps projects (by Aporia)](https://mlops.toys/) -1. [Comparing Cloud MLOps platforms, From a former AWS SageMaker PM](https://towardsdatascience.com/comparing-cloud-mlops-platform-from-a-former-aws-sagemaker-pm-115ced28239b) -1. [Machine Learning Ecosystem 101 (whitepaper by Arize AI)](https://arize.com/wp-content/uploads/2021/04/Arize-AI-Ecosystem-White-Paper.pdf) -1. [Selecting your optimal MLOps stack: advantages and challenges. By Intellerts](https://intellerts.com/selecting-your-optimal-mlops-stack-advantages-and-challenges/) -1. [Infrastructure Design for Real-time Machine Learning Inference. The Databricks Blog](https://databricks.com/blog/2021/09/01/infrastructure-design-for-real-time-machine-learning-inference.html) -1. [The 2021 State of AI Infrastructure Survey](https://pages.run.ai/hubfs/PDFs/2021-State-of-AI-Infrastructure-Survey.pdf) -1. 
[AI infrastructure Maturity matrix](https://pages.run.ai/hubfs/PDFs/AI-Infrastructure-Maturity-Benchmarking-Model.pdf) -1. [A Curated Collection of the Best Open-source MLOps Tools. By Censius](https://censius.ai/mlops-tools) -1. [Best MLOps Tools to Manage the ML Lifecycle (by NimbleBox.ai)](https://nimblebox.ai/blog/mlops-tools) -1. [The minimum set of must-haves for MLOps](https://marvelousmlops.substack.com/p/the-minimum-set-of-must-haves-for) -
- - - -# MLOps Papers - -A list of scientific and industrial papers and resources about Machine Learning operationalization since 2015. [See more.](papers.md) - - - -# Talks About MLOps -
-Click to expand! - -1. ["MLOps: Automated Machine Learning" by Emmanuel Raj](https://www.youtube.com/watch?v=m32k9jcY4pY) -1. [DeliveryConf 2020. "Continuous Delivery For Machine Learning: Patterns And Pains" by Emily Gorcenski](https://youtu.be/bFW5mZmj0nQ) -1. [MLOps Conference: Talks from 2019](https://www.mlopsconf.com?wix-vod-comp-id=comp-k1ry4afh) -1. [KubeCon 2019: Flyte: Cloud Native Machine Learning and Data Processing Platform](https://www.youtube.com/watch?v=KdUJGSP1h9U) -1. [KubeCon 2019: Running Large-Scale Stateful workloads on Kubernetes at Lyft](https://www.youtube.com/watch?v=ECeVQoble0g) -1. [A CI/CD Framework for Production Machine Learning at Massive Scale (using Jenkins X and Seldon Core)](https://youtu.be/68_Phxwaj-k) -1. [MLOps Virtual Event (Databricks)](https://youtu.be/9Ehh7Vl7ByM) -1. [MLOps NY conference 2019](https://www.iguazio.com/mlops-nyc-sessions/) -1. [MLOps.community YouTube Channel](https://www.youtube.com/channel/UCG6qpjVnBTTT8wLGBygANOQ) -1. [MLinProduction YouTube Channel](https://www.youtube.com/channel/UC3B_Z9FTeu4i8xtxDjGaZxw) -1. [Introducing MLflow for End-to-End Machine Learning on Databricks. Spark+AI Summit 2020. Sean Owen](https://youtu.be/nx3yFzx_nHI) -1. [MLOps Tutorial #1: Intro to Continuous Integration for ML](https://youtu.be/9BgIDqAzfuA) -1. [Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams (2019)](https://youtu.be/46l_C7ibpuo) -1. [Damian Brady - The emerging field of MLOps](https://humansofai.podbean.com/e/damian-brady-the-emerging-field-of-mlops/) -1. [MLOps - Entwurf, Entwicklung, Betrieb ("Design, Development, Operations"; INNOQ Podcast in German)](https://www.innoq.com/en/podcast/076-mlops/) -1. [Instrumentation, Observability & Monitoring of Machine Learning Models](https://www.infoq.com/presentations/instrumentation-observability-monitoring-ml/) -1. 
[Efficient ML engineering: Tools and best practices](https://learning.oreilly.com/videos/oreilly-strata-data/9781492050681/9781492050681-video327465?autoplay=false) -1. [Beyond the Jupyter notebook: how to build data science products](https://towardsdatascience.com/beyond-the-jupyter-notebook-how-to-build-data-science-products-50d942fc25d8) -1. [An introduction to MLOps on Google Cloud](https://www.youtube.com/watch?v=6gdrwFMaEZ0#action=share) (First 19 min are vendor-, language-, and framework-agnostic. @visenger) -1. [How ML Breaks: A Decade of Outages for One Large ML Pipeline](https://youtu.be/hBMHohkRgAA) -1. [Clean Machine Learning Code: Practical Software Engineering](https://youtu.be/PEjTAJHxYPM) -1. [Machine Learning Engineering: 10 Fundamentale Praktiken ("10 Fundamental Practices", in German)](https://www.youtube.com/watch?v=VYlXNWxqJ2A) -1. [Architecture of machine learning systems (3-part series)](https://www.youtube.com/playlist?list=PLx8omXiw3n9y26FKZLV5ScyS52D_c29QN) -1. [Machine Learning Design Patterns](https://youtu.be/udXjlvCFusc) -1. [A playlist covering techniques and approaches for deploying models to production](https://youtube.com/playlist?list=PL3N9eeOlCrP5PlN1jwOB3jVZE6nYTVswk) -1. [ML Observability: A Critical Piece in Ensuring Responsible AI (Arize AI at Re-Work)](https://www.youtube.com/watch?v=2FE1sg749V[o) -1. [ML Engineering vs. Data Science (Arize AI Un/Summit)](https://www.youtube.com/watch?v=lP_4lT2k7Kg&t=2s) -1. [SRE for ML: The First 10 Years and the Next 10](https://www.usenix.org/conference/srecon21/presentation/underwood-sre-ml) -1. [Demystifying Machine Learning in Production: Reasoning about a Large-Scale ML Platform](https://www.usenix.org/conference/srecon21/presentation/mcglohon) -1. [Apply Conf 2022](https://www.applyconf.com/apply-conf-may-2022/) -1. [Databricks' Data + AI Summit 2022](https://databricks.com/dataaisummit/north-america-2022) -1. [RE•WORK MLOps Summit 2022](https://www.re-work.co/events/mlops-summit-2022) -1. 
[Annual MLOps World Conference](https://mlopsworld.com/) -
- - -# Existing ML Systems -
-Click to expand! - -1. [Introducing FBLearner Flow: Facebook’s AI backbone](https://engineering.fb.com/ml-applications/introducing-fblearner-flow-facebook-s-ai-backbone/) -1. [TFX: A TensorFlow-Based Production-Scale Machine Learning Platform](https://dl.acm.org/doi/pdf/10.1145/3097983.3098021?download=true) -1. [Accelerate your ML and Data workflows to production: Flyte](https://flyte.org/) -1. [Getting started with Kubeflow Pipelines](https://cloud.google.com/blog/products/ai-machine-learning/getting-started-kubeflow-pipelines) -1. [Meet Michelangelo: Uber’s Machine Learning Platform](https://www.uber.com/blog/michelangelo-machine-learning-platform/) -1. [Meson: Workflow Orchestration for Netflix Recommendations](https://netflixtechblog.com/meson-workflow-orchestration-for-netflix-recommendations-fc932625c1d9) -1. [What are Azure Machine Learning pipelines?](https://docs.microsoft.com/en-gb/azure/machine-learning/concept-ml-pipelines) -1. [Uber ATG’s Machine Learning Infrastructure for Self-Driving Vehicles](https://eng.uber.com/machine-learning-model-life-cycle-version-control/) -1. [An overview of ML development platforms](https://www.linkedin.com/pulse/overview-ml-development-platforms-louis-dorard/) -1. [Snorkel AI: Putting Data First in ML Development](https://www.snorkel.ai/07-14-2020-snorkel-ai-launch.html) -1. [A Tour of End-to-End Machine Learning Platforms](https://databaseline.tech/a-tour-of-end-to-end-ml-platforms/) -1. [Introducing WhyLabs, a Leap Forward in AI Reliability](https://medium.com/whylabs/introducing-whylabs-5a3b4f37b998) -1. [Project: Ease.ml (ETH Zürich)](https://ds3lab.inf.ethz.ch/easeml.html) -1. [Bodywork: model-training and deployment automation](https://bodywork.readthedocs.io/en/latest/) -1. [Lessons on ML Platforms — from Netflix, DoorDash, Spotify, and more](https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7) -1. 
[Papers & tech blogs by companies sharing their work on data science & machine learning in production. By Eugene Yan](https://github.com/eugeneyan/applied-ml) -1. [How do different tech companies approach building internal ML platforms? (tweet)](https://twitter.com/EvidentlyAI/status/1420328878585913344) -1. [Declarative Machine Learning Systems](https://dl.acm.org/doi/pdf/10.1145/3475965.3479315) -1. [StreamING Machine Learning Models: How ING Adds Fraud Detection Models at Runtime with Apache Flink](https://www.ververica.com/blog/real-time-fraud-detection-ing-bank-apache-flink) -
- - -# Machine Learning -
-Click to expand! - -1. Book: Aurélien Géron, "Hands-On Machine Learning with Scikit-Learn and TensorFlow" -1. [Foundations of Machine Learning](https://bloomberg.github.io/foml/) -1. [Best Resources to Learn Machine Learning](http://www.trainindatablog.com/best-resources-to-learn-machine-learning/) -1. [Awesome TensorFlow](https://github.com/jtoy/awesome-tensorflow) -1. ["Papers with Code" - Browse the State-of-the-Art in Machine Learning](https://paperswithcode.com/sota) -1. [Zhi-Hua Zhou. 2012. Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC.](https://www.amazon.com/exec/obidos/ASIN/1439830037/acmorg-20) -1. [Feature Engineering for Machine Learning. Principles and Techniques for Data Scientists. By Alice Zheng, Amanda Casari](https://www.amazon.com/Feature-Engineering-Machine-Learning-Principles-ebook/dp/B07BNX4MWC) -1. [Google Research: Looking Back at 2019, and Forward to 2020 and Beyond](https://ai.googleblog.com/2020/01/google-research-looking-back-at-2019.html) -1. [O’Reilly: The road to Software 2.0](https://www.oreilly.com/radar/the-road-to-software-2-0/) -1. [Machine Learning and Data Science Applications in Industry](https://github.com/firmai/industry-machine-learning) -1. [Deep Learning for Anomaly Detection](https://ff12.fastforwardlabs.com/) -1. [Federated Learning for Mobile Keyboard Prediction](https://arxiv.org/pdf/1811.03604.pdf) -1. [Federated Learning. Building better products with on-device data and privacy by default](https://federated.withgoogle.com/) -1. [Federated Learning: Collaborative Machine Learning without Centralized Training Data](https://ai.googleblog.com/2017/04/federated-learning-collaborative.html) -1. [Yang, Q., Liu, Y., Cheng, Y., Kang, Y., Chen, T. and Yu, H., 2019. Federated learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 13(3). Chapters 1 and 2.](https://www.morganclaypoolpublishers.com/catalog_Orig/samples/9781681736983_sample.pdf) -1. 
[Federated Learning by FastForward](https://federated.fastforwardlabs.com/) -1. [The Federated & Distributed Machine Learning Conference](https://www.federatedlearningconference.com/) -1. [Federated Learning: Challenges, Methods, and Future Directions](https://blog.ml.cmu.edu/2019/11/12/federated-learning-challenges-methods-and-future-directions/) -1. [Book: Molnar, Christoph. "Interpretable machine learning. A Guide for Making Black Box Models Explainable", 2019](https://christophm.github.io/interpretable-ml-book/) -1. [Book: Hutter, Frank, Lars Kotthoff, and Joaquin Vanschoren. "Automated Machine Learning". Springer, 2019.](https://originalstatic.aminer.cn/misc/pdf/Hutter-AutoML_Book_compressed.pdf) -1. [ML resources by topic, curated by the community.](https://madewithml.com/topics/) -1. [An Introduction to Machine Learning Interpretability, by Patrick Hall, Navdeep Gill, 2nd Edition. O'Reilly 2019](https://learning.oreilly.com/library/view/an-introduction-to/9781098115487/) -1. [Examples of techniques for training interpretable machine learning (ML) models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.](https://github.com/jphall663/interpretable_machine_learning_with_python) -1. [Paper: "Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence", by Sebastian Raschka, Joshua Patterson, and Corey Nolet. 2020](https://arxiv.org/pdf/2002.04803.pdf) -1. [Distill: Machine Learning Research](https://distill.pub/) -1. [AtHomeWithAI: Curated Resource List by DeepMind](https://storage.googleapis.com/deepmind-media/research/New_AtHomeWithAI%20resources.pdf) -1. [Awesome Data Science](https://github.com/academic/awesome-datascience) -1. [Intro to probabilistic programming. A use case using TensorFlow Probability (TFP)](https://towardsdatascience.com/intro-to-probabilistic-programming-b47c4e926ec5) -1. [Dive into Snorkel: Weak Supervision on German Texts.
inovex Blog](https://www.inovex.de/blog/snorkel-weak-superversion-german-texts/) -1. [Dive into Deep Learning. An interactive deep learning book with code, math, and discussions. Provides NumPy/MXNet, PyTorch, and TensorFlow implementations](http://d2l.ai/) -1. [Data Science Collected Resources (GitHub repository)](https://github.com/tirthajyoti/Data-science-best-resources) -1. [Set of illustrated Machine Learning cheatsheets](https://stanford.edu/~shervine/teaching/cs-229/) -1. ["Machine Learning Bookcamp" by Alexey Grigorev](https://www.manning.com/books/machine-learning-bookcamp) -1. [130 Machine Learning Projects Solved and Explained](https://medium.com/the-innovation/130-machine-learning-projects-solved-and-explained-605d188fb392) -1. [Machine learning cheat sheet](https://github.com/soulmachine/machine-learning-cheat-sheet) -1. [Stateoftheart AI. An open-data and free platform built by the research community to facilitate the collaborative development of AI](https://www.stateoftheart.ai/) -1. [Online Machine Learning Courses: 2020 Edition](https://www.blog.confetti.ai/post/best-online-machine-learning-courses-2020-edition) -1. [End-to-End Machine Learning Library](https://e2eml.school/blog.html) -1. [Machine Learning Toolbox (by Amit Chaudhary)](https://amitness.com/toolbox/) -1. [Causality for Machine Learning](https://ff13.fastforwardlabs.com/FF13-Causality_for_Machine_Learning-Cloudera_Fast_Forward.pdf) -1. [Causal Inference for the Brave and True](https://matheusfacure.github.io/python-causality-handbook/landing-page.html) -1. [Causal Inference](https://mixtape.scunning.com/index.html) -1. [A resource list for causality in statistics, data science and physics](https://github.com/msuzen/looper/blob/master/looper.md) -1. [Learning from data. Caltech](http://work.caltech.edu/lectures.html) -1. [Machine Learning Glossary](https://ml-cheatsheet.readthedocs.io/en/latest/#) -1. [Book: "Distributed Machine Learning Patterns". 2022. By Yuan Tang. 
Manning](https://www.manning.com/books/distributed-machine-learning-patterns) -1. [Machine Learning for Beginners - A Curriculum](https://github.com/microsoft/ML-For-Beginners) -1. [Making Friends with Machine Learning. By Cassie Kozyrkov]() -1. [Machine Learning Workflow - A Complete Guide (by NimbleBox.ai)](https://nimblebox.ai/blog/machine-learning-workflow) -1. [Performance Metrics to Monitor in Machine Learning Projects (by NimbleBox.ai)](https://nimblebox.ai/blog/machine-learning-performance-metrics) - -
- - - - - - -# Software Engineering -
-Click to expand! - -1. [The Twelve Factors](https://12factor.net/) -1. [Book: "Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations", 2018, by Nicole Forsgren et al.](https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations/dp/1942788339) -1. [Book: "The DevOps Handbook" by Gene Kim et al., 2016](https://itrevolution.com/book/the-devops-handbook/) -1. [State of DevOps 2019](https://research.google/pubs/pub48455/) -1. [Clean Code concepts adapted for machine learning and data science.](https://github.com/davified/clean-code-ml) -1. [School of SRE](https://linkedin.github.io/school-of-sre/) -1. [10 Laws of Software Engineering That People Ignore](https://www.indiehackers.com/post/10-laws-of-software-engineering-that-people-ignore-e3439176dd) -1. [The Patterns of Scalable, Reliable, and Performant Large-Scale Systems](http://awesome-scalability.com/) -1. [The Book of Secret Knowledge](https://github.com/trimstray/the-book-of-secret-knowledge) -1. [Shades of Conway's Law](https://thinkinglabs.io/articles/2021/05/07/shades-of-conways-law.html) -1. [Engineering Practices for Data Scientists](https://valohai.com/engineering-practices-ebook/) -
- - - -# Product Management for ML/AI -
-Click to expand! - -1. [What you need to know about product management for AI. A product manager for AI does everything a traditional PM does, and much more.](https://www.oreilly.com/radar/what-you-need-to-know-about-product-management-for-ai/) -1. [Bringing an AI Product to Market. Previous articles have gone through the basics of AI product management. Here we get to the meat: how do you bring a product to market?](https://www.oreilly.com/radar/bringing-an-ai-product-to-market/) -1. [The People + AI Guidebook](https://pair.withgoogle.com/guidebook/) -1. [User Needs + Defining Success](https://pair.withgoogle.com/chapter/user-needs/) -1. [Building machine learning products: a problem well-defined is a problem half-solved.](https://www.jeremyjordan.me/ml-requirements/) -1. [Talk: Designing Great ML Experiences (Apple)](https://developer.apple.com/videos/play/wwdc2019/803/) -1. [Machine Learning for Product Managers](http://nlathia.github.io/2017/03/Machine-Learning-for-Product-Managers.html) -1. [Understanding the Data Landscape and Strategic Play Through Wardley Mapping](https://ergestx.com/data-landscape-wardley-mapping/) -1. [Techniques for prototyping machine learning systems across products and features](https://design.google/library/simulating-intelligence/) -1. [Machine Learning and User Experience: A Few Resources](https://medium.com/ml-ux/machine-learning-and-user-experience-a-few-resources-e7872f1d34ee) -1. [AI ideation canvas](https://idalab.de/wp-content/uploads/2021/02/idalab-AI-ideation-canvas-Feb21.pdf) -1. [Ideation in AI](https://idalab.de/ideation-in-ai-five-ways-to-make-the-workshops-work/) -1. [5 Steps for Building Machine Learning Models for Business. By shopify engineering](https://shopify.engineering/building-business-machine-learning-models) -1. [Metric Design for Data Scientists and Business Leaders](https://towardsdatascience.com/metric-design-for-data-scientists-and-business-leaders-b8adaf46c00) -
- - - -# The Economics of ML/AI -
-Click to expand! - -1. [Book: "Prediction Machines: The Simple Economics of Artificial Intelligence"](https://www.predictionmachines.ai/) -1. [Book: "The AI Organization" by David Carmona](https://learning.oreilly.com/library/view/the-ai-organization/9781492057369/) -1. [Book: "Succeeding with AI". 2020. By Veljko Krunic. Manning Publications](https://learning.oreilly.com/library/view/succeeding-with-ai/9781617296932/) -1. [A list of articles about AI and the economy](https://www.predictionmachines.ai/articles) -1. [Gartner AI Trends 2019](https://blogs.gartner.com/smarterwithgartner/files/2019/08/CTMKT_736691_Hype_Cycle_for_AI_2019.png) -1. [Global AI Survey: AI proves its worth, but few scale impact](https://www.mckinsey.com/featured-insights/artificial-intelligence/global-ai-survey-ai-proves-its-worth-but-few-scale-impact) -1. [Getting started with AI? Start here! Everything you need to know to dive into your project](https://medium.com/hackernoon/the-decision-makers-guide-to-starting-ai-72ee0d7044df) -1. [11 questions to ask before starting a successful Machine Learning project](https://tryolabs.com/blog/2019/02/13/11-questions-to-ask-before-starting-a-successful-machine-learning-project/) -1. [What AI still can’t do](https://www.technologyreview.com/s/615189/what-ai-still-cant-do/) -1. [Demystifying AI Part 4: What is an AI Canvas and how do you use it?](https://www.wearebrain.com/blog/ai-data-science/what-is-an-ai-canvas/) -1. [A Data Science Workflow Canvas to Kickstart Your Projects](https://towardsdatascience.com/a-data-science-workflow-canvas-to-kickstart-your-projects-db62556be4d0) -1. [Is your AI project a nonstarter? Here’s a reality check(list) to help you avoid the pain of learning the hard way](https://medium.com/hackernoon/ai-reality-checklist-be34e2fdab9) -1. [What is THE main reason most ML projects fail?](https://towardsdatascience.com/what-is-the-main-reason-most-ml-projects-fail-515d409a161f) -1. [Designing great data products. 
The Drivetrain Approach: A four-step process for building data products.](https://www.oreilly.com/radar/drivetrain-approach-data-products/) -1. [The New Business of AI (and How It’s Different From Traditional Software)](https://a16z.com/2020/02/16/the-new-business-of-ai-and-how-its-different-from-traditional-software/) -1. [The idea maze for AI startups](https://cdixon.org/2015/02/01/the-ai-startup-idea-maze) -1. [The Enterprise AI Challenge: Common Misconceptions](https://www.forbes.com/sites/forbestechcouncil/2020/01/15/the-enterprise-ai-challenge-common-misconceptions/#37ca1e5c5696) -1. [Misconception 1 (of 5): Enterprise AI Is Primarily About The Technology](https://www.forbes.com/sites/forbestechcouncil/2020/01/31/misconception-1-of-5-enterprise-ai-is-primarily-about-the-technology/#151e6711180e) -1. [Misconception 2 (of 5): Automated Machine Learning Will Unlock Enterprise AI](https://www.forbes.com/sites/forbestechcouncil/2020/02/27/misconception-2-of-5-automated-machine-learning-will-unlock-enterprise-ai/#7f618ff97ace) -1. [Three Principles for Designing ML-Powered Products](https://spotify.design/articles/2019-12-10/three-principles-for-designing-ml-powered-products/) -1. [A Step-by-Step Guide to Machine Learning Problem Framing](https://medium.com/thelaunchpad/a-step-by-step-guide-to-machine-learning-problem-framing-6fc17126b981) -1. [AI adoption in the enterprise 2020](https://www.oreilly.com/radar/ai-adoption-in-the-enterprise-2020/) -1. [How Adopting MLOps can Help Companies With ML Culture?](https://www.analyticsinsight.net/adopting-mlops-can-help-companies-ml-culture/) -1. [Weaving AI into Your Organization](https://medium.com/firmai/weaving-ai-into-your-organization-2d9643da50e1) -1. [What to Do When AI Fails](https://www.oreilly.com/radar/what-to-do-when-ai-fails/) -1. [Introduction to Machine Learning Problem Framing](https://developers.google.com/machine-learning/problem-framing) -1. 
[Structured Approach for Identifying AI Use Cases](https://towardsdatascience.com/proven-structured-approach-for-identifying-ai-use-cases-b876d8d00e5) -1. [Book: "Machine Learning for Business" by Doug Hudgeon, Richard Nichol, O'reilly](https://learning.oreilly.com/library/view/machine-learning-for/9781617295836/) -1. [Why Commercial Artificial Intelligence Products Do Not Scale (FemTech)](https://www.presagen.com/why-commercial-artificial-intelligence-products-do-not-scale) -1. [Google Cloud’s AI Adoption Framework (White Paper)](https://services.google.com/fh/files/misc/ai_adoption_framework_whitepaper.pdf) -1. [Data Science Project Management](http://www.datascience-pm.com/) -1. [Book: "Competing in the Age of AI" by Marco Iansiti, Karim R. Lakhani. Harvard Business Review Press. 2020](https://learning.oreilly.com/library/view/competing-in-the/9781633697638/) -1. [The Three Questions about AI that Startups Need to Ask. The first is: Are you sure you need AI?](https://towardsdatascience.com/google-expert-tips-for-artificial-intelligence-startups-three-questions-about-ai-that-startups-need-to-ask-308924cb5324) -1. [Taming the Tail: Adventures in Improving AI Economics](https://a16z.com/2020/08/12/taming-the-tail-adventures-in-improving-ai-economics/) -1. [Managing the Risks of Adopting AI Engineering](https://insights.sei.cmu.edu/sei_blog/2020/08/managing-the-risks-of-adopting-ai-engineering.html) -1. [Get rid of AI Saviorism](https://www.shreya-shankar.com/ai-saviorism/) -1. [Collection of articles listing reasons why data science projects fail](https://github.com/xLaszlo/datascience-fails) -1. [How to Choose Your First AI Project by Andrew Ng](https://hbr.org/2019/02/how-to-choose-your-first-ai-project) -1. [How to Set AI Goals](https://www.oreilly.com/radar/how-to-set-ai-goals/) -1. [Expanding AI's Impact With Organizational Learning](https://sloanreview.mit.edu/projects/expanding-ais-impact-with-organizational-learning/) -1. 
[Potemkin Data Science](https://mcorrell.medium.com/potemkin-data-science-fba2b5ba5cc6) -1. [When Should You Not Invest in AI?](https://www.entrepreneur.com/article/359803) -1. [Why 90% of machine learning models never hit the market. Most companies lack leadership support, effective communication between teams, and accessible data](https://thenextweb.com/news/why-most-machine-learning-models-never-hit-market-syndication) -
- - - - -# Model Governance, Ethics, Responsible AI - -This topic is extracted into our new [Awesome ML Model Governace repository](https://github.com/visenger/Awesome-ML-Model-Governance) - - - -# MLOps: People & Processes -
-Click to expand! - -1. [Scaling An ML Team (0–10 People)](https://medium.com/aquarium-learning/scaling-an-ml-team-0-10-people-ae024f3a89f3) -1. [The Knowledge Repo project is focused on facilitating the sharing of knowledge between data scientists and other technical roles.](https://github.com/airbnb/knowledge-repo) -1. [Scaling Knowledge at Airbnb](https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091) -1. [Models for integrating data science teams within companies A comparative analysis](https://djpardis.medium.com/models-for-integrating-data-science-teams-within-organizations-7c5afa032ebd) -1. [How to Write Better with The Why, What, How Framework. How to write design documents for data science/machine learning projects? (by Eugene Yan)](https://eugeneyan.com/writing/writing-docs-why-what-how/) -1. [Technical Writing Courses](https://developers.google.com/tech-writing) -1. [Building a data team at a mid-stage startup: a short story. By Erik Bernhardsson](https://erikbern.com/2021/07/07/the-data-team-a-short-story.html) -1. [The Cultural Benefits of Artificial Intelligence in the Enterprise. by Sam Ransbotham, François Candelon, David Kiron, Burt LaFountain, and Shervin Khodabandeh](https://web-assets.bcg.com/2a/d0/ebfb860a4e05aa9e4729b083da4b/the-cultural-benefits-of-artificial-intelligence-in-the-enterprise.pdf) -
- - - -# Newsletters About MLOps, Machine Learning, Data Science and Co. -
-Click to expand! - -1. [ML in Production newsletter](https://mlinproduction.com/machine-learning-newsletter/) -1. [MLOps.community](https://mlops.community/) -1. [Andriy Burkov newsletter](https://www.linkedin.com/pulse/artificial-intelligence-33-andriy-burkov/) -1. [Decision Intelligence by Cassie Kozyrkov](https://decision.substack.com/) -1. [Laszlo's Newsletter about Data Science](https://laszlo.substack.com/) -1. [Data Elixir newsletter for a weekly dose of the top data science picks from around the web. Covering machine learning, data visualization, analytics, and strategy.](https://dataelixir.com/) -1. [The Data Science Roundup by Tristan Handy](http://roundup.fishtownanalytics.com/) -1. [Vicki Boykis Newsletter about Data Science](https://vicki.substack.com/) -1. [KDnuggets News](https://www.kdnuggets.com/) -1. [Analytics Vidhya, Any questions on business analytics, data science, big data, data visualizations tools and techniques](https://www.analyticsvidhya.com/blog/) -1. [Data Science Weekly Newsletter: A free weekly newsletter featuring curated news, articles and jobs related to Data Science](https://www.datascienceweekly.org/) -1. [The Machine Learning Engineer Newsletter](https://ethical.institute/mle.html) -1. [Gradient Flow helps you stay ahead of the latest technology trends and tools with in-depth coverage, analysis and insights. See the latest on data, technology and business, with a focus on machine learning and AI](https://gradientflow.wpcomstaging.com/) -1. [Your guide to AI by Nathan Benaich. Monthly analysis of AI technology, geopolitics, research, and startups.](http://newsletter.airstreet.com/) -1. [O'Reilly Data & AI Newsletter](https://www.oreilly.com/emails/newsletters/) -1. [deeplearning.ai’s newsletter by Andrew Ng](https://www.deeplearning.ai/) -1. [Deep Learning Weekly](https://www.deeplearningweekly.com/) -1. [Import AI is a weekly newsletter about artificial intelligence, read by more than ten thousand experts. 
By Jack Clark.](https://jack-clark.net/) -1. [AI Ethics Weekly](https://lighthouse3.com/newsletter/) -1. [Announcing Projects To Know, a weekly machine intelligence and data science newsletter](https://blog.amplifypartners.com/announcing-projects-to-know/) -1. [TWIML: This Week in Machine Learning and AI newsletter](https://twimlai.com/newsletter/) -1. [featurestore.org: Monthly Newsletter on Feature Stores for ML](https://www.featurestore.org/) -1. [DataTalks.Club Community: Slack, Newsletter, Podcast, Weeekly Events](https://datatalks.club/) -1. [Machine Learning Ops Roundup](https://mlopsroundup.substack.com/) -1. [Data Science Programming Newsletter by Eric Ma](https://dspn.substack.com/) -1. [Marginally Interesting by Mikio L. Braun](https://www.getrevue.co/profile/mikiobraun) -1. [Synced](https://syncedreview.com/) -1. [The Ground Truth: Newsletter for Computer Vision Practitioners](https://info.superb-ai.com/ground-truth-newsletter-subscribe) -1. [SwirlAI: Data Engineering, MLOps and overall Data focused Newsletter by Aurimas Griciūnas](https://swirlai.substack.com/) -1. [Marvelous MLOps](https://marvelousmlops.substack.com) -1. [Made with ML](https://madewithml.com/misc/newsletter/) -1. [MLOps Insights Newsletter - 8 episodes covering topics like Model Feedback Vacuums, Deployment Reproducibility and Serverless in the context of MLOps](https://mlopsinsights.com/) -
- -[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/B0B416E7UI) - +# Awesome MLOps [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) + +A curated list of awesome MLOps tools. + +Inspired by [awesome-python](https://github.com/vinta/awesome-python). + +- [Awesome MLOps](#awesome-mlops) + - [AutoML](#automl) + - [CI/CD for Machine Learning](#cicd-for-machine-learning) + - [Cron Job Monitoring](#cron-job-monitoring) + - [Data Catalog](#data-catalog) + - [Data Enrichment](#data-enrichment) + - [Data Exploration](#data-exploration) + - [Data Management](#data-management) + - [Data Processing](#data-processing) + - [Data Validation](#data-validation) + - [Data Visualization](#data-visualization) + - [Drift Detection](#drift-detection) + - [Feature Engineering](#feature-engineering) + - [Feature Store](#feature-store) + - [Hyperparameter Tuning](#hyperparameter-tuning) + - [Knowledge Sharing](#knowledge-sharing) + - [Machine Learning Platform](#machine-learning-platform) + - [Model Fairness and Privacy](#model-fairness-and-privacy) + - [Model Interpretability](#model-interpretability) + - [Model Lifecycle](#model-lifecycle) + - [Model Serving](#model-serving) + - [Model Testing & Validation](#model-testing--validation) + - [Optimization Tools](#optimization-tools) + - [Simplification Tools](#simplification-tools) + - [Visual Analysis and Debugging](#visual-analysis-and-debugging) + - [Workflow Tools](#workflow-tools) +- [Resources](#resources) + - [Articles](#articles) + - [Books](#books) + - [Events](#events) + - [Other Lists](#other-lists) + - [Podcasts](#podcasts) + - [Slack](#slack) + - [Websites](#websites) +- [Contributing](#contributing) + +--- + +## AutoML + +*Tools for performing AutoML.* + +* [AutoGluon](https://github.com/awslabs/autogluon) - Automated machine learning for image, text, tabular, time-series, and multi-modal data. 
+* [AutoKeras](https://github.com/keras-team/autokeras) - AutoKeras' goal is to make machine learning accessible to everyone. +* [AutoPyTorch](https://github.com/automl/Auto-PyTorch) - Automatic architecture search and hyperparameter optimization for PyTorch. +* [AutoSKLearn](https://github.com/automl/auto-sklearn) - Automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. +* [EvalML](https://github.com/alteryx/evalml) - A library that builds, optimizes, and evaluates ML pipelines using domain-specific functions. +* [FLAML](https://github.com/microsoft/FLAML) - Finds accurate ML models automatically, efficiently and economically. +* [H2O AutoML](https://h2o.ai/platform/h2o-automl) - Automates the ML workflow, including automatic training and tuning of models. +* [MindsDB](https://github.com/mindsdb/mindsdb) - AI layer for databases that allows you to effortlessly develop, train and deploy ML models. +* [MLBox](https://github.com/AxeldeRomblay/MLBox) - MLBox is a powerful Automated Machine Learning Python library. +* [Model Search](https://github.com/google/model_search) - Framework that implements AutoML algorithms for model architecture search at scale. +* [NNI](https://github.com/microsoft/nni) - An open source AutoML toolkit for automating the machine learning lifecycle. + +## CI/CD for Machine Learning + +*Tools for performing CI/CD for Machine Learning.* + +* [ClearML](https://github.com/allegroai/clearml) - Auto-Magical CI/CD to streamline your ML workflow. +* [CML](https://github.com/iterative/cml) - Open-source library for implementing CI/CD in machine learning projects. +* [KitOps](https://github.com/jozu-ai/kitops) - Open source MLOps project that eases model handoffs between data scientists and DevOps. + +## Cron Job Monitoring + +*Tools for monitoring cron jobs (recurring jobs).* + +* [Cronitor](https://cronitor.io/cron-job-monitoring) - Monitor any cron job or scheduled task.
+* [HealthchecksIO](https://healthchecks.io/) - Simple and effective cron job monitoring. +* [Heartbeat.pm](https://heartbeat.pm) - Monitors the aliveness of any sensor/cron job. + +## Data Catalog + +*Tools for data cataloging.* + +* [Amundsen](https://www.amundsen.io/) - Data discovery and metadata engine for improving productivity when interacting with data. +* [Apache Atlas](https://atlas.apache.org) - Provides open metadata management and governance capabilities to build a data catalog. +* [CKAN](https://github.com/ckan/ckan) - Open-source DMS (data management system) for powering data hubs and data portals. +* [DataHub](https://github.com/linkedin/datahub) - LinkedIn's generalized metadata search & discovery tool. +* [Magda](https://github.com/magda-io/magda) - A federated, open-source data catalog for all your big data and small data. +* [Metacat](https://github.com/Netflix/metacat) - Unified metadata exploration API service for Hive, RDS, Teradata, Redshift, S3 and Cassandra. +* [OpenMetadata](https://open-metadata.org/) - A single place to discover, collaborate and get your data right. + +## Data Enrichment + +*Tools and libraries for data enrichment.* + +* [Snorkel](https://github.com/snorkel-team/snorkel) - A system for quickly generating training data with weak supervision. +* [Upgini](https://github.com/upgini/upgini) - Enriches training datasets with features from public and community-shared data sources. + +## Data Exploration + +*Tools for performing data exploration.* + +* [Apache Zeppelin](https://zeppelin.apache.org/) - Enables data-driven, interactive data analytics and collaborative documents. +* [BambooLib](https://github.com/tkrabel/bamboolib) - An intuitive GUI for Pandas DataFrames. +* [DataPrep](https://github.com/sfu-db/dataprep) - Collect, clean and visualize your data in Python. +* [Deepnote](https://github.com/deepnote/deepnote) - Drop-in replacement for Jupyter and an AI-native workspace for modern data teams.
+* [Google Colab](https://colab.research.google.com) - Hosted Jupyter notebook service that requires no setup to use. +* [Jupyter Notebook](https://jupyter.org/) - Web-based notebook environment for interactive computing. +* [JupyterLab](https://jupyterlab.readthedocs.io) - The next-generation user interface for Project Jupyter. +* [Jupytext](https://github.com/mwouts/jupytext) - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts. +* [Pandas Profiling](https://github.com/ydataai/pandas-profiling) - Create HTML profiling reports from pandas DataFrame objects. +* [Polynote](https://polynote.org/) - The polyglot notebook with first-class Scala support. + +## Data Management + +*Tools for performing data management.* + +* [Arrikto](https://www.arrikto.com/) - Dead simple, ultra fast storage for the hybrid Kubernetes world. +* [BlazingSQL](https://github.com/BlazingDB/blazingsql) - A lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF. +* [Delta Lake](https://github.com/delta-io/delta) - Storage layer that brings scalable, ACID transactions to Apache Spark and other engines. +* [Dolt](https://github.com/dolthub/dolt) - SQL database that you can fork, clone, branch, merge, push and pull just like a git repository. +* [Dud](https://github.com/kevin-hanselman/dud) - A lightweight CLI tool for versioning data alongside source code and building data pipelines. +* [DVC](https://dvc.org/) - Management and versioning of datasets and machine learning models. +* [Git LFS](https://git-lfs.github.com) - An open source Git extension for versioning large files. +* [Hub](https://github.com/activeloopai/Hub) - A dataset format for creating, storing, and collaborating on AI datasets of any size. +* [Intake](https://github.com/intake/intake) - A lightweight set of tools for loading and sharing data in data science projects. +* [lakeFS](https://github.com/treeverse/lakeFS) - Repeatable, atomic and versioned data lake on top of object storage. 
+* [Marquez](https://github.com/MarquezProject/marquez) - Collect, aggregate, and visualize a data ecosystem's metadata. +* [Milvus](https://github.com/milvus-io/milvus/) - An open source embedding vector similarity search engine powered by Faiss, NMSLIB and Annoy. +* [Pinecone](https://www.pinecone.io) - Managed and distributed vector similarity search used with a lightweight SDK. +* [Potato](https://github.com/davidjurgens/potato) - Portable annotation tool for creating labeled datasets. +* [Qdrant](https://github.com/qdrant/qdrant) - An open source vector similarity search engine with extended filtering support. +* [Quilt](https://github.com/quiltdata/quilt) - A self-organizing data hub with S3 support. + +## Data Processing + +*Tools related to data processing and data pipelines.* + +* [Airflow](https://airflow.apache.org/) - Platform to programmatically author, schedule, and monitor workflows. +* [Azkaban](https://github.com/azkaban/azkaban) - Batch workflow job scheduler created at LinkedIn to run Hadoop jobs. +* [Dagster](https://github.com/dagster-io/dagster) - A data orchestrator for machine learning, analytics, and ETL. +* [Hadoop](https://hadoop.apache.org/) - Framework that allows for the distributed processing of large data sets across clusters. +* [OpenRefine](https://github.com/OpenRefine/OpenRefine) - Power tool for working with messy data and improving it. +* [Spark](https://spark.apache.org/) - Unified analytics engine for large-scale data processing. + +## Data Validation + +*Tools related to data validation.* + +* [Cerberus](https://github.com/pyeve/cerberus) - Lightweight, extensible data validation library for Python. +* [Cleanlab](https://github.com/cleanlab/cleanlab) - Python library for data-centric AI and machine learning with messy, real-world data and labels. +* [Great Expectations](https://greatexpectations.io) - A Python data validation framework that allows you to test your data against expectations.
+* [JSON Schema](https://json-schema.org/) - A vocabulary that allows you to annotate and validate JSON documents. +* [TFDV](https://github.com/tensorflow/data-validation) - A library for exploring and validating machine learning data. + +## Data Visualization + +*Tools for data visualization, reports and dashboards.* + +* [Count](https://count.co) - SQL/drag-and-drop querying and visualization tool based on notebooks. +* [Dash](https://github.com/plotly/dash) - Analytical Web Apps for Python, R, Julia, and Jupyter. +* [Data Studio](https://datastudio.google.com) - Reporting solution for power users who want to go beyond the data and dashboards of Google Analytics (GA). +* [Facets](https://github.com/PAIR-code/facets) - Visualizations for understanding and analyzing machine learning datasets. +* [Grafana](https://grafana.com/grafana/) - Multi-platform open source analytics and interactive visualization web application. +* [Lux](https://github.com/lux-org/lux) - Fast and easy data exploration by automating the visualization and data analysis process. +* [Metabase](https://www.metabase.com/) - The simplest, fastest way to get business intelligence and analytics to everyone. +* [Redash](https://redash.io/) - Connect to any data source, easily visualize, dashboard and share your data. +* [SolidUI](https://github.com/CloudOrc/SolidUI) - AI-generated visualization prototyping and editing platform, supporting 2D and 3D models. +* [Superset](https://superset.incubator.apache.org/) - Modern, enterprise-ready business intelligence web application. +* [Tableau](https://www.tableau.com) - Powerful and fastest growing data visualization tool used in the business intelligence industry. + +## Drift Detection + +*Tools and libraries related to drift detection.* + +* [Alibi Detect](https://github.com/SeldonIO/alibi-detect) - An open source Python library focused on outlier, adversarial and drift detection.
+* [Frouros](https://github.com/IFCA/frouros) - An open source Python library for drift detection in machine learning systems. +* [ml3-drift](https://github.com/ml-cube/ml3-drift) - Drift detection algorithms seamlessly integrated with ML and AI frameworks. +* [TorchDrift](https://github.com/torchdrift/torchdrift/) - A data and concept drift library for PyTorch. + +## Feature Engineering + +*Tools and libraries related to feature engineering.* + +* [Feature Engine](https://github.com/feature-engine/feature_engine) - Feature engineering package with scikit-learn-like functionality. +* [Featuretools](https://github.com/alteryx/featuretools) - Python library for automated feature engineering. +* [TSFresh](https://github.com/blue-yonder/tsfresh) - Python library for automatic extraction of relevant features from time series. + +## Feature Store + +*Feature store tools for data serving.* + +* [Butterfree](https://github.com/quintoandar/butterfree) - A tool for building feature stores. Transform your raw data into beautiful features. +* [ByteHub](https://github.com/bytehub-ai/bytehub) - An easy-to-use feature store. Optimized for time-series data. +* [Feast](https://feast.dev/) - End-to-end open source feature store for machine learning. +* [Feathr](https://github.com/linkedin/feathr) - An enterprise-grade, high-performance feature store. +* [Featureform](https://github.com/featureform/featureform) - A virtual feature store. Turn your existing data infrastructure into a feature store. +* [Tecton](https://www.tecton.ai/) - A fully managed feature platform built to orchestrate the complete lifecycle of features. + +## Hyperparameter Tuning + +*Tools and libraries to perform hyperparameter tuning.* + +* [Advisor](https://github.com/tobegit3hub/advisor) - Open-source implementation of Google Vizier for hyperparameter tuning. +* [Hyperas](https://github.com/maxpumperla/hyperas) - A very simple wrapper for convenient hyperparameter optimization.
+* [Hyperopt](https://github.com/hyperopt/hyperopt) - Distributed Asynchronous Hyperparameter Optimization in Python. +* [Katib](https://github.com/kubeflow/katib) - Kubernetes-based system for hyperparameter tuning and neural architecture search. +* [KerasTuner](https://github.com/keras-team/keras-tuner) - Easy-to-use, scalable hyperparameter optimization framework. +* [Optuna](https://optuna.org/) - Open source hyperparameter optimization framework to automate hyperparameter search. +* [Scikit Optimize](https://github.com/scikit-optimize/scikit-optimize) - Simple and efficient library to minimize expensive and noisy black-box functions. +* [Talos](https://github.com/autonomio/talos) - Hyperparameter optimization for TensorFlow, Keras and PyTorch. +* [Tune](https://docs.ray.io/en/latest/tune.html) - Python library for experiment execution and hyperparameter tuning at any scale. + +## Knowledge Sharing + +*Tools for sharing knowledge with the entire team/company.* + +* [Knowledge Repo](https://github.com/airbnb/knowledge-repo) - Knowledge sharing platform for data scientists and other technical professions. +* [Kyso](https://kyso.io/) - One place for data insights so your entire team can learn from your data. + +## Machine Learning Platform + +*Complete machine learning platform solutions.* + +* [aiWARE](https://www.veritone.com/aiware/aiware-os/) - aiWARE helps MLOps teams evaluate, deploy, integrate, scale & monitor ML models. +* [Algorithmia](https://algorithmia.com/) - Securely govern your machine learning operations with a healthy ML lifecycle. +* [Allegro AI](https://allegro.ai/) - Transform ML/DL research into products. Faster. +* [Bodywork](https://bodywork.readthedocs.io/en/latest/) - Deploys machine learning projects developed in Python to Kubernetes. +* [CNVRG](https://cnvrg.io/) - An end-to-end machine learning platform to build and deploy AI models at scale.
+* [DAGsHub](https://dagshub.com/) - A platform built on open source tools for data, model and pipeline management. +* [Dataiku](https://www.dataiku.com/) - Platform democratizing access to data and enabling enterprises to build their own path to AI. +* [DataRobot](https://www.datarobot.com/) - AI platform that democratizes data science and automates the end-to-end ML at scale. +* [Domino](https://www.dominodatalab.com/) - One place for your data science tools, apps, results, models, and knowledge. +* [Edge Impulse](https://edgeimpulse.com/) - Platform for creating, optimizing, and deploying AI/ML algorithms for edge devices. +* [envd](https://github.com/tensorchord/envd) - Machine learning development environment for data science and AI/ML engineering teams. +* [FedML](https://fedml.ai/) - Simplifies the workflow of federated learning anywhere at any scale. +* [Gradient](https://gradient.paperspace.com/) - Multicloud CI/CD and MLOps platform for machine learning teams. +* [gpulse](https://gpulse.ai) - GPU monitoring TUI with predictive OOM detection for ML training runs. +* [H2O](https://www.h2o.ai/) - Open source leader in AI with a mission to democratize AI for everyone. +* [Hopsworks](https://www.hopsworks.ai/) - Open-source platform for developing and operating machine learning models at scale. +* [Iguazio](https://www.iguazio.com/) - Data science platform that automates MLOps with end-to-end machine learning pipelines. +* [Katonic](https://katonic.ai/) - Automate your cycle of intelligence with Katonic MLOps Platform. +* [Knime](https://www.knime.com/) - Create and productionize data science using one easy and intuitive environment. +* [Kubeflow](https://www.kubeflow.org/) - Making deployments of ML workflows on Kubernetes simple, portable and scalable. +* [LynxKite](https://lynxkite.com/) - A complete graph data science platform for very large graphs and other datasets. 
+* [ML Workspace](https://github.com/ml-tooling/ml-workspace) - All-in-one web-based IDE specialized for machine learning and data science. +* [MLReef](https://github.com/MLReef/mlreef) - Open source MLOps platform that helps you collaborate, reproduce and share your ML work. +* [Modzy](https://www.modzy.com/) - Deploy, connect, run, and monitor machine learning (ML) models in the enterprise and at the edge. +* [Neu.ro](https://neu.ro) - MLOps platform that integrates open-source and proprietary tools into client-oriented systems. +* [Neurolink](https://github.com/juspay/neurolink) - TypeScript-first multi-provider AI agent framework with workflow orchestration and MCP support. +* [Omnimizer](https://www.omniml.ai) - Simplifies and accelerates MLOps by bridging the gap between ML models and edge hardware. +* [Pachyderm](https://www.pachyderm.com/) - Combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise. +* [Polyaxon](https://www.github.com/polyaxon/polyaxon/) - A platform for reproducible and scalable machine learning and deep learning on Kubernetes. +* [SageMaker](https://aws.amazon.com/sagemaker/) - Fully managed service that provides the ability to build, train, and deploy ML models quickly. +* [SAS Viya](https://www.sas.com/en_us/software/viya.html) - Cloud-native AI, analytic and data management platform that supports the analytics life cycle. +* [Sematic](https://sematic.dev) - An open-source end-to-end pipelining tool to go from laptop prototype to cloud in no time. +* [SigOpt](https://sigopt.com/) - A platform that makes it easy to track runs, visualize training, and scale hyperparameter tuning. +* [TrueFoundry](https://www.truefoundry.com) - A cloud-native MLOps platform over Kubernetes to simplify training and serving of ML models. +* [Valohai](https://valohai.com/) - MLOps platform for reproducible ML and LLM workflows from experimentation to production.
+ +## Model Fairness and Privacy + +*Tools for ensuring model fairness and privacy in production.* + +* [AIF360](https://github.com/Trusted-AI/AIF360) - A comprehensive set of fairness metrics for datasets and machine learning models. +* [Fairlearn](https://github.com/fairlearn/fairlearn) - A Python package to assess and improve the fairness of machine learning models. +* [Opacus](https://github.com/pytorch/opacus) - A library that enables training PyTorch models with differential privacy. +* [TensorFlow Privacy](https://github.com/tensorflow/privacy) - Library for training machine learning models with privacy for training data. + +## Model Interpretability + +*Tools for model interpretability/explainability.* + +* [Alibi](https://github.com/SeldonIO/alibi) - Open-source Python library enabling ML model inspection and interpretation. +* [Captum](https://github.com/pytorch/captum) - Model interpretability and understanding library for PyTorch. +* [ELI5](https://github.com/eli5-org/eli5) - Python package that helps to debug machine learning classifiers and explain their predictions. +* [InterpretML](https://github.com/interpretml/interpret) - A toolkit to help understand models and enable responsible machine learning. +* [LIME](https://github.com/marcotcr/lime) - Explaining the predictions of any machine learning classifier. +* [Lucid](https://github.com/tensorflow/lucid) - Collection of infrastructure and tools for research in neural network interpretability. +* [SAGE](https://github.com/iancovert/sage) - For calculating global feature importance using Shapley values. +* [SHAP](https://github.com/slundberg/shap) - A game-theoretic approach to explain the output of any machine learning model. + +## Model Lifecycle + +*Tools for managing the model lifecycle (tracking experiments, parameters and metrics).* + +* [Aeromancy](https://github.com/quant-aq/aeromancy) - A framework for performing reproducible AI and ML for Weights and Biases.
+* [Aim](https://github.com/aimhubio/aim) - A super-easy way to record, search and compare 1000s of ML training runs. +* [Cascade](https://github.com/Oxid15/cascade) - Library of ML engineering tools for rapid prototyping and experiment management. +* [Comet](https://github.com/comet-ml/comet-examples) - Track your datasets, code changes, experimentation history, and models. +* [Guild AI](https://guild.ai/) - Open source experiment tracking, pipeline automation, and hyperparameter tuning. +* [Keepsake](https://github.com/replicate/keepsake) - Version control for machine learning with support for Amazon S3 and Google Cloud Storage. +* [Losswise](https://losswise.com) - Makes it easy to track the progress of a machine learning project. +* [MLflow](https://mlflow.org/) - Open source platform for the machine learning lifecycle. +* [ModelDB](https://github.com/VertaAI/modeldb/) - Open source ML model versioning, metadata, and experiment management. +* [Neptune AI](https://neptune.ai/) - The most lightweight experiment management tool that fits any workflow. +* [Sacred](https://github.com/IDSIA/sacred) - A tool to help you configure, organize, log and reproduce experiments. +* [Weights and Biases](https://github.com/wandb/client) - A tool for visualizing and tracking your machine learning experiments. + +## Model Serving + +*Tools for serving models in production.* + +* [Banana](https://banana.dev) - Host your ML inference code on serverless GPUs and integrate it into your app with one line of code. +* [Beam](https://beam.cloud) - Develop on serverless GPUs, deploy highly performant APIs, and rapidly prototype ML models. +* [BentoML](https://github.com/bentoml/BentoML) - Open-source platform for high-performance ML model serving. +* [BudgetML](https://github.com/ebhy/budgetml) - Deploy an ML inference service on a budget in fewer than 10 lines of code.
+* [Cog](https://github.com/replicate/cog) - Open-source tool that lets you package ML models in a standard, production-ready container. +* [Cortex](https://www.cortex.dev/) - Machine learning model serving infrastructure. +* [Geniusrise](https://docs.geniusrise.ai) - Host inference APIs, bulk inference and fine tune text, vision, audio and multi-modal models. +* [Gradio](https://github.com/gradio-app/gradio) - Create customizable UI components around your models. +* [GraphPipe](https://oracle.github.io/graphpipe) - Machine learning model deployment made simple. +* [Hydrosphere](https://github.com/Hydrospheredata/hydro-serving) - Platform for deploying your Machine Learning to production. +* [KFServing](https://github.com/kubeflow/kfserving) - Kubernetes custom resource definition for serving ML models on arbitrary frameworks. +* [LocalAI](https://github.com/mudler/LocalAI) - Drop-in replacement REST API that’s compatible with OpenAI API specifications for inferencing. +* [Merlin](https://github.com/gojek/merlin) - A platform for deploying and serving machine learning models. +* [MLEM](https://github.com/iterative/mlem) - Version and deploy your ML models following GitOps principles. +* [Opyrator](https://github.com/ml-tooling/opyrator) - Turns your ML code into microservices with web API, interactive GUI, and more. +* [PredictionIO](https://github.com/apache/predictionio) - Event collection, deployment of algorithms, evaluation, querying predictive results via APIs. +* [Quix](https://quix.io) - Serverless platform for processing data streams in real-time with machine learning models. +* [Rune](https://github.com/hotg-ai/rune) - Provides containers to encapsulate and deploy EdgeML pipelines and applications. +* [Seldon](https://www.seldon.io/) - Take your ML projects from POC to production with maximum efficiency and minimal risk. 
+* [Streamlit](https://github.com/streamlit/streamlit) - Lets you create apps for your ML projects with deceptively simple Python scripts.
+* [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) - Flexible, high-performance serving system for ML models, designed for production.
+* [TorchServe](https://github.com/pytorch/serve) - A flexible and easy-to-use tool for serving PyTorch models.
+* [Triton Inference Server](https://github.com/triton-inference-server/server) - Provides an optimized cloud and edge inferencing solution.
+* [Vespa](https://github.com/vespa-engine/vespa) - Store, search, organize and make machine-learned inferences over big data at serving time.
+* [Wallaroo.AI](https://wallaroo.ai/) - A platform for deploying, serving, and optimizing ML models in both cloud and edge environments.
+
+## Model Testing & Validation
+
+*Tools for testing and validating models.*
+
+* [Deepchecks](https://github.com/deepchecks/deepchecks) - Open-source package for validating ML models & data, with various checks and suites.
+* [Starwhale](https://github.com/star-whale/starwhale) - An MLOps/LLMOps platform for model building, evaluation, and fine-tuning.
+* [Trubrics](https://github.com/trubrics/trubrics-sdk) - Validate machine learning with data science and domain expert feedback.
+
+## Optimization Tools
+
+*Optimization tools related to model scalability in production.*
+
+* [Accelerate](https://github.com/huggingface/accelerate) - A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support.
+* [Dask](https://dask.org/) - Provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
+* [DeepSpeed](https://github.com/microsoft/DeepSpeed) - Deep learning optimization library that makes distributed training easy, efficient, and effective.
+* [Fiber](https://uber.github.io/fiber/) - Python distributed computing library for modern computer clusters.
+* [Horovod](https://github.com/horovod/horovod) - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
+* [Mahout](https://mahout.apache.org/) - Distributed linear algebra framework and mathematically expressive Scala DSL.
+* [MLlib](https://spark.apache.org/mllib/) - Apache Spark's scalable machine learning library.
+* [Modin](https://github.com/modin-project/modin) - Speed up your Pandas workflows by changing a single line of code.
+* [Nebullvm](https://github.com/nebuly-ai/nebullvm) - Easy-to-use library to boost AI inference.
+* [Nos](https://github.com/nebuly-ai/nos) - Open-source module for running AI workloads on Kubernetes in an optimized way.
+* [Petastorm](https://github.com/uber/petastorm) - Enables single-machine or distributed training and evaluation of deep learning models.
+* [Rapids](https://rapids.ai/index.html) - Gives the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.
+* [Ray](https://github.com/ray-project/ray) - Fast and simple framework for building and running distributed applications.
+* [Singa](http://singa.apache.org/en/index.html) - Apache top-level project, focusing on distributed training of DL and ML models.
+* [Tpot](https://github.com/EpistasisLab/tpot) - Automated ML tool that optimizes machine learning pipelines using genetic programming.
+
+## Simplification Tools
+
+*Tools related to machine learning simplification and standardization.*
+
+* [Chassis](https://chassisml.io) - Turns models into ML-friendly containers that run just about anywhere.
+* [Hermione](https://github.com/a3data/hermione) - Helps data scientists set up more organized code in a quicker and simpler way.
+* [Hydra](https://github.com/facebookresearch/hydra) - A framework for elegantly configuring complex applications.
+* [Koalas](https://github.com/databricks/koalas) - Pandas API on Apache Spark. Makes data scientists more productive when interacting with big data.
+* [Ludwig](https://github.com/uber/ludwig) - Allows users to train and test deep learning models without the need to write code.
+* [MLNotify](https://github.com/aporia-ai/mlnotify) - No need to keep checking your training; with just one import line, you'll know the second it's done.
+* [PyCaret](https://pycaret.org/) - Open source, low-code machine learning library in Python.
+* [Sagify](https://github.com/Kenza-AI/sagify) - A CLI utility to train and deploy ML/DL models on AWS SageMaker.
+* [Soopervisor](https://github.com/ploomber/soopervisor) - Export ML projects to Kubernetes (Argo workflows), Airflow, AWS Batch, and SLURM.
+* [Soorgeon](https://github.com/ploomber/soorgeon) - Convert monolithic Jupyter notebooks into maintainable pipelines.
+* [TrainGenerator](https://github.com/jrieke/traingenerator) - A web app to generate template code for machine learning.
+* [Turi Create](https://github.com/apple/turicreate) - Simplifies the development of custom machine learning models.
+
+## Visual Analysis and Debugging
+
+*Tools for performing visual analysis and debugging of ML/DL models.*
+
+* [Aporia](https://www.aporia.com/) - Observability with customized monitoring and explainability for ML models.
+* [Arize](https://www.arize.com/) - A free end-to-end ML observability and model monitoring platform.
+* [Evidently](https://github.com/evidentlyai/evidently) - Interactive reports to analyze ML models during validation or production monitoring.
+* [Fiddler](https://www.fiddler.ai/) - Monitor, explain, and analyze your AI in production.
+* [Manifest](https://github.com/mnfst/manifest) - Open-source real-time cost observability for AI agents.
+* [Manifold](https://github.com/uber/manifold) - A model-agnostic visual debugging tool for machine learning.
+* [NannyML](https://github.com/NannyML/nannyml) - Algorithm capable of fully capturing the impact of data drift on performance.
+* [Netron](https://github.com/lutzroeder/netron) - Visualizer for neural network, deep learning, and machine learning models.
+* [Opik](https://github.com/comet-ml/opik) - Evaluate, test, and ship LLM applications with a suite of observability tools.
+* [Phoenix](https://phoenix.arize.com) - MLOps in a Notebook for troubleshooting and fine-tuning generative LLM, CV, and tabular models.
+* [Radicalbit](https://github.com/radicalbit/radicalbit-ai-monitoring/) - The open source solution for monitoring your AI models in production.
+* [Rhesis](https://github.com/rhesis-ai/rhesis) - Testing infrastructure for LLM and agentic applications with collaborative evaluation.
+* [Superwise](https://www.superwise.ai) - Fully automated, enterprise-grade model observability in a self-service SaaS platform.
+* [Whylogs](https://github.com/whylabs/whylogs) - The open source standard for data logging. Enables ML monitoring and observability.
+* [Yellowbrick](https://github.com/DistrictDataLabs/yellowbrick) - Visual analysis and diagnostic tools to facilitate machine learning model selection.
+
+## Workflow Tools
+
+*Tools and frameworks to create workflows or pipelines in the machine learning context.*
+
+* [Argo](https://github.com/argoproj/argo) - Open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
+* [Automate Studio](https://www.veritone.com/applications/automate-studio/) - Rapidly build & deploy AI-powered workflows.
+* [Cordum](https://github.com/cordum-io/cordum) - Governance-first control plane for AI agents and external workers.
+* [Couler](https://github.com/couler-proj/couler) - Unified interface for constructing and managing workflows on different workflow engines.
+* [dstack](https://github.com/dstackai/dstack) - An open-core tool to automate data and training workflows.
+* [Flyte](https://flyte.org/) - Makes it easy to create concurrent, scalable, and maintainable workflows for machine learning.
+* [Hamilton](https://github.com/dagworks-inc/hamilton) - A scalable general-purpose micro-framework for defining dataflows.
+* [Kale](https://github.com/kubeflow-kale/kale) - Aims to simplify the data science experience of deploying Kubeflow Pipelines workflows.
+* [Kedro](https://github.com/quantumblacklabs/kedro) - Library that implements software engineering best practices for data and ML pipelines.
+* [Luigi](https://github.com/spotify/luigi) - Python module that helps you build complex pipelines of batch jobs.
+* [Metaflow](https://metaflow.org/) - Human-friendly library that helps scientists and engineers build and manage data science projects.
+* [MLRun](https://github.com/mlrun/mlrun) - Generic mechanism for data scientists to build, run, and monitor ML tasks and pipelines.
+* [Orchest](https://github.com/orchest/orchest/) - Visual pipeline editor and workflow orchestrator with an easy-to-use UI, based on Kubernetes.
+* [Ploomber](https://github.com/ploomber/ploomber) - Write maintainable, production-ready pipelines. Develop locally, deploy to the cloud.
+* [Prefect](https://docs.prefect.io/) - A workflow management system, designed for modern infrastructure.
+* [VDP](https://github.com/instill-ai/vdp) - An open-source tool to seamlessly integrate AI for unstructured data into the modern data stack.
+* [Velda](https://velda.io) - Run jobs and workflows as if on your local machine.
+* [Wordware](https://www.wordware.ai) - A web-hosted IDE where non-technical domain experts can build task-specific AI agents.
+* [ZenML](https://github.com/maiot-io/zenml) - An extensible open-source MLOps framework to create reproducible pipelines.
+
+---
+
+# Resources
+
+Where to discover new tools and discuss existing ones.
+
+## Articles
+
+* [Continuous Delivery for Machine Learning](https://martinfowler.com/articles/cd4ml.html) (Martin Fowler)
+* [Machine Learning Operations (MLOps): Overview, Definition, and Architecture](https://arxiv.org/abs/2205.02302) (arXiv)
+* [MLOps Roadmap: A Complete MLOps Career Guide](https://www.scaler.com/blog/mlops-roadmap/) (Scaler Blogs)
+* [MLOps: Continuous delivery and automation pipelines in machine learning](https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning) (Google)
+* [MLOps: Machine Learning as an Engineering Discipline](https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f) (Medium)
+* [Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning](https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf) (Google)
+* [Rules of Machine Learning: Best Practices for ML Engineering](https://developers.google.com/machine-learning/guides/rules-of-ml) (Google)
+* [The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf) (Google)
+* [What Is MLOps?](https://blogs.nvidia.com/blog/2020/09/03/what-is-mlops/) (NVIDIA)
+
+## Books
+
+* [AI Governance](https://www.manning.com/books/ai-governance) (Manning)
+* [AI Model Evaluation](https://www.manning.com/books/ai-model-evaluation) (Manning)
+* [Beginning MLOps with MLFlow](https://www.amazon.com/Beginning-MLOps-MLFlow-SageMaker-Microsoft/dp/1484265483) (Apress)
+* [Building Machine Learning Pipelines](https://www.oreilly.com/library/view/building-machine-learning/9781492053187) (O'Reilly)
+* [Building Machine Learning Powered Applications](https://www.oreilly.com/library/view/building-machine-learning/9781492045106) (O'Reilly)
+* [Deep Learning in Production](https://www.amazon.com/gp/product/6180033773) (AI Summer)
+* [Designing Machine Learning Systems](https://www.oreilly.com/library/view/designing-machine-learning/9781098107956) (O'Reilly)
+* [Engineering MLOps](https://www.packtpub.com/product/engineering-mlops/9781800562882) (Packt)
+* [Implementing MLOps in the Enterprise](https://www.oreilly.com/library/view/implementing-mlops-in/9781098136574) (O'Reilly)
+* [Introducing MLOps](https://www.oreilly.com/library/view/introducing-mlops/9781492083283) (O'Reilly)
+* [Kubeflow for Machine Learning](https://www.oreilly.com/library/view/kubeflow-for-machine/9781492050117) (O'Reilly)
+* [Kubeflow Operations Guide](https://www.oreilly.com/library/view/kubeflow-operations-guide/9781492053262) (O'Reilly)
+* [Machine Learning Design Patterns](https://www.oreilly.com/library/view/machine-learning-design/9781098115777) (O'Reilly)
+* [Machine Learning Engineering in Action](https://www.manning.com/books/machine-learning-engineering-in-action) (Manning)
+* [ML Ops: Operationalizing Data Science](https://www.oreilly.com/library/view/ml-ops-operationalizing/9781492074663) (O'Reilly)
+* [MLOps Engineering at Scale](https://www.manning.com/books/mlops-engineering-at-scale) (Manning)
+* [MLOps Lifecycle Toolkit](https://link.springer.com/book/10.1007/978-1-4842-9642-4) (Apress)
+* [Practical Deep Learning at Scale with MLflow](https://www.packtpub.com/product/practical-deep-learning-at-scale-with-mlflow/9781803241333) (Packt)
+* [Practical MLOps](https://www.oreilly.com/library/view/practical-mlops/9781098103002) (O'Reilly)
+* [Production-Ready Applied Deep Learning](https://www.packtpub.com/product/production-ready-applied-deep-learning/9781803243665) (Packt)
+* [Reliable Machine Learning](https://www.oreilly.com/library/view/reliable-machine-learning/9781098106218) (O'Reilly)
+* [The Machine Learning Solutions Architect Handbook](https://www.packtpub.com/product/the-machine-learning-solutions-architect-handbook/9781801072168) (Packt)
+
+## Events
+
+* [AI Conference Deadline](https://aiconferenceddl.com/)
+* [MLOps Conference - Keynotes and Panels](https://www.youtube.com/playlist?list=PLH8M0UOY0uy6d_n3vEQe6J_gRBUrISF9m)
+* [MLOps World: Machine Learning in Production Conference](https://mlopsworld.com/)
+* [NormConf - The Normcore Tech Conference](https://normconf.com/)
+* [Stanford MLSys Seminar Series](https://mlsys.stanford.edu/)
+
+## Other Lists
+
+* [Applied ML](https://github.com/eugeneyan/applied-ml)
+* [Awesome AutoML Papers](https://github.com/hibayesian/awesome-automl-papers)
+* [Awesome AutoML](https://github.com/windmaple/awesome-AutoML)
+* [Awesome Data Science](https://github.com/academic/awesome-datascience)
+* [Awesome DataOps](https://github.com/kelvins/awesome-dataops)
+* [Awesome Deep Learning](https://github.com/ChristosChristofidis/awesome-deep-learning)
+* [Awesome Game Datasets](https://github.com/leomaurodesenv/game-datasets) (includes AI content)
+* [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning)
+* [Awesome MLOps](https://github.com/visenger/awesome-mlops)
+* [Awesome Production Machine Learning](https://github.com/EthicalML/awesome-production-machine-learning)
+* [Awesome Python](https://github.com/vinta/awesome-python)
+* [Deep Learning in Production](https://github.com/ahkarami/Deep-Learning-in-Production)
+
+## Podcasts
+
+* [AI Stories Podcast](https://www.youtube.com/@aistoriespodcast)
+* [Kubernetes Podcast from Google](https://kubernetespodcast.com/)
+* [Machine Learning – Software Engineering Daily](https://podcasts.google.com/?feed=aHR0cHM6Ly9zb2Z0d2FyZWVuZ2luZWVyaW5nZGFpbHkuY29tL2NhdGVnb3J5L21hY2hpbmUtbGVhcm5pbmcvZmVlZC8)
+* [MLOps.community](https://podcasts.google.com/?feed=aHR0cHM6Ly9hbmNob3IuZm0vcy8xNzRjYjFiOC9wb2RjYXN0L3Jzcw)
+* [Pipeline Conversation](https://podcast.zenml.io/)
+* [Practical AI: Machine Learning, Data Science](https://changelog.com/practicalai)
+* [This Week in Machine Learning & AI](https://twimlai.com/)
+* [True ML Talks](https://www.youtube.com/playlist?list=PL4-eEhdXDO5F9Myvh41EeUh7oCgzqFRGk)
+
+## Slack
+
+* [Kubeflow Workspace](https://kubeflow.slack.com/#/)
+* [MLOps Community Workspace](https://mlops-community.slack.com)
+
+## Websites
+
+* [A guide to MLOps](https://mlops.swiss-ai-center.ch/)
+* [Feature Stores for ML](http://featurestore.org/)
+* [Made with ML](https://github.com/GokuMohandas/Made-With-ML)
+* [ML-Ops](https://ml-ops.org/)
+* [MLOps Community](https://mlops.community/)
+* [MLOps Guide](https://mlops-guide.github.io/)
+* [MLOps Now](https://mlopsnow.com)
+* [System Designer - ML Systems](https://systemdesigner.net/ml-systems)
+
+# Contributing
+
+All contributions are welcome! Please take a look at the [contribution guidelines](https://github.com/kelvins/awesome-mlops/blob/main/CONTRIBUTING.md) first.