GitHub - HoareLea/data_science: A repository designed for learning/revising a wide range of data science topics ranging from simple to more advanced areas

Introduction

This repository is a structured resource for studying data science and machine learning concepts. Alongside theoretical explanations, it provides practical examples showing how these concepts can be implemented in real-world scenarios. It is not intended to be an exhaustive catalogue of every topic, nor does it aim for full mathematical rigour in all areas. Instead, it covers a carefully selected set of topics in enough depth to support practical use of data science. This isn't a theoretical textbook; it's a practical playbook.

Contents

01 - Probability & Statistics:
Fundamentals of probability and statistics, including distributions, estimation, statistics, p-values, and hypothesis testing
02 - Classical Machine Learning:
Core concepts in classical machine learning such as cross-validation, the bias–variance trade-off, and gradient descent, along with detailed explanations of many common machine learning algorithms
03 - Deep Learning:
A detailed explanation of different neural network architectures and their applications, including feedforward and convolutional neural networks
04 - Optimisation:
An overview of optimisation concepts and techniques
05 - Software Engineering:
Programming fundamentals as well as more advanced topics such as concurrency, testing, and software design patterns
06 - Machine Learning Operations (MLOps):
Concepts and practices for deploying, monitoring, versioning, and maintaining machine learning systems in production environments

How to Use

The repository is organised into chapters, each focusing on a different area of data science and machine learning. Within each chapter, you will find multiple markdown files that group related topics into a single document. Each document contains a combination of explanations, code examples, and diagrams to support understanding.

There is no strict order in which you must work through the repository. However, many of the later chapters build upon concepts introduced earlier. For that reason, you will likely benefit from progressing through the chapters sequentially.

AI has changed how we access knowledge, but it has not changed the importance of learning. It is now possible to simulate competency without developing real understanding — but real competence still matters. The ability to connect ideas, reason independently, and apply concepts beyond the obvious will distinguish those who deliver meaningful value from those who simply reproduce information.

Data science and machine learning span mathematics, statistics, programming, software engineering, and more. You will not master all of these quickly, and you should not try to. The goal is to develop deep mastery of individual concepts rather than surface-level familiarity. To do so, narrow your focus: choose a single specific concept to work on at a time, and make sure you properly understand it before moving on. That way you will stand out as someone who can deliver real value rather than someone who can only regurgitate an LLM output.

Before moving on, ask yourself:

Can I explain this topic to both a non-technical and a technical audience?
Can I give concrete examples of the concept?
Do I understand its limitations and assumptions?
Can I apply it in code without relying on external assistance? (Note: external assistance is fine for syntax or function names, but not for "How do I apply this concept?" style help.)
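As an illustration of the last check, here is a minimal sketch of what "applying a concept in code" might look like for gradient descent, one of the topics listed under chapter 02. It is not taken from the repository; the function name `gradient_descent`, its parameters, and the quadratic example are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Minimise a 1-D function given its gradient, starting from x0.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    x = x0
    for _ in range(n_steps):
        # Step in the direction opposite to the gradient.
        x -= learning_rate * grad(x)
    return x


# Assumed example: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))
```

If you can write something like this from memory, and also explain why too large a learning rate would make the iterates diverge, that is a reasonable sign the concept has been understood rather than memorised.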

Folders and files

.github
.idea
.vscode
01_probability_&_statistics
02_classical_machine_learning
03_deep_learning
04_optimisation
05_software_engineering
06_ml_ops
Algorithm Examples
Python_Data_Science_Packages
images
.gitignore
.python-version
README.md
pyproject.toml
uv.lock
