HoareLea/data_science

Introduction

This repository is designed to serve as a structured resource for studying data science and machine learning concepts. In addition to theoretical explanations, it provides practical examples demonstrating how these concepts can be implemented in real-world scenarios. It is not intended to be an exhaustive catalogue of every topic, nor does it aim to provide full mathematical rigour in all areas. Instead, it covers a carefully selected set of topics with sufficient depth and rigour to support practical implementation of data science in real-world contexts. This isn't a theoretical textbook; it's a practical playbook.

Contents

  • 01 - Probability & Statistics:
    Fundamentals of probability and statistics, including distributions, estimation, p-values, and hypothesis testing

  • 02 - Classical Machine Learning:
    Core concepts in classical machine learning such as cross-validation, the bias–variance trade-off, and gradient descent, along with detailed explanations of many common machine learning algorithms

  • 03 - Deep Learning:
    A detailed explanation of different neural network architectures and their applications, including feedforward and convolutional neural networks

  • 04 - Optimisation

  • 05 - Software Engineering:
    Programming fundamentals as well as more advanced topics such as concurrency, testing, and software design patterns

  • 06 - Machine Learning Operations (MLOps):
    Concepts and practices for deploying, monitoring, versioning, and maintaining machine learning systems in production environments

How to Use

The repository is organised into chapters, each focusing on a different area of data science and machine learning. Within each chapter, you will find multiple Markdown files, each grouping related topics into a single document. Each document combines explanations, code examples, and diagrams to support understanding.

There is no strict order in which you must work through the repository. However, many of the later chapters build upon concepts introduced earlier. For that reason, you will likely benefit from progressing through the chapters sequentially.

AI has changed how we access knowledge, but it has not changed the importance of learning. It is now possible to simulate competency without developing real understanding — but real competence still matters. The ability to connect ideas, reason independently, and apply concepts beyond the obvious will distinguish those who deliver meaningful value from those who simply reproduce information.

Data science and machine learning span mathematics, statistics, programming, software engineering, and more. You will not master all of these quickly, and you should not try to. The goal is to develop deep mastery of individual concepts rather than surface-level familiarity. To do so, narrow your focus: choose a single specific concept to work on at a time, and make sure you properly understand it before moving on. That way you will stand out as someone who can deliver real value rather than someone who can only regurgitate LLM output.

Before moving on, ask yourself:

  • Can I explain this topic to both a non-technical and a technical audience?
  • Can I give concrete examples of the concept?
  • Do I understand its limitations and assumptions?
  • Can I apply it in code without relying on external assistance? (Note: external assistance is fine for syntax or function names, but not for "How do I apply this concept?" style help.)

About

A repository for learning and revising a wide range of data science topics, from simple to more advanced areas
