M1 Machine Learning Project (Graded)

This repository contains five machine learning notebooks (one graded project and four TD notebooks) together with the datasets they require.

Project Objectives

Graded project notebook: predict genre from movie/series metadata and text features.
TD Project 01 notebook: introductory data manipulation and analysis tasks.
TD Project 02 notebook: classification workflow on breast cancer data.
TD Project 03 notebook: predictive maintenance classification workflow.
TD Project 04 notebook: classification experiments on the heart disease, stars, and glass datasets.

Repository Structure

TD_2026_S2_M1_ML_Project_Graded.ipynb — complete notebook (problem framing, preprocessing, modeling, evaluation)
imdb_descr_titles.csv — input dataset
TD_2026_S2_M1_ML_Project_04.ipynb — TD notebook
heart_disease_classification.csv — dataset used in TD Project 04
stars_nasa_classification.csv — dataset used in TD Project 04
glass_classification.csv — dataset used in TD Project 04
TD_2026_S2_M1_ML_Project_01.ipynb — TD notebook
EU_countries.csv — dataset used in TD Project 01
TD_2026_S2_M1_ML_Project_02.ipynb — TD notebook
breast_cancer.csv — dataset used in TD Project 02
TD_2026_S2_M1_ML_Project_03.ipynb — TD notebook
predictive_maintenance.csv — dataset used in TD Project 03

Methodology (Notebook Summary)

The graded notebook follows a full ML workflow:

Problem framing (multiclass supervised classification)
Data preprocessing
- target construction from genres
- text feature engineering (title + description)
- categorical preprocessing (One-Hot Encoding)
- numeric imputation
Modeling
- DummyClassifier (baseline)
- LogisticRegression
- LinearSVC
- MultinomialNB (text baseline)
Evaluation
- cross-validation metrics
- test metrics
- comparison table and short commentary
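
The workflow above can be sketched end to end on toy data. This is a minimal illustration, not the notebook's actual code: the column names (`title`, `description`, `type`, `rating`, `genres`), the "first listed genre" target rule, the TF-IDF text features, the median imputation, and the seed value are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

SEED = 0  # hypothetical seed, chosen only for this sketch

# Toy rows standing in for the real data; all column names are hypothetical.
rows = [
    ("Laugh Riot", "goofy friends ruin a funny wedding", "movie", 6.1, "Comedy, Romance"),
    ("Office Jokes", "funny coworkers prank the boss", "series", 7.0, "Comedy"),
    ("Clown Town", "a goofy clown tells funny jokes", "movie", np.nan, "Comedy, Family"),
    ("Prank Wars", "friends prank each other in funny ways", "series", 6.5, "Comedy"),
    ("Night Terror", "a haunted house hides a scary ghost", "movie", 5.2, "Horror"),
    ("The Crypt", "a scary ghost stalks a dark crypt", "movie", 4.9, "Horror, Thriller"),
    ("Dark Woods", "campers flee a monster in haunted woods", "series", 5.8, "Horror"),
    ("Blood Moon", "a scary monster hunts under a dark moon", "movie", 5.5, "Horror, Mystery"),
    ("Star Voyage", "a space crew explores a distant planet", "series", 8.1, "Sci-Fi, Adventure"),
    ("Robot Dawn", "robots rebel on a distant space station", "movie", 7.4, "Sci-Fi"),
    ("Time Loop", "a scientist repeats one day in space", "movie", 7.9, "Sci-Fi, Drama"),
    ("Alien Signal", "a crew decodes an alien signal from a planet", "series", 7.2, "Sci-Fi"),
]
df = pd.DataFrame(rows, columns=["title", "description", "type", "rating", "genres"])

# Target construction (assumed rule): keep the first listed genre as the class label.
df["target"] = df["genres"].str.split(",").str[0].str.strip()
# Text feature engineering: title and description concatenated into one field.
df["text"] = df["title"] + " " + df["description"]

# One transformer per feature family: text, categorical, numeric.
preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(), "text"),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["type"]),
    ("num", SimpleImputer(strategy="median"), ["rating"]),
])

models = {
    "DummyClassifier": DummyClassifier(strategy="most_frequent"),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearSVC": LinearSVC(),
    "MultinomialNB": MultinomialNB(),
}

X = df[["text", "type", "rating"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED
)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=SEED)

results = []
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    # Cross-validation metrics on the training split only.
    scores = cross_validate(pipe, X_train, y_train, cv=cv, scoring=["accuracy", "f1_macro"])
    # Test metrics after refitting on the full training split.
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results.append({
        "model": name,
        "cv_accuracy": scores["test_accuracy"].mean(),
        "cv_f1_macro": scores["test_f1_macro"].mean(),
        "test_accuracy": accuracy_score(y_test, pred),
        "test_f1_macro": f1_score(y_test, pred, average="macro"),
    })

# Comparison table: one row per model.
comparison = pd.DataFrame(results).set_index("model")
print(comparison.round(3))
```

Keeping the preprocessing inside the `Pipeline` means it is refit on each training fold, so no information from validation or test rows leaks into the features. Note that `MultinomialNB` needs non-negative inputs, which holds here because the numeric column is only imputed, not standardized.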

Recommended Environment

Python 3.10+
numpy
pandas
scikit-learn
jupyter

Install dependencies (if needed):

pip install numpy pandas scikit-learn jupyter
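
To confirm the environment before opening a notebook, a short check along these lines may help. It is only a sketch: the helper name `installed_versions` is made up for this example, and the package list simply mirrors the one above.

```python
import sys
from importlib import metadata

# Distribution names as listed in this README.
REQUIRED = ["numpy", "pandas", "scikit-learn", "jupyter"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


python_ok = sys.version_info >= (3, 10)
print(f"Python {sys.version.split()[0]} ({'ok' if python_ok else 'older than 3.10'})")
for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```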

How to Run

A) Graded notebook

Keep imdb_descr_titles.csv in the same directory as the notebook.
Open TD_2026_S2_M1_ML_Project_Graded.ipynb.
Run all cells sequentially from top to bottom.
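
If the first cells fail with a file error, it can help to check outside the notebook that the dataset loads. The sketch below assumes the CSV uses pandas' default comma delimiter and encoding, which this README does not state; `load_dataset` is a hypothetical helper, not a function from the notebook.

```python
from pathlib import Path

import pandas as pd


def load_dataset(csv_path):
    """Read a CSV with pandas, failing early with a clear message if it is absent."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; place it next to the notebook.")
    return pd.read_csv(path)


# Relative path: resolved against the directory the script or notebook runs in.
DATA_PATH = Path("imdb_descr_titles.csv")
if DATA_PATH.is_file():
    df = load_dataset(DATA_PATH)
    print(df.shape)
    print(df.head())
else:
    print(f"{DATA_PATH} is missing from {Path.cwd()}")
```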

B) TD notebook (Project 04)

Keep these files in the same directory:
- heart_disease_classification.csv
- stars_nasa_classification.csv
- glass_classification.csv
Open TD_2026_S2_M1_ML_Project_04.ipynb.
Run all cells sequentially from top to bottom.

C) TD notebook (Project 01)

Keep EU_countries.csv in the same directory as the notebook.
Open TD_2026_S2_M1_ML_Project_01.ipynb.
Run all cells sequentially from top to bottom.

D) TD notebook (Project 02)

Keep breast_cancer.csv in the same directory as the notebook.
Open TD_2026_S2_M1_ML_Project_02.ipynb.
Run all cells sequentially from top to bottom.

E) TD notebook (Project 03)

Keep predictive_maintenance.csv in the same directory as the notebook.
Open TD_2026_S2_M1_ML_Project_03.ipynb.
Run all cells sequentially from top to bottom.
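
Since every notebook expects its datasets in the same directory, a quick presence check can save a failed run. The mapping below is taken from the file names listed in this README; the helper `missing_files` is an illustrative name, not part of the repository.

```python
from pathlib import Path

# Notebook -> datasets it needs, as listed in this README.
REQUIRED_FILES = {
    "TD_2026_S2_M1_ML_Project_Graded.ipynb": ["imdb_descr_titles.csv"],
    "TD_2026_S2_M1_ML_Project_01.ipynb": ["EU_countries.csv"],
    "TD_2026_S2_M1_ML_Project_02.ipynb": ["breast_cancer.csv"],
    "TD_2026_S2_M1_ML_Project_03.ipynb": ["predictive_maintenance.csv"],
    "TD_2026_S2_M1_ML_Project_04.ipynb": [
        "heart_disease_classification.csv",
        "stars_nasa_classification.csv",
        "glass_classification.csv",
    ],
}


def missing_files(directory="."):
    """Return the notebooks and datasets that are absent from `directory`."""
    root = Path(directory)
    missing = []
    for notebook, datasets in REQUIRED_FILES.items():
        for name in [notebook, *datasets]:
            if not (root / name).is_file():
                missing.append(name)
    return missing


print(missing_files() or "All notebooks and datasets are in place.")
```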

Reproducibility

The notebook uses fixed random seeds in key steps (e.g., stratified split / CV settings where applicable) to keep results stable across runs.
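
As an illustration of what fixed seeds buy, the sketch below runs a stratified split and a shuffled stratified CV twice with the same `random_state` and compares the resulting indices. The seed value and the synthetic labels are assumptions for the example; the notebook's actual seed is not stated here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

SEED = 42  # hypothetical value; any fixed integer behaves the same way

# Synthetic, balanced binary labels standing in for a real target column.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)


def split_indices(seed):
    """Return the test indices and CV validation folds produced with `seed`."""
    idx = np.arange(len(y))
    train_idx, test_idx = train_test_split(
        idx, test_size=0.25, stratify=y, random_state=seed
    )
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    folds = [val.tolist() for _, val in cv.split(X[train_idx], y[train_idx])]
    return test_idx.tolist(), folds


first = split_indices(SEED)
second = split_indices(SEED)
print(first == second)  # True: same seed, same split and folds
```

Without `random_state`, `train_test_split` and a shuffled `StratifiedKFold` draw a new partition on each run, so metrics can drift slightly between executions.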

Notes

Every dataset is well below GitHub's 100 MB per-file limit.
The repository is intentionally minimal and focused on the published notebooks and their datasets.

GitMarcode/M1_ML_Project_Example
