GitMarcode/M1_ML_Project_Example

M1 Machine Learning Project (Graded)

This repository contains one graded ML project notebook and four TD notebooks, each published together with the datasets it requires.

Project Objectives

  1. Graded project notebook: predict genre from movie/series metadata and text features.
  2. TD Project 01 notebook: introductory data manipulation and analysis tasks.
  3. TD Project 02 notebook: classification workflow on breast cancer data.
  4. TD Project 03 notebook: predictive maintenance classification workflow.
  5. TD Project 04 notebook: classification experiments on heart disease, stars, and glass datasets.

Repository Structure

  • TD_2026_S2_M1_ML_Project_Graded.ipynb — complete notebook (problem framing, preprocessing, modeling, evaluation)
  • imdb_descr_titles.csv — input dataset
  • TD_2026_S2_M1_ML_Project_04.ipynb — TD notebook
  • heart_disease_classification.csv — dataset used in TD Project 04
  • stars_nasa_classification.csv — dataset used in TD Project 04
  • glass_classification.csv — dataset used in TD Project 04
  • TD_2026_S2_M1_ML_Project_01.ipynb — TD notebook
  • EU_countries.csv — dataset used in TD Project 01
  • TD_2026_S2_M1_ML_Project_02.ipynb — TD notebook
  • breast_cancer.csv — dataset used in TD Project 02
  • TD_2026_S2_M1_ML_Project_03.ipynb — TD notebook
  • predictive_maintenance.csv — dataset used in TD Project 03

Methodology (Notebook Summary)

The graded notebook follows a full ML workflow:

  1. Problem framing (multiclass supervised classification)
  2. Data preprocessing
    • target construction from genres
    • text feature engineering (title + description)
    • categorical preprocessing (One-Hot Encoding)
    • numeric imputation
  3. Modeling
    • DummyClassifier (baseline)
    • LogisticRegression
    • LinearSVC
    • MultinomialNB (text baseline)
  4. Evaluation
    • cross-validation metrics
    • test metrics
    • comparison table and short commentary
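
The workflow above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is a tiny synthetic stand-in, the column names (`title`, `description`, `type`, `rating`, `genres`) are hypothetical, and the choices of "first listed genre" as the target, TF-IDF for text, median imputation, accuracy as the metric, and seed 42 are all assumptions. MultinomialNB and the held-out test metrics are left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real CSV; every column name here is hypothetical.
df = pd.DataFrame({
    "title": [f"Space Saga {i}" for i in range(6)] + [f"Love Story {i}" for i in range(6)],
    "description": ["aliens attack a starship in deep space"] * 6
                   + ["two strangers fall in love in Paris"] * 6,
    "type": ["movie", "series"] * 6,
    "rating": [6.5, np.nan, 7.0, 8.1, np.nan, 5.9] * 2,
    "genres": ["Sci-Fi, Action"] * 6 + ["Romance, Drama"] * 6,
})

# Target construction (assumed rule): keep the first listed genre.
y = df["genres"].str.split(",").str[0].str.strip()

# Text feature engineering: concatenate title and description.
df["text"] = df["title"] + " " + df["description"]
X = df[["text", "type", "rating"]]

preprocess = ColumnTransformer([
    # A bare string selects a 1-D column, which is what TfidfVectorizer expects.
    ("text", TfidfVectorizer(), "text"),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["type"]),
    ("num", SimpleImputer(strategy="median"), ["rating"]),
])

models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svc": LinearSVC(),
}

# Fixed seed so the folds are identical on every run.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

scores = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("clf", model)])
    scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()

# Comparison table: mean cross-validated accuracy per model.
comparison = pd.Series(scores, name="cv_accuracy").sort_values(ascending=False)
print(comparison)
```

Keeping preprocessing inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into the vectorizer, encoder, or imputer.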

Recommended Environment

  • Python 3.10+
  • numpy
  • pandas
  • scikit-learn
  • jupyter

Install dependencies (if needed):

pip install numpy pandas scikit-learn jupyter
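
To confirm the environment before opening a notebook, a quick check along these lines may help. It is a sketch using only the standard library; the helper names `missing_packages` and `python_ok` are made up for this example. Note that scikit-learn is imported as `sklearn`.

```python
import importlib.util
import sys

# Import names of the libraries listed above (scikit-learn is imported as "sklearn").
# jupyter is left out: it runs the notebooks rather than being imported by them.
REQUIRED = ["numpy", "pandas", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 10)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info >= minimum


print("Python 3.10+:", python_ok())
print("Missing packages:", missing_packages(REQUIRED) or "none")
```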

How to Run

A) Graded notebook

  1. Keep imdb_descr_titles.csv in the same directory as the notebook.
  2. Open TD_2026_S2_M1_ML_Project_Graded.ipynb.
  3. Run all cells sequentially from top to bottom.

B) TD notebook (Project 04)

  1. Keep these files in the same directory:
    • heart_disease_classification.csv
    • stars_nasa_classification.csv
    • glass_classification.csv
  2. Open TD_2026_S2_M1_ML_Project_04.ipynb.
  3. Run all cells sequentially from top to bottom.

C) TD notebook (Project 01)

  1. Keep EU_countries.csv in the same directory as the notebook.
  2. Open TD_2026_S2_M1_ML_Project_01.ipynb.
  3. Run all cells sequentially from top to bottom.

D) TD notebook (Project 02)

  1. Keep breast_cancer.csv in the same directory as the notebook.
  2. Open TD_2026_S2_M1_ML_Project_02.ipynb.
  3. Run all cells sequentially from top to bottom.

E) TD notebook (Project 03)

  1. Keep predictive_maintenance.csv in the same directory as the notebook.
  2. Open TD_2026_S2_M1_ML_Project_03.ipynb.
  3. Run all cells sequentially from top to bottom.
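
Since every notebook expects its CSV files next to it, a small pre-flight check can catch a missing file before any cell runs. The sketch below uses the file names listed in this README; the helper `missing_files` is a hypothetical name introduced for illustration.

```python
from pathlib import Path

# Notebook -> datasets it needs, copied from the steps above.
REQUIRED_FILES = {
    "TD_2026_S2_M1_ML_Project_Graded.ipynb": ["imdb_descr_titles.csv"],
    "TD_2026_S2_M1_ML_Project_01.ipynb": ["EU_countries.csv"],
    "TD_2026_S2_M1_ML_Project_02.ipynb": ["breast_cancer.csv"],
    "TD_2026_S2_M1_ML_Project_03.ipynb": ["predictive_maintenance.csv"],
    "TD_2026_S2_M1_ML_Project_04.ipynb": [
        "heart_disease_classification.csv",
        "stars_nasa_classification.csv",
        "glass_classification.csv",
    ],
}


def missing_files(directory="."):
    """Map each notebook to the files (notebook included) absent from `directory`."""
    root = Path(directory)
    report = {}
    for notebook, datasets in REQUIRED_FILES.items():
        absent = [name for name in [notebook, *datasets] if not (root / name).is_file()]
        if absent:
            report[notebook] = absent
    return report


# Report anything missing from the current working directory.
for notebook, absent in missing_files().items():
    print(f"{notebook}: missing {', '.join(absent)}")
```

An empty result means all notebooks and datasets are in place.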

Reproducibility

The notebook uses fixed random seeds in key steps (e.g., stratified split / CV settings where applicable) to keep results stable across runs.
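
As an illustration of what fixed seeds buy, the sketch below shows that a stratified split and shuffled stratified folds come out identical when `random_state` is fixed. The seed value 42, the toy arrays, and the helper names are assumptions for this example, not values taken from the notebooks.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

SEED = 42  # hypothetical value; any fixed integer gives repeatable results

# Toy data: 20 samples, two balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)


def split_rows(seed):
    """Return the first feature of each test row chosen by a stratified split."""
    _, X_test, _, _ = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    return X_test[:, 0].tolist()


def fold_indices(seed):
    """Return the test indices of each shuffled stratified fold."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return [test.tolist() for _, test in cv.split(X, y)]


# The same seed reproduces the same split and the same folds on every call.
print(split_rows(SEED) == split_rows(SEED))      # True
print(fold_indices(SEED) == fold_indices(SEED))  # True
```

Without `random_state`, shuffled splits can differ between runs, which makes metric comparisons across models harder to interpret.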

Notes

  • The dataset files are well below GitHub's 100 MB per-file limit.
  • The repository is intentionally minimal and focused on the published notebooks + their datasets.
