mahdimhz/ecg-heartbeat-classification

ECG Heartbeat Classification with SciPy and Machine Learning

This project classifies segmented ECG heartbeat signals from the Kaggle ECG Heartbeat Categorization Dataset. The main experiment compares classical machine learning on SciPy-derived signal features against a compact 1-D CNN trained directly on raw heartbeat segments.

The best model in this run is a random forest on handcrafted signal features:

| Model | Test accuracy | Test macro F1 | Test weighted F1 |
|---|---|---|---|
| Random Forest | 0.9710 | 0.8586 | 0.9690 |
| 1-D CNN | 0.9528 | 0.8205 | 0.9537 |
| XGBoost | 0.9387 | 0.7933 | 0.9442 |
| Linear SVM | 0.9036 | 0.6400 | 0.9087 |
| Logistic Regression | 0.7131 | 0.5262 | 0.7797 |

High accuracy is not enough by itself. Class 0 makes up about 83% of the rows, so a classifier that always predicts class 0 would already reach about 0.83 accuracy. Macro F1 and per-class recall are therefore the main metrics.
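To make the imbalance argument concrete, the following minimal sketch scores a hypothetical classifier that always predicts the largest class, using the test-set class counts from the Dataset table in this README. The helper name `majority_baseline` is illustrative and not part of the repository.

```python
# Test-set support per class, copied from the Dataset table in this README.
TEST_COUNTS = {0: 18118, 1: 556, 2: 1448, 3: 162, 4: 1608}


def majority_baseline(counts):
    """Return (accuracy, macro F1) for a classifier that always predicts the largest class."""
    total = sum(counts.values())
    majority = max(counts, key=counts.get)
    accuracy = counts[majority] / total
    # For the majority class, recall is 1 and precision equals the accuracy.
    # Every other class is never predicted, so its F1 is 0.
    f1_majority = 2 * accuracy / (1 + accuracy)
    macro_f1 = f1_majority / len(counts)
    return accuracy, macro_f1


accuracy, macro_f1 = majority_baseline(TEST_COUNTS)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
```

Under these counts the trivial classifier reaches roughly 0.83 accuracy but only about 0.18 macro F1, which is why macro F1 separates the models much more clearly than accuracy does.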

Dataset

Dataset: ECG Heartbeat Categorization Dataset

The Kaggle download contains both MIT-BIH-derived and PTBDB-derived CSV files. This project uses only:

  • data/raw/mitbih_train.csv
  • data/raw/mitbih_test.csv

The PTBDB files are left out because they are a separate binary normal/abnormal task and should not be mixed into the MIT-BIH 5-class classifier.

The MIT-BIH files contain 188 columns per row: 187 heartbeat samples and one label column.
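As a minimal sketch of that layout, the helper below splits a 188-column table into a signal matrix and a label vector. It assumes the label is the last column and that the CSV has no header row; the function name `split_signals_and_labels` is hypothetical and not taken from the repository.

```python
import numpy as np

N_SAMPLES = 187  # heartbeat samples per row, as stated above


def split_signals_and_labels(table):
    """Split an (n_rows, 188) array into (signals, labels).

    Assumption: the label is stored in the last column.
    """
    table = np.asarray(table, dtype=np.float32)
    if table.ndim != 2 or table.shape[1] != N_SAMPLES + 1:
        raise ValueError(f"expected {N_SAMPLES + 1} columns, got shape {table.shape}")
    signals = table[:, :N_SAMPLES]
    labels = table[:, N_SAMPLES].astype(np.int64)
    return signals, labels


# With the real file this could be (not executed here):
# table = np.loadtxt("data/raw/mitbih_train.csv", delimiter=",")
demo = np.zeros((3, N_SAMPLES + 1), dtype=np.float32)
demo[:, -1] = [0, 2, 4]
signals, labels = split_signals_and_labels(demo)
print(signals.shape, labels.tolist())
```

Checking the column count up front is a cheap guard against accidentally loading one of the PTBDB files or a CSV that was saved with an extra index column.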

| Class | Kaggle code | Train | Validation | Test |
|---|---|---|---|---|
| 0 | N | 57,977 | 14,494 | 18,118 |
| 1 | S | 1,778 | 445 | 556 |
| 2 | V | 4,630 | 1,158 | 1,448 |
| 3 | F | 513 | 128 | 162 |
| 4 | Q | 5,145 | 1,286 | 1,608 |

Class 3 is less than 1% of the data, so it is the hardest class to evaluate reliably.

EDA

[Figure: MIT-BIH class counts]

[Figure: MIT-BIH mean waveforms]

The waveform values are already scaled to [0, 1]. Around 41% of waveform entries are zero, which is consistent with fixed-length segmented beats containing padded or low-amplitude regions.
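A zero-fraction check of this kind can be sketched in a few lines of NumPy. The two helpers below are illustrative only (their names are not from the repository): one measures the overall share of exact zeros, the other counts trailing zeros per beat as a rough proxy for padding length.

```python
import numpy as np


def zero_fraction(signals):
    """Share of entries that are exactly zero."""
    signals = np.asarray(signals)
    return float(np.mean(signals == 0))


def trailing_zero_lengths(signals):
    """Number of consecutive zeros at the end of each row."""
    signals = np.asarray(signals)
    nonzero = signals != 0
    # argmax on the reversed mask gives the position of the last nonzero sample,
    # counted from the end of the row.
    from_end = np.argmax(nonzero[:, ::-1], axis=1)
    all_zero = ~nonzero.any(axis=1)
    return np.where(all_zero, signals.shape[1], from_end)


demo = np.array([[0.5, 0.2, 0.0, 0.0],
                 [1.0, 0.0, 0.3, 0.0]])
print(zero_fraction(demo), trailing_zero_lengths(demo).tolist())
```

Comparing the two numbers on the real data would indicate how much of the zero mass comes from end-of-beat padding rather than from low-amplitude samples inside the beat.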

Method

Splitting

The Kaggle test file is kept untouched as the final hold-out set. The Kaggle training file is split into train and validation using stratified sampling:

  • train: 70,043 rows
  • validation: 17,511 rows
  • test: 21,892 rows

The train/validation split indices are saved to data/processed/ for leakage checks.
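The row counts above correspond to an 80/20 split of the Kaggle training file. The sketch below shows one way to produce a stratified split with plain NumPy; it is not the repository's `split_data` implementation, and the seed and the function name `stratified_split` are assumptions. The demo labels are built from the per-class train plus validation totals in the Dataset table.

```python
import numpy as np


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return sorted (train_idx, val_idx) with the same class mix in both parts."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_parts, val_parts = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_val = int(round(len(idx) * val_fraction))
        val_parts.append(idx[:n_val])
        train_parts.append(idx[n_val:])
    return np.sort(np.concatenate(train_parts)), np.sort(np.concatenate(val_parts))


# Per-class totals of the Kaggle training file (train + validation columns of the table).
CLASS_TOTALS = {0: 72471, 1: 2223, 2: 5788, 3: 641, 4: 6431}
labels = np.repeat(list(CLASS_TOTALS.keys()), list(CLASS_TOTALS.values()))

train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), np.bincount(labels[val_idx]).tolist())
# The index arrays could then be written with np.save; the file names used in
# data/processed/ are not given in this README.
```

Returning index arrays rather than copies of the data keeps the split auditable: the same indices can be reloaded later to confirm that no row appears in both parts.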

SciPy Feature Pipeline

The classical models use 29 features extracted after a light Butterworth band-pass filter:

  • time-domain: mean, standard deviation, extrema, quantiles, skew, kurtosis, energy, zero-crossing rate
  • frequency-domain: FFT band-power ratios, spectral centroid, rolloff, dominant frequency, spectral entropy
  • shape features: main peak position, peak height, prominence, width, and short-lag autocorrelation
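The sketch below illustrates the general shape of such a pipeline with SciPy, covering a handful of features from each of the three families listed above rather than all 29. The 125 Hz sampling rate is the value documented for the Kaggle dataset; the filter order, the 0.5 to 40 Hz cutoffs, and all function and feature names are assumptions, not the repository's actual settings.

```python
import numpy as np
from scipy import signal, stats

FS = 125.0  # Hz; sampling rate documented for the Kaggle dataset (assumed here)


def bandpass(beat, low_hz=0.5, high_hz=40.0, order=2, fs=FS):
    """Zero-phase Butterworth band-pass. Cutoffs and order are assumptions."""
    sos = signal.butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, beat)


def extract_features(beat, fs=FS):
    """Return a small dict of features for one beat (assumed not to be all zeros)."""
    x = bandpass(np.asarray(beat, dtype=float), fs=fs)
    feats = {}
    # Time domain
    feats["mean"] = x.mean()
    feats["std"] = x.std()
    feats["skew"] = stats.skew(x)
    feats["kurtosis"] = stats.kurtosis(x)
    feats["energy"] = np.sum(x ** 2)
    feats["zero_crossing_rate"] = np.mean(np.signbit(x[:-1]) != np.signbit(x[1:]))
    # Frequency domain, from the normalized periodogram
    freqs, psd = signal.periodogram(x, fs=fs)
    p = psd / psd.sum()
    feats["spectral_centroid"] = np.sum(freqs * p)
    feats["dominant_frequency"] = freqs[np.argmax(psd)]
    nonzero = p[p > 0]
    feats["spectral_entropy"] = -np.sum(nonzero * np.log(nonzero))
    # Shape
    peak = int(np.argmax(x))
    feats["peak_position"] = peak / (len(x) - 1)  # 0 = first sample, 1 = last
    feats["peak_height"] = x[peak]
    feats["lag1_autocorr"] = np.corrcoef(x[:-1], x[1:])[0, 1]
    return {name: float(value) for name, value in feats.items()}


# Synthetic beat: a narrow bump centred on sample 40, followed by a flat tail.
t = np.arange(187)
beat = np.exp(-0.5 * ((t - 40) / 3.0) ** 2)
features = extract_features(beat)
print(sorted(features))
```

Using second-order sections with `sosfiltfilt` keeps the filter numerically stable and zero-phase, so the main peak stays where it was in the raw segment and position-based shape features remain meaningful.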

Feature arrays are saved as:

  • data/processed/features_train.npy
  • data/processed/features_val.npy
  • data/processed/features_test.npy

Models

Classical models:

  • Logistic Regression
  • Linear SVM
  • Random Forest
  • XGBoost

Deep learning baseline:

  • compact 1-D CNN with class-weighted cross entropy
  • best checkpoint selected by validation macro F1

The CNN was originally trained with full inverse-frequency class weights, which over-weighted the rarest class: class 3 recall was high, but so was its false-positive count. The final CNN uses square-root class weights, which gave a better balance between precision and recall.
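To illustrate how much the square root softens the weighting, the sketch below computes both schemes from the training counts in the Dataset table. It assumes that "square-root class weights" means the square root of the inverse-frequency weights, and it uses the common `total / (n_classes * count)` normalization; neither detail is confirmed by this README, and the function names are hypothetical.

```python
import math

# Training-set counts per class, copied from the Dataset table in this README.
TRAIN_COUNTS = {0: 57977, 1: 1778, 2: 4630, 3: 513, 4: 5145}


def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


def sqrt_weights(counts):
    """Square root of the inverse-frequency weights (assumed interpretation)."""
    return {cls: math.sqrt(w) for cls, w in inverse_frequency_weights(counts).items()}


inv = inverse_frequency_weights(TRAIN_COUNTS)
sq = sqrt_weights(TRAIN_COUNTS)
print(f"class 3 vs class 0 weight ratio: inverse={inv[3] / inv[0]:.1f}, sqrt={sq[3] / sq[0]:.1f}")
```

With full inverse-frequency weights, a class 3 example counts roughly 113 times as much as a class 0 example in the loss; with the square root, that ratio falls to roughly 10.6. The ratio does not depend on the normalization chosen. The resulting weights could be passed to a class-weighted cross-entropy loss.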

Results

[Figure: Classical model metric comparison]

Best Classical Model: Random Forest

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| 0 | 0.9705 | 0.9982 | 0.9842 | 18,118 |
| 1 | 0.9688 | 0.5594 | 0.7092 | 556 |
| 2 | 0.9591 | 0.8736 | 0.9143 | 1,448 |
| 3 | 0.8632 | 0.6235 | 0.7240 | 162 |
| 4 | 0.9960 | 0.9291 | 0.9614 | 1,608 |

[Figure: Random forest confusion matrix]

1-D CNN

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| 0 | 0.9833 | 0.9656 | 0.9744 | 18,118 |
| 1 | 0.7402 | 0.5791 | 0.6498 | 556 |
| 2 | 0.7173 | 0.9620 | 0.8218 | 1,448 |
| 3 | 0.6609 | 0.7099 | 0.6845 | 162 |
| 4 | 0.9903 | 0.9540 | 0.9718 | 1,608 |

[Figure: 1-D CNN confusion matrix]

Main Finding

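The random forest on SciPy features is the strongest model overall in this run. The CNN has higher recall on classes 1 through 4 (for class 3, 0.7099 versus 0.6235), but its precision on classes 1, 2, and 3 is much lower. The F1 it loses on those three classes outweighs the recall gains, so its macro F1 is lower (0.8205 versus 0.8586).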
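The summary metrics can be cross-checked against the per-class tables. The sketch below recomputes macro and weighted F1 from the per-class F1 scores and supports reported above; the function names are illustrative, and small differences in the last digit are expected because the inputs are already rounded to four decimals.

```python
# Per-class values copied from the Random Forest and 1-D CNN tables (classes 0 to 4).
SUPPORT = [18118, 556, 1448, 162, 1608]
F1_RANDOM_FOREST = [0.9842, 0.7092, 0.9143, 0.7240, 0.9614]
F1_CNN = [0.9744, 0.6498, 0.8218, 0.6845, 0.9718]


def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(per_class_f1) / len(per_class_f1)


def weighted_f1(per_class_f1, support):
    """Support-weighted mean of per-class F1: large classes dominate."""
    return sum(f * s for f, s in zip(per_class_f1, support)) / sum(support)


for name, scores in [("Random Forest", F1_RANDOM_FOREST), ("1-D CNN", F1_CNN)]:
    print(f"{name}: macro={macro_f1(scores):.4f} weighted={weighted_f1(scores, SUPPORT):.4f}")
```

Because macro F1 gives class 1 (556 beats) and class 3 (162 beats) the same weight as class 0 (18,118 beats), the CNN's weaker F1 on the small classes pulls its macro score down far more than its weighted score.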

This is a useful result: the classical signal-feature baseline is not just a placeholder. On this segmented benchmark, a well-tuned feature pipeline plus random forest outperforms a simple neural baseline on accuracy, macro F1, and weighted F1.

Sanity Checks

Row-level split checks:

| Check | Value |
|---|---|
| Train/validation index overlap | 0 |
| Train/validation duplicate waveforms | 0 |
| Train/test duplicate waveforms | 0 |
| Validation/test duplicate waveforms | 0 |
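A duplicate-waveform check of this kind can be sketched by hashing each row and intersecting the hash sets of two splits. This is only an illustration, not the repository's `sanity_checks` module: it detects exact matches after casting to float32, and the function names are hypothetical.

```python
import hashlib

import numpy as np


def row_hashes(signals):
    """Set of SHA-1 digests, one per distinct waveform (exact float32 match)."""
    signals = np.ascontiguousarray(signals, dtype=np.float32)
    return {hashlib.sha1(row.tobytes()).hexdigest() for row in signals}


def count_shared_waveforms(split_a, split_b):
    """Number of distinct waveforms that appear in both splits."""
    return len(row_hashes(split_a) & row_hashes(split_b))


train = np.array([[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]])
val = np.array([[0.3, 0.4, 0.5], [0.6, 0.7, 0.8]])
test = np.array([[0.9, 1.0, 0.0]])
print(count_shared_waveforms(train, val), count_shared_waveforms(train, test))
```

Hashing keeps the comparison linear in the number of rows instead of comparing every pair. It does not catch near-duplicates, such as beats from the same patient that differ by small noise, which relates to the patient-level caveat under Limitations.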

Shuffled-label validation checks:

| Model | Accuracy | Macro F1 | Weighted F1 |
|---|---|---|---|
| Shuffled Logistic Regression | 0.8277 | 0.1811 | 0.7497 |
| Shuffled Random Forest | 0.8275 | 0.1815 | 0.7498 |

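The shuffled-label accuracy is still high because class 0 is dominant: about 83% of validation rows are class 0, so predicting class 0 for every row already scores about 0.83. Macro F1 collapses to about 0.18, which is what such a majority-class predictor would score. That collapse is the relevant sign that the real models are learning label-related structure rather than only exploiting class imbalance.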

Reproducing the Pipeline

Place the Kaggle CSV files in data/raw/:

data/raw/mitbih_train.csv
data/raw/mitbih_test.csv

Then run the project stages from the repository root:

python -m src.data.validate_data
python -m src.data.split_data
python -m src.features.extract_features
python -m src.models.classical
python -m src.models.train_cnn
python -m src.models.sanity_checks
python -m src.models.final_report
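If running the stages by hand is inconvenient, the same sequence could be driven from a small Python wrapper. The sketch below is a hypothetical helper, not a script shipped with the repository; it takes the module names from the commands above, runs them in order, and stops at the first stage that exits with an error.

```python
import subprocess
import sys

# Module names taken from the commands listed above, in execution order.
STAGES = [
    "src.data.validate_data",
    "src.data.split_data",
    "src.features.extract_features",
    "src.models.classical",
    "src.models.train_cnn",
    "src.models.sanity_checks",
    "src.models.final_report",
]


def build_commands(stages=STAGES, python=sys.executable):
    """Turn module names into `python -m <module>` argument lists."""
    return [[python, "-m", stage] for stage in stages]


def run_pipeline(commands, run=subprocess.run):
    """Run each command in order; check=True raises on a non-zero exit code."""
    for command in commands:
        print("running:", " ".join(command))
        run(command, check=True)


# From the repository root (not executed here):
# run_pipeline(build_commands())
```

Passing the runner as a parameter makes it possible to dry-run the sequence, for example with a function that only records the commands.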

The notebooks in notebooks/ are executed analysis artifacts:

  • 01_eda.ipynb
  • 02_preprocessing_and_features.ipynb
  • 03_classical_ml_models.ipynb
  • 04_1d_cnn.ipynb
  • 05_final_evaluation.ipynb

Project Structure

ecg-heartbeat-classification/
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── models/
│   ├── checkpoints/
│   └── metrics/
├── notebooks/
├── reports/
│   ├── figures/
│   └── tables/
└── src/
    ├── data/
    ├── features/
    ├── models/
    └── visualization/

Limitations

  • The dataset is already segmented into individual beats, so this project does not solve beat detection from continuous ECG.
  • The split is not patient-level. Strong beat-level results can overstate generalization if similar patients or recording conditions appear across splits.
  • The Kaggle benchmark is derived from MIT-BIH and PTBDB; this project evaluates only the MIT-BIH-derived multiclass files.
  • No external cohort was used.
  • The class labels are grouped categories, not fine-grained arrhythmia annotations.
  • These results are not clinical validation.

Bottom Line

The strongest result is random forest on SciPy-derived signal features with test macro F1 of 0.8586. The result passes basic row-level leakage and shuffled-label checks, but minority-class recall remains the main weakness, especially for classes 1 and 3.

About

ECG heartbeat classification on the MIT-BIH Kaggle dataset using SciPy signal features vs a 1-D CNN, with careful imbalance handling and leakage checks.
