ECG Heartbeat Classification with SciPy and Machine Learning

This project classifies segmented ECG heartbeat signals from the Kaggle ECG Heartbeat Categorization Dataset. The main experiment compares classical machine learning on SciPy-derived signal features against a compact 1-D CNN trained directly on raw heartbeat segments.

The best model in this run is a random forest on handcrafted signal features:

Model	Test accuracy	Test macro F1	Test weighted F1
Random Forest	0.9710	0.8586	0.9690
1-D CNN	0.9528	0.8205	0.9537
XGBoost	0.9387	0.7933	0.9442
Linear SVM	0.9036	0.6400	0.9087
Logistic Regression	0.7131	0.5262	0.7797

High accuracy alone is not informative here. Class 0 makes up about 83% of every split, so macro F1 and per-class recall are the main metrics.

Dataset

Dataset: ECG Heartbeat Categorization Dataset

The Kaggle download contains both MIT-BIH-derived and PTBDB-derived CSV files. This project uses only:

data/raw/mitbih_train.csv
data/raw/mitbih_test.csv

The PTBDB files are left out because they are a separate binary normal/abnormal task and should not be mixed into the MIT-BIH 5-class classifier.

The MIT-BIH files contain 188 columns per row: 187 heartbeat samples and one label column.
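Below is a minimal loading sketch. It assumes two things: the CSVs have no header row (which is how they are commonly read), and the label is the last column. The function name `load_mitbih` is hypothetical and is not the API of the project's `src.data.validate_data` module.

```python
import numpy as np
import pandas as pd

N_SAMPLES = 187  # waveform columns per row; the 188th column is the label

def load_mitbih(path):
    """Load a Kaggle MIT-BIH CSV into (X, y), assuming no header row."""
    df = pd.read_csv(path, header=None)
    if df.shape[1] != N_SAMPLES + 1:
        raise ValueError(f"expected {N_SAMPLES + 1} columns, got {df.shape[1]}")
    X = df.iloc[:, :N_SAMPLES].to_numpy(dtype=np.float32)
    # labels are stored as floats (e.g. 0.0, 3.0); cast to integer class ids
    y = df.iloc[:, N_SAMPLES].to_numpy().astype(np.int64)
    return X, y
```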

Class	Kaggle code	Train	Validation	Test
0	N	57,977	14,494	18,118
1	S	1,778	445	556
2	V	4,630	1,158	1,448
3	F	513	128	162
4	Q	5,145	1,286	1,608

Class 3 is less than 1% of the data, so it is the hardest class to evaluate reliably.

EDA

The waveform values are already scaled to [0, 1]. Around 41% of waveform entries are zero, which is consistent with fixed-length segmented beats containing padded or low-amplitude regions.
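Both checks can be reproduced on a loaded waveform array. The trailing-zero count assumes that any padding sits at the end of each segment. `waveform_summary` is a hypothetical helper and is not part of the project code.

```python
import numpy as np

def _trailing_zeros(row):
    """Number of zeros after the last non-zero sample (whole row if all zero)."""
    nz = np.flatnonzero(row)
    return row.size if nz.size == 0 else row.size - 1 - nz[-1]

def waveform_summary(X):
    """Value range and share of exactly-zero entries in an (n_beats, n_samples) array."""
    X = np.asarray(X, dtype=float)
    return {
        "min": float(X.min()),
        "max": float(X.max()),
        "zero_fraction": float(np.mean(X == 0.0)),
        # rough proxy for end-of-segment padding (assumes padding is at the end)
        "mean_trailing_zeros": float(np.mean([_trailing_zeros(r) for r in X])),
    }
```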

Method

Splitting

The Kaggle test file is kept untouched as the final hold-out set. The Kaggle training file is split into train and validation using stratified sampling:

train: 70,043 rows
validation: 17,511 rows
test: 21,892 rows

The train/validation split indices are saved to data/processed/ for leakage checks.
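Here is a sketch of this split with scikit-learn. The 20% validation fraction is inferred, not a documented setting. A stratified 80/20 split of the 87,554 Kaggle training rows yields exactly 70,043 / 17,511 rows and the per-class counts in the dataset table. The seed and the helper name `split_indices` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_indices(y, val_fraction=0.2, seed=42):
    """Stratified train/validation split of row indices (fraction and seed assumed)."""
    idx = np.arange(len(y))
    train_idx, val_idx = train_test_split(
        idx, test_size=val_fraction, stratify=y, random_state=seed
    )
    # leakage check: the index sets must be disjoint and together cover every row
    assert np.intersect1d(train_idx, val_idx).size == 0
    assert train_idx.size + val_idx.size == len(y)
    return np.sort(train_idx), np.sort(val_idx)
```

The returned arrays could be written with `np.save` under data/processed/. The exact filenames the project uses are not given here.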

SciPy Feature Pipeline

The classical models use 29 features extracted after a light Butterworth band-pass filter:

time-domain: mean, standard deviation, extrema, quantiles, skew, kurtosis, energy, zero-crossing rate
frequency-domain: FFT band-power ratios, spectral centroid, rolloff, dominant frequency, spectral entropy
shape features: main peak position, peak height, prominence, width, and short-lag autocorrelation
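The following is a condensed sketch of this kind of pipeline using `scipy.signal` and `scipy.stats`. It produces 24 features, not the project's 29. Several settings are assumptions:

- a 125 Hz sampling rate
- a 2nd-order 0.5 to 40 Hz Butterworth band-pass with zero-phase filtering
- the FFT band edges
- the 85% rolloff point
- the autocorrelation lags

All function names are hypothetical.

```python
import numpy as np
from scipy import signal, stats

FS = 125.0  # assumed sampling rate (Hz) of the Kaggle MIT-BIH beats

def bandpass(x, low=0.5, high=40.0, fs=FS, order=2):
    """Zero-phase Butterworth band-pass; cutoffs and order are illustrative."""
    sos = signal.butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, x)

def beat_features(beat, fs=FS):
    x = bandpass(np.asarray(beat, dtype=float), fs=fs)
    f = {}
    # time domain
    f["mean"], f["std"] = x.mean(), x.std()
    f["min"], f["max"] = x.min(), x.max()
    f["q25"], f["q75"] = np.quantile(x, [0.25, 0.75])
    f["skew"], f["kurtosis"] = stats.skew(x), stats.kurtosis(x)
    f["energy"] = np.sum(x ** 2)
    sign = np.signbit(x).astype(np.int8)
    f["zcr"] = np.mean(np.diff(sign) != 0)  # fraction of sign changes
    # frequency domain: normalized power spectrum of the filtered beat
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    p = power / (power.sum() + 1e-12)
    for lo, hi in [(0, 5), (5, 15), (15, 40)]:
        f[f"band_{lo}_{hi}"] = p[(freqs >= lo) & (freqs < hi)].sum()
    f["centroid"] = np.sum(freqs * p)
    k85 = min(np.searchsorted(np.cumsum(p), 0.85), freqs.size - 1)
    f["rolloff85"] = freqs[k85]
    f["dominant_freq"] = freqs[np.argmax(power)]
    f["spectral_entropy"] = -np.sum(p * np.log2(p + 1e-12))
    # shape: the most prominent peak
    peaks, props = signal.find_peaks(x, prominence=0.0, width=0.0)
    if peaks.size:
        k = int(np.argmax(props["prominences"]))
        f["peak_pos"] = peaks[k] / x.size  # relative position in the segment
        f["peak_height"] = x[peaks[k]]
        f["peak_prominence"] = props["prominences"][k]
        f["peak_width_s"] = props["widths"][k] / fs  # width at half prominence, seconds
    else:
        f.update(peak_pos=0.0, peak_height=0.0, peak_prominence=0.0, peak_width_s=0.0)
    # normalized autocorrelation at short lags
    xc = x - x.mean()
    denom = np.dot(xc, xc) + 1e-12
    for lag in (1, 2, 5):
        f[f"acf_{lag}"] = np.dot(xc[:-lag], xc[lag:]) / denom
    return {name: float(v) for name, v in f.items()}

def feature_matrix(X, fs=FS):
    """Stack per-beat feature dicts into an (n_beats, n_features) array."""
    return np.array([list(beat_features(b, fs).values()) for b in X], dtype=float)
```

`feature_matrix(X)` returns one row per beat. `np.save` can then write it to the .npy paths the project uses.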

Feature arrays are saved as:

data/processed/features_train.npy
data/processed/features_val.npy
data/processed/features_test.npy

Models

Classical models:

Logistic Regression
Linear SVM
Random Forest
XGBoost
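This sketch shows one way to set up the four classical models and compare them on validation macro F1 with scikit-learn and XGBoost. All hyperparameters are placeholders, because the project's settings are not listed. The linear models get a `StandardScaler` since they are sensitive to feature scale; the tree ensembles do not need one. XGBoost is treated as optional, so the sketch still runs if it is not installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def build_models(seed=0):
    """Placeholder hyperparameters; the project's actual settings are not stated."""
    models = {
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000)),
        "linear_svm": make_pipeline(StandardScaler(), LinearSVC(dual=False)),
        "random_forest": RandomForestClassifier(
            n_estimators=300, n_jobs=-1, random_state=seed
        ),
    }
    try:  # XGBoost is optional in this sketch
        from xgboost import XGBClassifier
        models["xgboost"] = XGBClassifier(n_estimators=300, random_state=seed)
    except ImportError:
        pass
    return models

def evaluate_on_validation(models, X_train, y_train, X_val, y_val):
    """Fit each model on the training split and return its validation macro F1."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = f1_score(y_val, model.predict(X_val), average="macro")
    return scores
```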

Deep learning baseline:

compact 1-D CNN with class-weighted cross entropy
best checkpoint selected by validation macro F1
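Checkpoint selection by validation macro F1 can be kept separate from the network itself, as in this minimal sketch. `BestCheckpoint` is a hypothetical helper. In a PyTorch training loop, `state` would typically be `model.state_dict()`.

```python
import copy
from sklearn.metrics import f1_score

class BestCheckpoint:
    """Keep a copy of the model state from the epoch with the best validation macro F1."""

    def __init__(self):
        self.best_f1 = -1.0
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, state, y_val, y_val_pred):
        f1 = f1_score(y_val, y_val_pred, average="macro", zero_division=0)
        if f1 > self.best_f1:
            self.best_f1, self.best_epoch = f1, epoch
            self.best_state = copy.deepcopy(state)  # snapshot, not a live reference
        return f1
```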

The CNN originally used full inverse-frequency class weights, which over-weighted the rarest class (class 3): class 3 recall was high, but so was the number of false positives. The final CNN uses square-root class weights, which gave a better precision/recall balance.
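The training counts from the dataset table make the difference between the two weighting schemes concrete. This sketch assumes "square-root class weights" means the square root of the inverse-frequency weights N / (K · n_c). The project's exact formula is not stated. The loss function mirrors the reduction that PyTorch's `nn.CrossEntropyLoss` applies when a `weight` tensor is given.

```python
import numpy as np

# Training-split class counts from the dataset table (classes 0-4).
TRAIN_COUNTS = np.array([57_977, 1_778, 4_630, 513, 5_145], dtype=float)

def inverse_frequency_weights(counts):
    """w_c = N / (K * n_c); a perfectly balanced class would get weight 1."""
    return counts.sum() / (counts.size * counts)

def sqrt_inverse_frequency_weights(counts):
    """Square root of the inverse-frequency weights: same ordering, smaller spread."""
    return np.sqrt(inverse_frequency_weights(counts))

def weighted_cross_entropy(logits, labels, weights):
    """Sum of w_y * NLL divided by the summed target weights (weighted-mean reduction)."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    w = weights[labels]
    nll = -log_probs[np.arange(labels.size), labels]
    return float(np.sum(w * nll) / np.sum(w))
```

With these counts, full inverse-frequency weights make a class 3 error about 113 times as costly as a class 0 error. The square-root version cuts that ratio to about 10.6.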

Results

Best Classical Model: Random Forest

Class	Precision	Recall	F1	Support
0	0.9705	0.9982	0.9842	18,118
1	0.9688	0.5594	0.7092	556
2	0.9591	0.8736	0.9143	1,448
3	0.8632	0.6235	0.7240	162
4	0.9960	0.9291	0.9614	1,608

1-D CNN

Class	Precision	Recall	F1	Support
0	0.9833	0.9656	0.9744	18,118
1	0.7402	0.5791	0.6498	556
2	0.7173	0.9620	0.8218	1,448
3	0.6609	0.7099	0.6845	162
4	0.9903	0.9540	0.9718	1,608
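The macro and weighted F1 values in the summary table follow directly from the per-class tables. This short check recomputes them from the per-class F1 and support columns. The dictionary keys are only labels.

```python
import numpy as np

SUPPORT = np.array([18_118, 556, 1_448, 162, 1_608])
F1 = {
    "random_forest": np.array([0.9842, 0.7092, 0.9143, 0.7240, 0.9614]),
    "cnn_1d": np.array([0.9744, 0.6498, 0.8218, 0.6845, 0.9718]),
}

def macro_f1(per_class_f1):
    """Unweighted mean: every class counts equally, however rare."""
    return float(np.mean(per_class_f1))

def weighted_f1(per_class_f1, support):
    """Support-weighted mean: dominated by class 0."""
    return float(np.average(per_class_f1, weights=support))

for name, f1 in F1.items():
    print(f"{name}: macro={macro_f1(f1):.4f} weighted={weighted_f1(f1, SUPPORT):.4f}")
```

The script prints macro=0.8586 / weighted=0.9690 for the random forest and macro=0.8205 / weighted=0.9537 for the CNN, matching the first table. Class 3 contributes a fifth of the macro average but under 1% of the weighted one.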

Main Finding

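The random forest on SciPy features is the strongest model overall in this run. The CNN has higher recall than the random forest on every minority class (1 to 4), most visibly on class 2 (0.9620 vs 0.8736) and class 3 (0.7099 vs 0.6235). However, its precision drops sharply on classes 1, 2 and 3, so its macro F1 is lower (0.8205 vs 0.8586).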

This is a useful result: the classical signal-feature baseline is not just a placeholder. On this segmented benchmark, a SciPy feature pipeline plus random forest outperforms a compact 1-D CNN on macro F1.

Sanity Checks

Row-level split checks:

Check	Value
Train/validation index overlap	0
Train/validation duplicate waveforms	0
Train/test duplicate waveforms	0
Validation/test duplicate waveforms	0
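One way to run the duplicate-waveform checks is to treat each row's raw bytes as a key and intersect the key sets. This catches exact duplicates only; near-identical beats from the same patient would pass. It is an assumed implementation, since the project's method is not described, and `count_cross_duplicates` is a hypothetical name. Both arrays should come from the same source dtype, because casting float32 data to float64 changes the bytes.

```python
import numpy as np

def row_keys(X):
    """Set of byte strings, one per row, so identical waveforms compare equal."""
    X = np.ascontiguousarray(X, dtype=np.float64)
    return {row.tobytes() for row in X}

def count_cross_duplicates(X_a, X_b):
    """Number of distinct waveforms that appear exactly in both arrays."""
    return len(row_keys(X_a) & row_keys(X_b))
```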

Shuffled-label validation checks:

Model	Accuracy	Macro F1	Weighted F1
Shuffled Logistic Regression	0.8277	0.1811	0.7497
Shuffled Random Forest	0.8275	0.1815	0.7498

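The shuffled-label accuracy is still high because class 0 is dominant. The 0.8277 figure equals the share of class 0 in the validation split (14,494 of 17,511 rows), and a macro F1 of about 0.18 is what a model that always predicts class 0 would score. The gap between that floor and the real models' macro F1 is the relevant sign that they learn label-related structure rather than only exploiting class imbalance.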
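The shuffled-label numbers can be compared with a model that always predicts class 0, using the validation counts from the dataset table:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Validation-split class counts from the dataset table (classes 0-4).
VAL_COUNTS = [14_494, 445, 1_158, 128, 1_286]

y_val = np.repeat(np.arange(5), VAL_COUNTS)
y_majority = np.zeros_like(y_val)  # always predict class 0

acc = accuracy_score(y_val, y_majority)
macro = f1_score(y_val, y_majority, average="macro", labels=list(range(5)), zero_division=0)
weighted = f1_score(y_val, y_majority, average="weighted", labels=list(range(5)), zero_division=0)
print(f"accuracy={acc:.4f} macro_f1={macro:.4f} weighted_f1={weighted:.4f}")
```

The script prints accuracy=0.8277 macro_f1=0.1811 weighted_f1=0.7497, the same values as the shuffled logistic regression row.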

Reproducing the Pipeline

Place the Kaggle CSV files in data/raw/:

data/raw/mitbih_train.csv
data/raw/mitbih_test.csv

Then run the project stages from the repository root:

python -m src.data.validate_data
python -m src.data.split_data
python -m src.features.extract_features
python -m src.models.classical
python -m src.models.train_cnn
python -m src.models.sanity_checks
python -m src.models.final_report

The notebooks in notebooks/ are executed analysis artifacts:

01_eda.ipynb
02_preprocessing_and_features.ipynb
03_classical_ml_models.ipynb
04_1d_cnn.ipynb
05_final_evaluation.ipynb

Project Structure

ecg-heartbeat-classification/
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── models/
│   ├── checkpoints/
│   └── metrics/
├── notebooks/
├── reports/
│   ├── figures/
│   └── tables/
└── src/
    ├── data/
    ├── features/
    ├── models/
    └── visualization/

Limitations

The dataset is already segmented into individual beats, so this project does not solve beat detection from continuous ECG.
The split is not patient-level. Strong beat-level results can overstate generalization if similar patients or recording conditions appear across splits.
The Kaggle benchmark is derived from MIT-BIH and PTBDB; this project evaluates only the MIT-BIH-derived multiclass files.
No external cohort was used.
The class labels are grouped categories, not fine-grained arrhythmia annotations.
These results are not clinical validation.

Bottom Line

The strongest result is random forest on SciPy-derived signal features with test macro F1 of 0.8586. The result passes basic row-level leakage and shuffled-label checks, but minority-class recall remains the main weakness, especially for classes 1 and 3.
