GitHub - khushi4124/Breast-Cancer-Detection: Built a high-precision Breast Cancer prediction system using a Linear Support Vector Machine (SVM) on the Wisconsin Diagnostic dataset. I implemented a professional Scikit-Learn Pipeline. The model achieved a 99% accuracy rate.

Problem Statement

In clinical diagnostics, the manual classification of tumors as Malignant (cancerous) or Benign (non-cancerous) is a time-consuming process prone to human error. The goal of this project is to develop a robust machine learning pipeline that can automate this classification using the Breast Cancer Wisconsin (Diagnostic) Dataset.

Dataset Used: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data

Solution

Technical Approach Used: Linear Support Vector Machine (SVM) with automated preprocessing.

To solve the classification problem, I implemented a Support Vector Machine (SVM) using a Linear Kernel.

SVM was chosen because of its effectiveness in finding the "Optimal Hyperplane" that maximizes the margin (the safety buffer) between classes.

Key Implementation Details:

Pipeline Architecture: Utilized a Scikit-Learn Pipeline to bundle StandardScaler and the SVC model. This ensures that the data scaling is performed correctly within each fold of cross-validation, preventing data leakage.

Hyperparameter Tuning: Optimized the C hyperparameter 1. This "Soft Margin" approach improved the model's ability to generalize to new, unseen patient data by prioritizing a wider decision boundary over perfect memorization of training noise.

Evaluation Strategy: Beyond simple accuracy, the model was evaluated using:

5-Fold Cross-Validation: To ensure stability across different data distributions.

Confusion Matrix Analysis: Specifically monitoring False Negatives to evaluate clinical safety.

Results and Impact:

The optimized SVM model achieved a Mean Cross-Validation Accuracy of 97.19%, with a specific test set performance of 99.12%.

Final Performance Metrics:

True Negatives: 71 (Benign correctly identified)

True Positives: 42 (Malignant correctly identified)

False Positives: 0 (Zero healthy patients were misdiagnosed)

False Negatives: 1 (Only one case of cancer was missed)

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
README.md		README.md
breast-cancer-data.csv		breast-cancer-data.csv
breast_cancer_detection_svm_model.ipynb		breast_cancer_detection_svm_model.ipynb
breast_cancer_detection_svm_model.py		breast_cancer_detection_svm_model.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages