Submitted by: Tirtha Dutta
Model Type: Binary Classification | Supervised Learning
Use Support Vector Machines (SVM) to classify breast cancer tumors as Malignant (M) or Benign (B) using the Breast Cancer Wisconsin Dataset.
- Source: Kaggle – Breast Cancer Dataset
- Total Records: 569 rows × 32 columns
- Target Variable: `diagnosis` (M = Malignant, B = Benign)
- Kernel-based classification (Linear & RBF SVM)
- Feature encoding and standardization
- Hyperparameter tuning using GridSearchCV
- Visualizing decision boundaries
- End-to-end ML workflow: from raw data to results
- Dropped duplicate rows and the `id` column
- Verified that there are no missing values (count = 0)
- Saved the cleaned CSV as `data/breast_cancer_cleaned.csv`
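The cleaning steps above can be sketched in pandas roughly as follows. This is an illustration, not the notebook's code. The helper name `clean_breast_cancer` is hypothetical, the choice to drop full-row duplicates before dropping `id` is an assumption, and the tiny DataFrame in the usage example is made up.

```python
import pandas as pd


def clean_breast_cancer(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows and the `id` column, then check for missing values."""
    # Assumption: duplicates are judged on the full row, before `id` is removed.
    cleaned = df.drop_duplicates().drop(columns=["id"], errors="ignore")
    # Fail loudly if any missing values survive cleaning.
    n_missing = int(cleaned.isna().sum().sum())
    if n_missing != 0:
        raise ValueError(f"expected 0 missing values, found {n_missing}")
    return cleaned.reset_index(drop=True)
```

For example, `clean_breast_cancer(pd.DataFrame({"id": [1, 2, 2], "diagnosis": ["M", "B", "B"], "radius_mean": [17.99, 13.54, 13.54]}))` returns two rows with columns `diagnosis` and `radius_mean`. In the project, the result would then be written with `cleaned.to_csv("data/breast_cancer_cleaned.csv", index=False)`.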
- Boxplot of features: `images/eda_boxplot.png`
- Heatmap of correlations: `images/eda_heatmap.png`
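A minimal sketch of how the two EDA figures could be produced. It assumes the input is the cleaned DataFrame with `diagnosis` removed, so every column is numeric. The function name `save_eda_plots` is hypothetical. The project lists Seaborn, where `seaborn.boxplot` and `seaborn.heatmap` would be the usual choices. This sketch sticks to plain Matplotlib to keep its dependencies small.

```python
import os

import matplotlib

matplotlib.use("Agg")  # draw straight to files, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def save_eda_plots(features: pd.DataFrame, out_dir: str = "images") -> tuple[str, str]:
    """Hypothetical helper: save a per-feature boxplot and a correlation heatmap, return both paths."""
    os.makedirs(out_dir, exist_ok=True)
    names = list(features.columns)
    positions = np.arange(1, len(names) + 1)

    # Boxplot: one box per numeric column, useful for spotting outliers and scale differences.
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.boxplot(features.to_numpy(), positions=positions)
    ax.set_xticks(positions)
    ax.set_xticklabels(names, rotation=90)
    ax.set_title("Feature distributions")
    box_path = os.path.join(out_dir, "eda_boxplot.png")
    fig.savefig(box_path, dpi=150, bbox_inches="tight")
    plt.close(fig)

    # Heatmap: pairwise Pearson correlations, on a fixed [-1, 1] colour scale.
    corr = features.corr()
    fig, ax = plt.subplots(figsize=(10, 8))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(np.arange(len(names)))
    ax.set_xticklabels(names, rotation=90)
    ax.set_yticks(np.arange(len(names)))
    ax.set_yticklabels(names)
    fig.colorbar(im, ax=ax, label="Pearson r")
    heat_path = os.path.join(out_dir, "eda_heatmap.png")
    fig.savefig(heat_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return box_path, heat_path
```

Typical usage would be `save_eda_plots(cleaned.drop(columns=["diagnosis"]))`. Because the raw features differ widely in scale, the boxplot may be easier to read after standardizing.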
- Label-encoded `diagnosis`: B → 0 (Benign), M → 1 (Malignant)
- Scaled features using `StandardScaler`
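A sketch of the encoding and scaling step with scikit-learn. `LabelEncoder` sorts class labels alphabetically, which is why B maps to 0 and M maps to 1. The README does not mention a train/test split. The split shown here, along with `test_size=0.2`, `random_state=42`, and the helper name `encode_and_scale`, is an assumption made so the scaler can be fit on training data only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def encode_and_scale(df: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Hypothetical helper: encode `diagnosis` (B -> 0, M -> 1), split, and standardize features."""
    # LabelEncoder assigns codes in sorted order of the labels: "B" -> 0, "M" -> 1.
    encoder = LabelEncoder()
    y = encoder.fit_transform(df["diagnosis"])
    X = df.drop(columns=["diagnosis"]).to_numpy(dtype=float)

    # Assumed stratified split; the README does not say whether one was used.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    # Fit the scaler on the training split only, then apply it to both splits,
    # so statistics from the test rows do not leak into training.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test, encoder
```

After this step, every training column has mean 0 and unit variance. The test columns are only approximately standardized, because they reuse the training statistics.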
- Trained Linear SVM and RBF SVM
- Used GridSearchCV to tune `C` and `gamma`
- Cross-validation (`cv=5`) accuracy: 97.89%
- Best parameters: `C=10`, `gamma=0.01` (RBF kernel)
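A sketch of the training and tuning step. It uses scikit-learn's bundled `load_breast_cancer` as a stand-in for the cleaned CSV, since that dataset is the same Wisconsin diagnostic data (569 samples, 30 features). Its target is coded 0 = malignant and 1 = benign, so the sketch flips it to match this project's M → 1. The parameter grid, the 80/20 split, and `random_state=42` are assumptions. Running it is not guaranteed to reproduce the 97.89% score or `C=10`, `gamma=0.01` reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: flip scikit-learn's labels so Malignant -> 1, Benign -> 0.
data = load_breast_cancer()
X, y = data.data, 1 - data.target

# Assumed hold-out split for the final classification report.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline linear-kernel SVM. Keeping the scaler inside the pipeline means it is
# refit on each training fold, so the validation folds stay unseen.
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
linear_svm.fit(X_train, y_train)

# RBF SVM tuned over C and gamma with 5-fold cross-validation (grid values assumed).
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

print("Linear SVM test accuracy:", linear_svm.score(X_test, y_test))
print("Best RBF params:", grid.best_params_)
print(f"Best 5-fold CV accuracy: {grid.best_score_:.4f}")
# The refit best estimator is used for predictions on the held-out split.
print(classification_report(y_test, grid.predict(X_test), target_names=["Benign", "Malignant"]))
```

`make_pipeline` names its steps after the lowercased class names, which is why the grid keys are `svc__C` and `svc__gamma`.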
| Proof Type | Output |
|---|---|
| Decision Boundary Plot | images/decision_boundary.png |
| Hyperparameter Tuning Heatmap | images/svm_rbf_tuning_heatmap.png |
| Classification Report | Printed in Notebook |
| Accuracy & CV Score | Printed via GridSearchCV |
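The two figures in this table could be produced along the following lines. This is a sketch under several assumptions. It again uses scikit-learn's `load_breast_cancer` with labels flipped to M → 1. The grid search is fit on all 569 samples purely for illustration. The boundary is drawn for "mean radius" and "mean texture", because a decision boundary can only be shown in two dimensions and the notebook's choice of features is unknown. The helper names `plot_tuning_heatmap` and `plot_decision_boundary` are hypothetical. Only the output file names come from the table.

```python
import os

import matplotlib

matplotlib.use("Agg")  # write figures to files without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

OUT_DIR = "images"


def plot_tuning_heatmap(grid: GridSearchCV, path: str) -> pd.DataFrame:
    """Hypothetical helper: pivot mean CV accuracy into a C x gamma table and save it as a heatmap."""
    scores = pd.DataFrame(grid.cv_results_["params"])
    scores["accuracy"] = grid.cv_results_["mean_test_score"]
    table = scores.pivot(index="svc__C", columns="svc__gamma", values="accuracy")

    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(table.to_numpy(), cmap="viridis")
    ax.set_xticks(np.arange(table.shape[1]))
    ax.set_xticklabels([str(g) for g in table.columns])
    ax.set_yticks(np.arange(table.shape[0]))
    ax.set_yticklabels([str(c) for c in table.index])
    ax.set_xlabel("gamma")
    ax.set_ylabel("C")
    # Print each cell's mean CV accuracy on top of its colour.
    for i in range(table.shape[0]):
        for j in range(table.shape[1]):
            ax.text(j, i, f"{table.iat[i, j]:.3f}", ha="center", va="center", color="w")
    fig.colorbar(im, ax=ax, label="mean CV accuracy")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return table


def plot_decision_boundary(X2, y, model, names, path: str) -> None:
    """Hypothetical helper: shade predicted classes over a 2-D grid and overlay the samples."""
    x_min, x_max = X2[:, 0].min() - 1, X2[:, 0].max() + 1
    y_min, y_max = X2[:, 1].min() - 1, X2[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    fig, ax = plt.subplots(figsize=(7, 6))
    ax.contourf(xx, yy, zz, alpha=0.3, cmap="coolwarm")
    ax.scatter(X2[:, 0], X2[:, 1], c=y, cmap="coolwarm", edgecolors="k", s=20)
    ax.set_xlabel(names[0])
    ax.set_ylabel(names[1])
    ax.set_title("RBF SVM decision regions (0 = Benign, 1 = Malignant)")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


data = load_breast_cancer()
X, y = data.data, 1 - data.target  # flip scikit-learn's labels so Malignant -> 1
os.makedirs(OUT_DIR, exist_ok=True)

# Fit on all samples here only to have a tuned model to visualize (grid values assumed).
grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]},
    cv=5,
).fit(X, y)
table = plot_tuning_heatmap(grid, os.path.join(OUT_DIR, "svm_rbf_tuning_heatmap.png"))

# Refit an unfitted copy of the best pipeline (same C and gamma) on two features only.
cols = [0, 1]  # "mean radius" and "mean texture" in scikit-learn's column order
model_2d = clone(grid.best_estimator_).fit(X[:, cols], y)
plot_decision_boundary(
    X[:, cols], y, model_2d, [data.feature_names[c] for c in cols],
    os.path.join(OUT_DIR, "decision_boundary.png"),
)
```

The 2-D boundary shows what the classifier does in that one projection. It is not the boundary of the 30-feature model, whose decision surface cannot be drawn directly.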
| Metric | Value |
|---|---|
| Accuracy (5-fold CV) | 97.89% |
| Kernel | RBF |
| Best C | 10 |
| Best gamma | 0.01 |
- Python 3.10+
- Pandas, NumPy
- Matplotlib, Seaborn
- scikit-learn
- Jupyter Notebook
This project is licensed under the MIT License.