This project aims to build machine learning models for predicting which political party a voter is likely to vote for, based on survey data. The dataset consists of 1,525 voters and 9 variables, collected as part of an exit poll by a leading news channel, CNBE. The predictions will help in estimating the overall election outcome in the surveyed regions.
You have been hired by CNBE to analyze voter behavior using a dataset with 9 features. The goal is to develop machine learning models that predict the party a voter will choose, helping CNBE project election results more accurately.
- Build effective classification machine learning models.
- Conduct thorough exploratory data analysis (EDA).
- Evaluate models using appropriate performance metrics.
- Select the best-performing model based on evaluation criteria.
- Total Records: 1,525
- Variables:
  - vote – Political party/candidate the voter chose.
  - age – Respondent's age.
  - economic.cond.national – Perceived national economic condition.
  - economic.cond.household – Perceived household economic condition.
  - Blair – Opinion rating of Tony Blair.
  - Hague – Opinion rating of William Hague.
  - Europe – Stance on European Union issues.
  - political.knowledge – Level of political knowledge.
  - gender – Respondent's gender.
- Data Source: The dataset consists of two tabs: "Data" and "Data Dictionary." Only the "Data" tab is used for analysis.
- Data Cleaning: Checked for missing values and duplicate records.
- Univariate & Bivariate Analysis: Studied the distribution and relationships between variables.
- Outlier Treatment: Applied the Interquartile Range (IQR) method to handle outliers in economic variables.
- Feature Engineering: Removed redundant features and optimized the dataset.
- Train-Test Split: Split data into training and testing sets.
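
The outlier treatment step can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes outliers are capped at the conventional 1.5 × IQR fences (the notebook may drop them instead), `cap_outliers_iqr` is a hypothetical helper name, and the toy values are made up for demonstration.

```python
from typing import Sequence

import pandas as pd


def cap_outliers_iqr(df: pd.DataFrame, columns: Sequence[str], k: float = 1.5) -> pd.DataFrame:
    """Return a copy of df with each listed column clipped to [Q1 - k*IQR, Q3 + k*IQR]."""
    capped = df.copy()
    for col in columns:
        q1 = capped[col].quantile(0.25)
        q3 = capped[col].quantile(0.75)
        iqr = q3 - q1
        # Values beyond the fences are pulled back to the fence, not removed.
        capped[col] = capped[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return capped


# Toy rows standing in for the survey data (illustrative values only).
toy = pd.DataFrame({
    "economic.cond.national": [1, 3, 3, 3, 4, 4, 4, 5],
    "economic.cond.household": [1, 2, 3, 3, 3, 4, 4, 5],
})
economic_cols = ["economic.cond.national", "economic.cond.household"]
capped = cap_outliers_iqr(toy, economic_cols)
print(capped)
```

Capping keeps all 1,525 rows available for modelling, which is one reason to prefer it over dropping rows on a dataset of this size.
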
- Applied ML Models:
  - Logistic Regression
  - Decision Tree (with pruning)
  - Naïve Bayes (optimized using Grid Search)
  - K-Nearest Neighbors (KNN with K=3, 5, 7)
  - Bagging & Boosting (Gradient Boosting, XGBoost)
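
A rough sketch of how the models listed above could be fitted and compared with scikit-learn. Everything specific here is an assumption rather than the notebook's setup: the data is synthetic (generated to match only the row count, number of predictors, and approximate class split), the 70/30 stratified split, `ccp_alpha` value, and `var_smoothing` grid are illustrative choices, and cost-complexity pruning stands in for whatever pruning the notebook applied. XGBoost is left out to keep dependencies minimal; `xgboost.XGBClassifier` follows the same `fit`/`score` interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data: 1,525 rows, 8 predictors,
# roughly a 30/70 class split. This is NOT the real dataset.
X, y = make_classification(
    n_samples=1525, n_features=8, n_informative=5, n_redundant=0,
    weights=[0.3, 0.7], random_state=42,
)

# Stratify so both splits keep the same class proportions (assumed 70/30 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    # Cost-complexity pruning; the alpha value is an arbitrary example.
    "Decision Tree (pruned)": DecisionTreeClassifier(ccp_alpha=0.005, random_state=42),
    # Grid search over the only GaussianNB hyperparameter worth tuning.
    "Naive Bayes (grid search)": GridSearchCV(
        GaussianNB(), {"var_smoothing": np.logspace(-9, -1, 9)}, cv=5
    ),
    "Bagging": BaggingClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
for k in (3, 5, 7):
    models[f"KNN (K={k})"] = KNeighborsClassifier(n_neighbors=k)

# Fit each model on the training split and record its test accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

for name, acc in sorted(test_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {acc:.3f}")
```

The accuracies printed for the synthetic data say nothing about the real survey; the point is only the shape of the comparison loop.
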
- Model Evaluation:
  - Accuracy
  - Precision, Recall, F1-score
  - Confusion Matrix
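
The evaluation metrics above can be computed with `sklearn.metrics`. The snippet below is a small worked example on ten hand-made labels (7 Labour, 3 Conservative), chosen only so the numbers are easy to check by hand; they are not results from the project.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

labels = ["Conservative", "Labour"]

# Hand-made example: 7 actual Labour voters, 3 actual Conservative voters.
y_true = ["Labour"] * 7 + ["Conservative"] * 3
# One Labour voter is misclassified, and one Conservative voter is misclassified.
y_pred = ["Labour"] * 6 + ["Conservative"] + ["Conservative"] * 2 + ["Labour"]

accuracy = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=labels)
# With average=None (the default), one value per label is returned, in `labels` order.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels
)

print(f"Accuracy: {accuracy:.2f}")
print("Confusion matrix (rows = actual, columns = predicted):")
print(cm)
print(classification_report(y_true, y_pred, labels=labels))
```

Here 8 of 10 predictions are correct, so accuracy is 0.80, while Conservative precision and recall are both 2/3. Since roughly 70% of respondents voted Labour, per-class precision and recall are worth reading alongside accuracy.
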
- Languages: Python
- Tools: VS Code / Google Colab / Jupyter Notebook
- Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, XGBoost
- The dataset had a balanced gender representation (~53% female, ~47% male).
- Labour received a larger share of votes (69.7%) than Conservative (30.3%).
- Logistic Regression and XGBoost performed well, with accuracy around 85%.
- Jupyter Notebook with full implementation.
- Business report in PDF format (excluding code).
- Visualizations and insights from model predictions.
- Project report: Report.pdf
- Dataset: ElectionData.xlsx
- Sample Notebook: SOT23B1_Abhyudaya.ipynb