This was my final project for Big Data Analytics (CIS 5450) at UPenn. My partners and I sought to predict the selling prices of used cars from their features. After performing exploratory data analysis (EDA), we experimented with several methods for encoding the categorical features and applied principal component analysis (PCA) to the data. We then built linear regression, random forest, and gradient boosting regression models. In the notebook, we explain the motivation behind our data manipulations, give an overview of our results, and discuss the methods and models we employed.
We analyzed the following dataset from Kaggle: https://www.kaggle.com/datasets/syedanwarafridi/vehicle-sales-data
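For readers who want a feel for the workflow before opening the notebook, here is a minimal sketch of the same kind of pipeline (encoding, PCA, then the three regressors) using scikit-learn. It is not the notebook's code: the data is synthetic, the column names (`make`, `year`, `odometer`, `sellingprice`) and the helper functions `make_synthetic_cars` and `compare_models` are assumptions made up for illustration, and one-hot encoding stands in for whichever encoding methods the notebook actually compares.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_synthetic_cars(n=400, seed=0):
    """Build a small made-up table of cars (hypothetical columns, not the Kaggle data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "make": rng.choice(["Ford", "Toyota", "BMW"], size=n),
        "year": rng.integers(2005, 2016, size=n),
        "odometer": rng.uniform(5_000, 150_000, size=n),
    })
    premium = df["make"].map({"Ford": 0.0, "Toyota": 1_000.0, "BMW": 6_000.0})
    # Price rises with model year, falls with mileage, plus Gaussian noise.
    df["sellingprice"] = (
        8_000 + premium + 900 * (df["year"] - 2005) - 0.04 * df["odometer"]
        + rng.normal(0, 500, size=n)
    )
    return df


def compare_models(df, target="sellingprice"):
    """Fit three regressors behind the same preprocessing; return test-set R^2 per model."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
    }

    scores = {}
    for name, model in models.items():
        # One-hot encode the categorical column and standardize the numeric ones.
        # sparse_threshold=0.0 forces a dense array, which PCA can consume directly.
        preprocess = ColumnTransformer(
            [
                ("cat", OneHotEncoder(handle_unknown="ignore"), ["make"]),
                ("num", StandardScaler(), ["year", "odometer"]),
            ],
            sparse_threshold=0.0,
        )
        pipeline = Pipeline([
            ("preprocess", preprocess),
            ("pca", PCA(n_components=4)),  # 5 encoded columns reduced to 4 components
            ("model", model),
        ])
        pipeline.fit(X_train, y_train)
        scores[name] = pipeline.score(X_test, y_test)  # R^2 on held-out rows
    return scores


if __name__ == "__main__":
    for name, r2 in compare_models(make_synthetic_cars()).items():
        print(f"{name}: R^2 = {r2:.3f}")
```

To try this on the real dataset, one would replace `make_synthetic_cars()` with a `pd.read_csv(...)` call on the downloaded Kaggle file and adjust the column lists to match. Wrapping the encoder and PCA inside the `Pipeline` means they are fit on the training split only, which avoids leaking test-set information into the preprocessing.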