Tool: Jupyter Notebook
Programming Language: Python 3
Visualization: Matplotlib, Seaborn
Dataset: Click Ads Data
A company wants to measure the effectiveness of the advertisements it displays. This is important because it tells the company how well its advertising reaches customers and attracts them to view the ads.
This project aims to reduce the cost of advertisement targeting by 50% and increase gross profit by 5%. Processing historical advertisement data and uncovering the insights and patterns in it helps the company determine its marketing targets. The focus of this project is a machine learning classification model that identifies the right target customers.
As a result of this project, we decreased advertisement cost by 51% and increased gross profit by 9%. We also provide actionable recommendations that the marketing team can apply to prioritize specific customer segments, so that the company's gross profit can increase.
The dataset contains 1,000 unique observations with 10 features of various data types. Each row represents a customer's demographics and behaviour records on the website and ads over a 7-month period. Four columns contain missing values (Daily Time Spent on Site, Area Income, Daily Internet Usage, Male), and one column has an incorrect data type (Timestamp).
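A minimal sketch of how the missing values and the Timestamp type problem could be detected and fixed with pandas. The column names come from the description above; the toy rows, and the assumption that Timestamp is stored as text, are hypothetical.

```python
import numpy as np
import pandas as pd


def audit_missing(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have at least one."""
    counts = df.isna().sum()
    return counts[counts > 0]


def fix_timestamp(df: pd.DataFrame, col: str = "Timestamp") -> pd.DataFrame:
    """Return a copy with `col` parsed from text into a datetime64 column."""
    out = df.copy()
    # errors="coerce" turns unparseable strings into NaT instead of raising
    out[col] = pd.to_datetime(out[col], errors="coerce")
    return out


# Hypothetical toy rows standing in for the real Click Ads data
df = pd.DataFrame({
    "Daily Time Spent on Site": [68.9, np.nan, 69.5],
    "Age": [35, 31, 26],
    "Area Income": [432.8e6, 479.1e6, np.nan],
    "Daily Internet Usage": [256.1, 193.8, np.nan],
    "Male": [0.0, np.nan, 1.0],
    "Timestamp": ["2016-03-27 00:53:00", "2016-04-04 01:39:00", "2016-03-13 20:35:00"],
})

print(audit_missing(df))
print(fix_timestamp(df).dtypes["Timestamp"])
```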
- Handling Missing Values
- Feature Engineering
- Drop Unnecessary Features
- Feature Encoding
- Feature Scaling
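The five preprocessing steps above can be sketched as a single pandas/scikit-learn function. The specific choices here are assumptions, not necessarily what the project used: median/mode imputation, calendar features derived from Timestamp, dropping the raw Timestamp, one-hot encoding, and standard scaling. The `Category` column is a hypothetical ad-category feature.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df, numeric_cols, binary_cols, categorical_cols, time_col="Timestamp"):
    """Apply the five preprocessing steps in order and return a model-ready frame."""
    out = df.copy()

    # 1. Handling missing values: median for numeric, most frequent value otherwise
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in binary_cols + categorical_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    out[binary_cols] = out[binary_cols].astype(int)

    # 2. Feature engineering: derive calendar parts from the parsed timestamp
    ts = pd.to_datetime(out[time_col])
    out["Month"] = ts.dt.month
    out["DayOfWeek"] = ts.dt.dayofweek
    out["Hour"] = ts.dt.hour

    # 3. Drop unnecessary features: the raw timestamp is no longer needed
    out = out.drop(columns=[time_col])

    # 4. Feature encoding: one-hot encode categoricals, dropping one level each
    out = pd.get_dummies(out, columns=categorical_cols, drop_first=True, dtype=int)

    # 5. Feature scaling: standardize continuous and engineered features
    scale_cols = list(numeric_cols) + ["Month", "DayOfWeek", "Hour"]
    out = out.astype({c: float for c in scale_cols})
    out[scale_cols] = StandardScaler().fit_transform(out[scale_cols])
    return out


# Hypothetical toy data; "Category" is an assumed ad-category column
df = pd.DataFrame({
    "Daily Time Spent on Site": [68.9, np.nan, 69.5, 74.2],
    "Age": [35, 31, 26, 29],
    "Area Income": [432.8e6, 479.1e6, np.nan, 418.5e6],
    "Daily Internet Usage": [256.1, 193.8, 236.5, np.nan],
    "Male": [0.0, np.nan, 1.0, 1.0],
    "Category": ["Fashion", "Travel", "Fashion", "Food"],
    "Timestamp": ["2016-03-27 00:53:00", "2016-04-04 01:39:00",
                  "2016-03-13 20:35:00", "2016-01-10 02:31:00"],
})

X = preprocess(
    df,
    numeric_cols=["Daily Time Spent on Site", "Age", "Area Income", "Daily Internet Usage"],
    binary_cols=["Male"],
    categorical_cols=["Category"],
)
print(X.head())
```

For brevity this fits the imputation statistics and the scaler on the whole frame. In a real train/test workflow they should be fitted on the training split only, to avoid leaking information from the test set.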
We trained logistic regression, decision tree, random forest, and KNN models. Among the four, the logistic regression model achieved the highest AUC, meaning that it is the best at distinguishing between the classes.
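A minimal sketch of comparing the four algorithms by ROC AUC on a held-out split with scikit-learn. Synthetic data from `make_classification` stands in for the preprocessed Click Ads features, so the ranking it prints says nothing about the project's actual results; the 70/30 split and default hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_models_by_auc(X, y, random_state=42):
    """Fit each model on a training split and return test AUCs, best first."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "KNN": KNeighborsClassifier(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # AUC is computed from the predicted probability of the positive class
        proba = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, proba)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Synthetic stand-in for the real, already-scaled feature matrix
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
auc = compare_models_by_auc(X, y)
for name, score in auc.items():
    print(f"{name:20s} AUC = {score:.3f}")
```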
With a classification threshold of 0.5, the model identifies up to 97% of the users who are actually interested in the ads (a recall of 97%), while only about 1% of the users it predicts as interested are actually uninterested (a precision of 99%). In other words, for every 100 users predicted to be interested, about 1 is actually not interested in the ads. The most important features for prediction are daily time spent on site, daily internet usage, area income, and age.
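To make the threshold, recall, and precision figures concrete, here is a small sketch that converts predicted probabilities into labels at a given threshold and computes both metrics with scikit-learn. The ten labels and probabilities are hypothetical, not the project's predictions.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score


def evaluate_at_threshold(y_true, proba, threshold=0.5):
    """Turn predicted probabilities into labels and report recall/precision."""
    y_pred = (np.asarray(proba) >= threshold).astype(int)
    precision = precision_score(y_true, y_pred)
    return {
        # share of truly interested users that the model catches
        "recall": recall_score(y_true, y_pred),
        # share of targeted users who are truly interested
        "precision": precision,
        # share of targeted users who are NOT interested (wasted ad spend)
        "false_discovery_rate": 1.0 - precision,
    }


# Hypothetical labels (1 = interested) and model probabilities for 10 users
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
proba = [0.9, 0.8, 0.7, 0.6, 0.3, 0.55, 0.6, 0.1, 0.4, 0.05]

print(evaluate_at_threshold(y_true, proba))
```

At 0.5, 4 of the 5 interested users are caught (recall 0.8) and 2 of the 6 targeted users are uninterested (precision about 0.67). Raising `threshold` generally trades recall for precision: at 0.65 this toy example reaches precision 1.0 but recall drops to 0.6.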
We should target ads at mature users (aged 36 to 61) who belong to a lower social class (earning around 300M/year) and spend between 32 and 68 minutes a day on the site. The ads should promote products or services whose price does not exceed these users' buying power. There is no particular ad category in which this segment is significantly more interested.
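The recommended segment can be expressed as a pandas filter. This is a sketch under assumptions: the column names `Age`, `Daily Time Spent on Site`, and `Area Income` follow the dataset description, and because the analysis gives only an approximate income level ("around 300M/year"), the income band is an optional, hypothetical parameter rather than a fixed rule.

```python
import pandas as pd


def select_target_segment(df, age=(36, 61), minutes_on_site=(32, 68), income_band=None):
    """Return the rows that fall inside the recommended target segment.

    Bounds are inclusive. `income_band` is an optional (low, high) tuple,
    since only an approximate income level is stated.
    """
    mask = df["Age"].between(*age) & df["Daily Time Spent on Site"].between(*minutes_on_site)
    if income_band is not None:
        mask &= df["Area Income"].between(*income_band)
    return df[mask]


# Hypothetical customers
customers = pd.DataFrame({
    "Age": [25, 40, 55, 45],
    "Daily Time Spent on Site": [50.0, 45.0, 70.0, 60.0],
    "Area Income": [400e6, 300e6, 280e6, 500e6],
})

print(select_target_segment(customers))                              # rows 1 and 3
print(select_target_segment(customers, income_band=(250e6, 350e6)))  # row 1 only
```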



