The Task:
Hello,
As CTO and head of Blackwell's eCommerce Team, I'd like to welcome you aboard. I'm excited to get started on this project, but I'd first like to give you a bit of background to get you up to speed. Blackwell has been a successful electronics retailer for over 40 years, with over 30 stores in the Southeast. A little over a year ago we launched our eCommerce website. We are starting to build up customer transaction data from the site and we want to leverage this data to inform our decisions about site-related activities, like online marketing, enhancements to the site and so on, in order to continue to maximize the amount of revenue we generate from eCommerce sales.
To that end, I would like you to explore the customer transaction data we have collected from recent online and in-store sales and see if you can infer any insights about customer purchasing behavior. Specifically, I am interested in the following:
Do customers in different regions spend more per transaction?
Which regions spend the most/least?
Is there a relationship between number of items purchased and amount spent?
To investigate this, I’d like you to use data mining methods to explore the data, look for patterns in the data and draw conclusions. I have attached a data file of customer transactions; it includes some information about the customer who made the transaction, as well as the amount of the transaction, and how many items were purchased. Once you have completed your analysis, please create a brief report of your findings and conclusions and an explanation of how you arrived at those conclusions so I can discuss them with Martin.
Thanks, Danielle
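The questions in the email above come down to two calculations: an average transaction amount per region, and a correlation between items purchased and amount spent. The sketch below shows one way to compute both using only the Python standard library. The field names (region, items, amount), the region labels, and every value in the toy data are assumptions made for illustration; the attached data file may be laid out differently.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Toy rows standing in for the attached transaction file. The field names
# and all values are invented for illustration only.
transactions = [
    {"region": "North", "items": 2, "amount": 100.0},
    {"region": "North", "items": 4, "amount": 300.0},
    {"region": "South", "items": 1, "amount": 50.0},
    {"region": "South", "items": 3, "amount": 150.0},
    {"region": "East", "items": 5, "amount": 400.0},
    {"region": "East", "items": 5, "amount": 600.0},
]

def mean_amount_by_region(rows):
    """Average transaction amount for each region."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["region"]].append(row["amount"])
    return {region: mean(amounts) for region, amounts in grouped.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

by_region = mean_amount_by_region(transactions)
# Regions ordered from highest to lowest average spend per transaction.
ranked = sorted(by_region.items(), key=lambda kv: kv[1], reverse=True)
# Strength of the linear relationship between items and amount.
r = pearson([t["items"] for t in transactions],
            [t["amount"] for t in transactions])
```

A coefficient near 0 would suggest little linear relationship between items and amount, while a value near 1 or -1 would suggest a strong one. With real data, a difference in regional averages would still need to be checked against the spread within each region before drawing conclusions.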
Hello,
Thanks for the great report on our customers' transactions; it will help us better understand what they bought and where they bought it. We can use this information to help optimize our online marketing efforts.
Now that you have investigated the different aspects of customer purchases, I need you to dive deeper into specific customer demographics so we can better understand to whom to market and why. Our VP of Sales, Martin Goodrich, thinks that customers who shop in the store are older than customers who shop online and that older people spend more money on electronics than younger people. He is considering some marketing activities and potentially some design changes to the website to attract older buyers. Before we even consider any additional activities related to the website, we want to gain insight into any factors that can help us better understand the age of our customers and whether it correlates with how much they spend.
To that end, I would like you to explore the customer transaction data we have collected from recent online and in-store sales and see if you can infer any insights about customer purchasing behavior. Specifically, I am interested in the following:
Are there differences in the age of customers between regions? If so, can we predict the age of a customer in a region based on other demographic data?
We need to investigate Martin’s hypothesis: Is there any correlation between the age of a customer and whether the transaction was made online or in the store?
Do any other factors predict whether a customer will buy online or in our stores?
To investigate this, I’d like you to use machine learning to build a predictive model that can help us in our search. I have attached the same data file of customer transactions. As you know, it includes some information about the customer who made the transaction, as well as the amount of the transaction, and how many items were purchased. As usual, once you have completed your analysis, please create a brief report of your findings and conclusions and an explanation of how you arrived at those conclusions so I can discuss them with Martin.
Thanks, Danielle
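As a first look at Martin's hypothesis from the email above, the sketch below fits a decision stump, that is, a decision tree with a single split, that predicts in-store versus online from age alone. It is a deliberately simplified stand-in for a full decision tree classifier, not the method the assignment prescribes. The field names (age, in_store) and all values in the toy data are assumptions made for illustration.

```python
# Toy records standing in for the transaction file. The field names
# and all values are invented for illustration only.
customers = [
    {"age": 22, "in_store": False},
    {"age": 27, "in_store": False},
    {"age": 31, "in_store": False},
    {"age": 38, "in_store": True},
    {"age": 41, "in_store": False},
    {"age": 45, "in_store": False},
    {"age": 52, "in_store": True},
    {"age": 60, "in_store": True},
    {"age": 68, "in_store": True},
]

def stump_accuracy(rows, threshold):
    """Accuracy of the rule: predict in-store when age >= threshold."""
    correct = sum((row["age"] >= threshold) == row["in_store"] for row in rows)
    return correct / len(rows)

def best_age_threshold(rows):
    """Try every observed age as a cut point and keep the most accurate one."""
    candidates = sorted({row["age"] for row in rows})
    return max(candidates, key=lambda t: stump_accuracy(rows, t))

threshold = best_age_threshold(customers)
accuracy = stump_accuracy(customers, threshold)
```

If no threshold does much better than always guessing the more common channel, age alone is a weak predictor and other factors deserve a look. The accuracy computed here is measured on the same rows that selected the threshold, so it is optimistic; an honest estimate needs held-out data.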
Using data mining tools to investigate patterns in complex data sets.
Preprocessing data for data mining (e.g., transforming numeric values to nominal values, discretizing data).
Using decision tree classifiers to investigate classification problems.
Applying cross-validation methods.
Interpreting and drawing inferences from the results of data mining.
Assessing the predictive performance of classifiers by examining key error metrics.
Identifying where learning methods fail and gaining insight into why with error analysis.
Drawing relationships between learner performance and measured features to help understand model performance.
Conducting feature selection to investigate the correlation between different features in a dataset.
Presenting data mining results to management.
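Two of the skills above, discretizing a numeric attribute and applying cross-validation, can be illustrated together. The sketch below turns age into nominal bands, learns the majority channel per band (a one-rule classifier, used here as a simple stand-in for a decision tree), and estimates its accuracy with k-fold cross-validation. The cut points, the labels, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def age_band(age):
    """Discretize a numeric age into a nominal band (cut points are arbitrary)."""
    if age < 35:
        return "under_35"
    if age < 55:
        return "35_to_54"
    return "55_plus"

def train_one_rule(rows):
    """Learn the majority class per age band, plus an overall fallback class."""
    per_band = defaultdict(Counter)
    overall = Counter()
    for age, label in rows:
        per_band[age_band(age)][label] += 1
        overall[label] += 1
    rule = {band: counts.most_common(1)[0][0] for band, counts in per_band.items()}
    return rule, overall.most_common(1)[0][0]

def predict(model, age):
    """Predict the channel for one age; unseen bands get the fallback class."""
    rule, fallback = model
    return rule.get(age_band(age), fallback)

def k_fold_accuracy(rows, k):
    """Hold out each of k interleaved folds once and average the accuracy."""
    scores = []
    for fold in range(k):
        test = rows[fold::k]
        train = [row for i, row in enumerate(rows) if i % k != fold]
        model = train_one_rule(train)
        hits = sum(predict(model, age) == label for age, label in test)
        scores.append(hits / len(test))
    return sum(scores) / k

# Toy (age, channel) pairs, invented for illustration only.
data = [
    (22, "online"), (61, "in_store"), (29, "online"), (58, "in_store"),
    (33, "online"), (70, "in_store"), (41, "online"), (47, "in_store"),
    (25, "online"), (66, "in_store"), (38, "in_store"), (52, "online"),
]

cv_accuracy = k_fold_accuracy(data, k=3)
```

The folds here are taken by interleaving rows in their given order to keep the example deterministic. On a real data set the rows would normally be shuffled, or the folds stratified by class, before splitting.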