The majority of Pymoli players are males between the ages of 15 and 29. These players account for the majority of purchases by count and revenue.
The average purchase size is just under $3.00. All but two players made 3 or fewer purchases.
Our most popular items are $2.35 or under, which is in line with the average purchase price.
Potential courses of action include slightly increasing item prices, while keeping them under $3.00, to earn more from typical purchases, or increasing the availability of items priced at $2.35 or less to encourage more purchasing.
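Purchase-size claims like these can be checked by grouping purchases by screen name. The sketch below is a minimal illustration, not the code that produced the numbers above: it uses a few made-up rows (hypothetical, not Pymoli data) with the same `SN` and `Price` columns, and `purchase_size_summary` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy rows using the same column names as the real data;
# these are NOT actual Pymoli purchases.
purchases_df = pd.DataFrame({
    "SN": ["Aela", "Aela", "Bryn", "Cato", "Cato", "Cato", "Cato"],
    "Price": [2.35, 3.10, 1.99, 2.50, 4.00, 1.25, 2.80],
})


def purchase_size_summary(df: pd.DataFrame) -> dict:
    """Return the average purchase price and how purchases spread across players."""
    per_player = df.groupby("SN").size()  # number of purchases per screen name
    return {
        "avg_price": round(df["Price"].mean(), 2),
        "players_over_3": int((per_player > 3).sum()),
        "max_purchases": int(per_player.max()),
    }


summary = purchase_size_summary(purchases_df)
print(summary)
```

On the real data, `players_over_3` is the number to compare against "all but two players made 3 or fewer purchases."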
# ------------------------------------------------------
# Step 1: Get the filename, then read in and clean the data.
# ------------------------------------------------------
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
# greet the user
print("Welcome to Pymoli Data Analysis!")
filename = 'data/purchase_data.json'
# read in the data file
purchases_df = pd.read_json(filename, orient='records')
# make sure Price is numeric (strip any stray "%" characters first)
purchases_df['Price'] = purchases_df['Price'].replace("%", "", regex=True).astype(float)
# check for and remove null values; dropna returns a new DataFrame, so reassign it
purchases_df = purchases_df.dropna(how='any')
# write the cleaned data out for a quick manual check
purchases_df.to_csv("test.csv")
Welcome to Pymoli Data Analysis!
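One thing to watch in Step 1: `dropna` returns a new DataFrame rather than modifying `purchases_df` in place, so its result must be reassigned. Here is a minimal, self-contained sketch of the load-and-clean step under a few assumptions: the JSON is inlined through `io.StringIO` instead of being read from `data/purchase_data.json`, the rows are made up, and one `Price` is assumed to arrive as a string with a stray `$`. `load_and_clean` is a hypothetical helper name.

```python
import io

import pandas as pd

# Hypothetical records in the same orient='records' layout as the real file.
# Bryn's Age is missing and Bryn's Price carries a "$" (both assumptions for the demo).
raw_json = """[
  {"SN": "Aela", "Age": 20, "Gender": "Male", "Item ID": 1, "Item Name": "Blade", "Price": "2.35"},
  {"SN": "Bryn", "Age": null, "Gender": "Female", "Item ID": 2, "Item Name": "Orb", "Price": "$1.99"},
  {"SN": "Cato", "Age": 27, "Gender": "Male", "Item ID": 3, "Item Name": "Bow", "Price": "3.10"}
]"""


def load_and_clean(json_text: str) -> pd.DataFrame:
    df = pd.read_json(io.StringIO(json_text), orient="records")
    # Remove the currency symbol, then convert; unparseable prices become NaN.
    df["Price"] = pd.to_numeric(
        df["Price"].astype(str).str.replace("$", "", regex=False), errors="coerce"
    )
    # dropna returns a new frame; returning (or reassigning) it is what drops the nulls.
    return df.dropna(how="any")


clean_df = load_and_clean(raw_json)
print(clean_df)
```

Using `errors="coerce"` turns any price that still cannot be parsed into `NaN`, so the following `dropna` removes it along with rows that have missing ages.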
Gender Demographics
# -----------------------------------------------------------------------
# Step 2: Analyze the players: look at the total number of players
# and gender demographics (Double-checked in Excel)
# -----------------------------------------------------------------------
# total number of players
num_players_total = len(purchases_df.groupby('SN').count())
# remove all dupes to account for players who made multiple purchases
purchases_noDupes_df = purchases_df.set_index('SN')
purchases_noDupes_df = purchases_noDupes_df[~purchases_noDupes_df.index.duplicated(keep='first')]
# count and % of male players
num_players_male = purchases_noDupes_df[purchases_noDupes_df['Gender'] == 'Male'].count()['Age']
percent_players_male = num_players_male / num_players_total * 100
# count and % of female players
num_players_female = purchases_noDupes_df[purchases_noDupes_df['Gender'] == 'Female'].count()['Age']
percent_players_female = num_players_female / num_players_total * 100
# count and % of other/non-disclosed players
num_players_other = purchases_noDupes_df[purchases_noDupes_df['Gender'] == 'Other / Non-Disclosed'].count()['Age']
percent_players_other = num_players_other / num_players_total * 100
# create dataframes to hold these results
gender_demographics_num_df = pd.DataFrame.from_dict({"Male": num_players_male, "Female": num_players_female,
                                                     "Other / Non-Disclosed": num_players_other}, orient='index')
gender_demographics_percent_df = pd.DataFrame.from_dict({"Male": percent_players_male, "Female": percent_players_female,
                                                         "Other / Non-Disclosed": percent_players_other}, orient='index')
# merge dataframes
gender_demographics_df = gender_demographics_num_df.merge(gender_demographics_percent_df, how='outer',
                                                          left_index=True, right_index=True)
# rename columns
gender_demographics_df.rename(columns={'0_x': 'Number of Players', '0_y': 'Percent of Players'}, inplace=True)
# format results
gender_demographics_df['Percent of Players'] = gender_demographics_df['Percent of Players'].map('{0:.2f}%'.format)
# print out results
display(gender_demographics_df)
# create labels and wedge sizes
labels = ['Male', 'Female', 'Other / Non-Disclosed']
wedge_sizes= [gender_demographics_num_df[0]['Male'],\
gender_demographics_num_df[0]['Female'],\
gender_demographics_num_df[0]['Other / Non-Disclosed']]
# create pie chart
fig1, ax1 = plt.subplots()
ax1.pie(wedge_sizes, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.legend(loc='lower right')
plt.show()
# ----------------------------------
# End Step 2
# ----------------------------------
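The per-gender counts in Step 2 can also be computed in a few lines. This sketch is an alternative, not the approach used above: `drop_duplicates(subset='SN')` plays the role of the `index.duplicated` filter, and `value_counts` replaces the three manual gender filters. The toy rows and the `gender_demographics` helper are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; "Aela" appears twice to mimic a repeat buyer.
purchases_df = pd.DataFrame({
    "SN": ["Aela", "Aela", "Bryn", "Cato", "Dane"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})


def gender_demographics(df: pd.DataFrame) -> pd.DataFrame:
    # Keep one row per screen name so repeat buyers are counted once.
    players = df.drop_duplicates(subset="SN")
    counts = players["Gender"].value_counts()
    return pd.DataFrame({
        "Number of Players": counts,
        "Percent of Players": (counts / counts.sum() * 100).round(2),
    })


demo = gender_demographics(purchases_df)
print(demo)
```

Because the percentages are computed from the de-duplicated counts, they always sum to roughly 100% (up to rounding), which is a handy sanity check against the Excel numbers.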
Even the top spenders don't make an extraordinary number of purchases. The top 3 spenders made 4 to 5 purchases each, and everyone else made 3 or fewer.
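A top-spenders table like the one behind this observation can be sketched with a single `groupby`. The column labels, the `top_spenders` helper, and the toy rows are assumptions for illustration; the real analysis may label or sort its table differently.

```python
import pandas as pd

# Hypothetical toy rows, not real Pymoli purchases.
purchases_df = pd.DataFrame({
    "SN": ["Aela", "Aela", "Aela", "Aela", "Bryn", "Cato", "Cato"],
    "Price": [2.00, 3.00, 1.50, 2.50, 4.00, 1.00, 2.00],
})


def top_spenders(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    # One row per player: how many purchases, how much in total, and the average price.
    stats = df.groupby("SN")["Price"].agg(["count", "sum", "mean"])
    stats.columns = ["Purchase Count", "Total Purchase Value", "Average Purchase Price"]
    return stats.sort_values("Total Purchase Value", ascending=False).head(n)


top = top_spenders(purchases_df)
print(top)
```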
Overall Purchasing Analysis
# --------------------------------------------------------------------
# Question 2 - Purchasing Analysis (Total) (Double-checked in Excel)
# --------------------------------------------------------------------
# number of unique items purchased
num_items_unique = len(purchases_df['Item ID'].value_counts())
# average purchase price (select Price first; newer pandas raises on mean() over text columns)
overall_purchase_avg = '${:,.2f}'.format(purchases_df['Price'].mean())
# total number of purchases
overall_purchase_count = purchases_df.count()['Item ID']
# total revenue
overall_purchase_revenue = '${:,.2f}'.format(purchases_df['Price'].sum())
# create dataframe to display results
purchasing_total_analysis_df = pd.DataFrame({'Number of Unique Items Purchased': [num_items_unique],
                                             'Average Purchase Total': [overall_purchase_avg],
                                             'Total Number of Purchases': [overall_purchase_count],
                                             'Total Revenue': [overall_purchase_revenue]})
# rearrange columns in a more sensible way
purchasing_total_analysis_df = purchasing_total_analysis_df[['Total Revenue', 'Total Number of Purchases',
                                                             'Average Purchase Total', 'Number of Unique Items Purchased']]
display(purchasing_total_analysis_df)
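The same summary can be built without the intermediate variables. This is a sketch assuming the column names from `purchase_data.json`; `nunique()` stands in for `len(value_counts())` (both skip missing IDs). The toy prices are made up and chosen so the arithmetic is exact, and `overall_summary` is a hypothetical name.

```python
import pandas as pd

# Hypothetical toy rows; item 10 is bought twice.
purchases_df = pd.DataFrame({
    "Item ID": [10, 10, 22, 35],
    "Item Name": ["Blade", "Blade", "Orb", "Bow"],
    "Price": [2.25, 2.25, 2.00, 3.50],
})


def overall_summary(df: pd.DataFrame) -> pd.DataFrame:
    # Aggregate only the Price column so text columns never reach mean()/sum().
    return pd.DataFrame({
        "Total Revenue": [f"${df['Price'].sum():,.2f}"],
        "Total Number of Purchases": [len(df)],
        "Average Purchase Total": [f"${df['Price'].mean():,.2f}"],
        "Number of Unique Items Purchased": [df["Item ID"].nunique()],
    })


summary = overall_summary(purchases_df)
print(summary)
```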
Just as the majority of users are male, the majority of purchases are made by male users. They accounted for 81.15% of purchases by count and 81.69% by purchase amount.
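Shares like these come from grouping purchases by `Gender` and dividing each group's count and revenue by the overall totals. A minimal sketch, assuming the source's `Gender` and `Price` columns; the toy rows and the `gender_purchase_share` helper are hypothetical, so its percentages are not the figures reported above.

```python
import pandas as pd

# Hypothetical toy rows, not real Pymoli purchases.
purchases_df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female"],
    "Price": [2.00, 3.00, 1.00, 4.00],
})


def gender_purchase_share(df: pd.DataFrame) -> pd.DataFrame:
    by_gender = df.groupby("Gender")["Price"].agg(["count", "sum"])
    return pd.DataFrame({
        "Purchase Count": by_gender["count"],
        # share of all purchases, by number of purchases
        "% of Purchases": (by_gender["count"] / by_gender["count"].sum() * 100).round(2),
        "Total Purchase Value": by_gender["sum"],
        # share of all revenue, by purchase amount
        "% of Revenue": (by_gender["sum"] / by_gender["sum"].sum() * 100).round(2),
    })


share = gender_purchase_share(purchases_df)
print(share)
```

The two percentage columns diverge whenever one group's average purchase price differs from the overall average, which is why the count share and revenue share above are close but not identical.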
# ----------------------------------------------------------------------------
# Question 4 - Purchasing Analysis (Age) (partially double-checked with Excel)
# ----------------------------------------------------------------------------
ages_df = purchases_df.sort_values('Age')
# create bins: <10, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45+
bins = [0, 10, 15, 20, 25, 30, 35, 40, 45, 100]
# name the bins
bin_names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40-44', '45+']
# add an Age Range column; right=False makes each bin include its left edge (e.g. 15 falls in 15-19)
purchases_df['Age Range'] = pd.cut(purchases_df['Age'], bins, labels=bin_names, right=False)
# generate dataframes with the data we need
# purchase count (observed=False keeps empty age ranges as rows)
age_purchase_count_df = pd.DataFrame.from_dict(dict(purchases_df.groupby('Age Range', observed=False).count()))
age_purchase_count_df = age_purchase_count_df.drop(columns=['Age', 'Gender', 'Item ID', 'Item Name', 'Price'])
age_purchase_count_df.rename(columns={'SN': 'Purchase Count'}, inplace=True)
# purchase total (aggregate Price only so text columns are not summed)
age_purchase_total_df = purchases_df.groupby('Age Range', observed=False)[['Price']].sum()
age_purchase_total_df.rename(columns={'Price': 'Total Purchases'}, inplace=True)
# average purchase price
age_purchase_avg_df = purchases_df.groupby('Age Range', observed=False)[['Price']].mean()
age_purchase_avg_df.rename(columns={'Price': 'Average Purchase Price'}, inplace=True)
# create age demographics for funzies and for later use
purchases_noDupes_round2_df = purchases_df.set_index('SN')
purchases_noDupes_round2_df = purchases_noDupes_round2_df[~purchases_noDupes_round2_df.index.duplicated(keep='first')]
age_demographics_df = purchases_noDupes_round2_df.groupby('Age Range', observed=False).count()
age_demographics_df = age_demographics_df.drop(columns=['Age', 'Gender', 'Item Name', 'Price'])
age_demographics_df.rename(columns={'Item ID' : 'Number of Users'}, inplace=True)
# make a dataframe to store the results
purchasing_age_analysis_df = age_demographics_df.merge(age_purchase_count_df, how='outer',\
left_index=True, right_index=True)
purchasing_age_analysis_df=purchasing_age_analysis_df.merge(age_purchase_total_df, how='outer',\
left_index=True, right_index=True)
purchasing_age_analysis_df=purchasing_age_analysis_df.merge(age_purchase_avg_df, how='outer',\
left_index=True, right_index=True)
# normalized totals (total spend per user in each age range)
purchasing_age_analysis_df["Normalized Total"] = purchasing_age_analysis_df['Total Purchases'] / age_demographics_df['Number of Users']
# format
purchasing_age_analysis_df['Total Purchases'] = purchasing_age_analysis_df['Total Purchases'].map('${:,.2f}'.format)
purchasing_age_analysis_df['Average Purchase Price'] = purchasing_age_analysis_df['Average Purchase Price'].map('${:,.2f}'.format)
purchasing_age_analysis_df['Normalized Total'] = purchasing_age_analysis_df['Normalized Total'].map('${:,.2f}'.format)
# print results
display(purchasing_age_analysis_df)
# --------------------------------------
# End Step 3
# --------------------------------------
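The age table above is assembled from four separate groupbys and three merges. A more compact sketch uses pandas named aggregation on one `groupby`, with `observed=False` so empty age ranges still appear as rows. It reuses the source's `bins`, `bin_names`, and `right=False`; the toy rows and the `age_analysis` helper are hypothetical. One design difference to note: `Number of Users` here is `SN.nunique()` per age range, which matches the original de-duplicated count only if each player's age is the same on every purchase.

```python
import pandas as pd

bins = [0, 10, 15, 20, 25, 30, 35, 40, 45, 100]
bin_names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40-44', '45+']

# Hypothetical toy rows, not real Pymoli purchases.
purchases_df = pd.DataFrame({
    "SN": ["Aela", "Aela", "Bryn", "Cato", "Dane"],
    "Age": [15, 15, 19, 20, 44],
    "Price": [2.00, 3.00, 1.00, 4.00, 2.50],
})


def age_analysis(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # right=False: each bin includes its left edge, so age 15 lands in 15-19.
    df["Age Range"] = pd.cut(df["Age"], bins, labels=bin_names, right=False)
    stats = df.groupby("Age Range", observed=False).agg(**{
        "Number of Users": ("SN", "nunique"),
        "Purchase Count": ("Price", "count"),
        "Total Purchases": ("Price", "sum"),
        "Average Purchase Price": ("Price", "mean"),
    })
    # Empty ranges give 0 / 0 here, which pandas turns into NaN.
    stats["Normalized Total"] = stats["Total Purchases"] / stats["Number of Users"]
    return stats


stats = age_analysis(purchases_df)
print(stats)
```

Formatting as dollars is left out on purpose: keeping the columns numeric makes them easier to sort or plot, and `.map('${:,.2f}'.format)` can still be applied just before display.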