Project_2

Extract-Transform-Load Homework: Avocados around the US!

Proposal

As new employees of the Hass Avocado company, we have been tasked with creating a database to help the advertising and sales team find out who (besides Millennials) is buying their avocados and where. The avocado toast trend is stronger than ever, and it's up to us to help untangle all of this data. Our final database is meant to track weekly avocado sales by region, city, state, or zip code.

Roles

- Miguel: Loading
- Johnathan: Transform
- Marquetta: ERD and ReadMe
- Manny: Transform

Sources

Weekly Avocado Sales from 2015-2018: https://www.kaggle.com/datasets/neuromusic/avocado-prices
Census info: https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-cities-and-towns.html

The Process

- Extract: your original data sources and how the data was formatted (CSV, JSON, pgAdmin 4, etc.).
- Transform: what data cleaning or transformation was required.
- Load: the final database, tables/collections, and why this was chosen.

Step 1 Extract

Downloaded avocado sales CSV file from Kaggle and converted it into a data frame. Fixed dates and removed zeroes. Formatted text and numbers.
Downloaded SUB-IP/population CSV from census.gov website.

Step 2 Transform

Avocado sales CSV:
- Imported into Jupyter Notebook.
- Formatted dates from YYYY-MM-DD to MM/DD/YYYY.
- Removed ###### values from date column.
- Formatted text and numbers.
- Removed bag (Total Bags/Small Bags/Large Bags/XLarge Bags) columns.
- Removed commas from region column.
- Removed non-city data from region.
- Formatted column names to be lowercase.
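
The avocado cleanup steps above could be sketched roughly as follows. This is not the code from the project's notebook, only a minimal pandas illustration. The sample rows are made up, the column names are assumed to follow the Kaggle CSV, and the `clean_avocado` helper and its `non_city_regions` parameter are hypothetical names introduced for this example.

```python
import pandas as pd

# Tiny made-up sample shaped like the avocado CSV (values are illustrative only).
raw = pd.DataFrame({
    "Date": ["2015-12-27", "2015-12-20", "######"],
    "AveragePrice": [1.33, 1.35, 0.93],
    "Total Volume": [64236.62, 54876.98, 118220.22],
    "Total Bags": [8696.87, 9505.56, 8145.35],
    "Small Bags": [8603.62, 9408.07, 8042.21],
    "Large Bags": [93.25, 97.49, 103.14],
    "XLarge Bags": [0.0, 0.0, 0.0],
    "year": [2015, 2015, 2015],
    "region": ["Albany", "TotalUS", "Atlanta"],
})


def clean_avocado(df, non_city_regions=("TotalUS",)):
    """Hypothetical helper that mirrors the bullet list above."""
    df = df.copy()
    # Values that are not valid YYYY-MM-DD dates (e.g. "######") become NaT.
    df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")
    df = df.dropna(subset=["Date"]).copy()
    # Reformat the remaining dates as MM/DD/YYYY strings.
    df["Date"] = df["Date"].dt.strftime("%m/%d/%Y")
    # Drop the bag columns.
    df = df.drop(columns=["Total Bags", "Small Bags", "Large Bags", "XLarge Bags"])
    # Remove commas from region names, then drop regions that are not cities.
    df["region"] = df["region"].str.replace(",", "", regex=False)
    df = df[~df["region"].isin(non_city_regions)]
    # Lowercase the column names.
    df.columns = df.columns.str.lower()
    return df.reset_index(drop=True)


avocado = clean_avocado(raw)
```

The list of non-city regions is an assumption here; in practice it would have to be built by inspecting the distinct values in the region column.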
SUB-IP-EST2019-ANNRES:
- Renamed to "population" and imported into Jupyter Notebook.
- Removed "Unnamed: 0" column.
- Split city_state column into separate city and state columns.
- Added city ID.
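
A minimal sketch of the population cleanup, assuming the `city_state` values look like "City, State". The population numbers below are placeholders rather than census figures, and `clean_population` and the `city_id` column name are hypothetical names used only for illustration.

```python
import pandas as pd

# Made-up sample; the population values are placeholders, not census data.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "city_state": ["Albany, New York", "Atlanta, Georgia"],
    "population": [100, 200],
})


def clean_population(df):
    """Hypothetical helper that mirrors the bullet list above."""
    # Drop the leftover index column.
    df = df.drop(columns=["Unnamed: 0"])
    # Split on the last comma so the state ends up in its own column.
    parts = df["city_state"].str.rsplit(",", n=1, expand=True)
    df["city"] = parts[0].str.strip()
    df["state"] = parts[1].str.strip()
    df = df.drop(columns=["city_state"])
    # Add a simple sequential city ID as the first column.
    df.insert(0, "city_id", range(1, len(df) + 1))
    return df


population = clean_population(raw)
```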
Junction Table:
- Created new "cities" table using the 'city' and 'state' columns from the population table to use as the IDs. Matched cities from the "cities" table to the "avocado" table to make queries easier.
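
One possible way to build the "cities" table and match it to the avocado data is a left merge on the city name. The sketch below uses made-up rows and assumes the avocado `region` values line up with the population `city` values. The `avocado_with_ids` and `unmatched` names are hypothetical.

```python
import pandas as pd

# Made-up inputs standing in for the cleaned data frames.
population = pd.DataFrame({
    "city_id": [1, 2],
    "city": ["Albany", "Atlanta"],
    "state": ["New York", "Georgia"],
    "population": [100, 200],
})
avocado = pd.DataFrame({
    "date": ["12/27/2015", "12/27/2015", "12/27/2015"],
    "region": ["Albany", "Atlanta", "Boise"],
    "total volume": [10.0, 20.0, 30.0],
})

# The cities table keeps only the identifying columns.
cities = (
    population[["city_id", "city", "state"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Attach a city_id to each avocado row by matching region to city.
avocado_with_ids = avocado.merge(
    cities[["city_id", "city"]],
    how="left",
    left_on="region",
    right_on="city",
).drop(columns=["city"])

# Rows with no matching city are left with a missing city_id.
unmatched = avocado_with_ids[avocado_with_ids["city_id"].isna()]
```

Matching on the city name alone would duplicate rows when two states share a city name, so a real pipeline would likely need to match on state as well.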

Step 3 Load

After extracting and transforming the data, we were left with two data frames: the population table with city populations, and the avocado table with weekly sales (date, average price, total volume, and year). These data frames were loaded into a SQL database primarily because of the structure of the data and how easily queries can be run. Joins can be made between any of the tables to show information such as how many sales were made in a certain city in a certain year, or how population compares to total volume, to help with advertising and research.
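
The loading code itself is not shown here, so the following is only a sketch of how two data frames can be written to SQL tables with pandas. It uses an in-memory SQLite database as a stand-in so that it runs without a database server. The table names and sample rows are assumptions.

```python
import sqlite3

import pandas as pd

# Made-up stand-ins for the two cleaned data frames.
population = pd.DataFrame({
    "city_id": [1, 2],
    "city": ["Albany", "Atlanta"],
    "state": ["New York", "Georgia"],
    "population": [100, 200],
})
avocado = pd.DataFrame({
    "date": ["12/27/2015", "12/27/2015"],
    "averageprice": [1.33, 0.93],
    "total volume": [10.0, 20.0],
    "year": [2015, 2015],
    "city_id": [1, 2],
})

# In-memory SQLite is an assumption made for the sake of a runnable example.
conn = sqlite3.connect(":memory:")
population.to_sql("population", conn, index=False, if_exists="replace")
avocado.to_sql("avocado", conn, index=False, if_exists="replace")

# Confirm that both tables exist and hold the expected number of rows.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
row_count = conn.execute("SELECT COUNT(*) FROM avocado").fetchone()[0]
```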

Schema

We created our Entity Relationship Diagram (ERD) using QuickDBD.

Sample Queries

To show that users can quickly search the database for avocado information, we include a couple of sample queries against our relational database.
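
As an illustration of the kind of join described in Step 3, the sketch below compares total volume to population per city and year. The schema, column names, and rows are assumptions made up for this example and may not match the project's schema.sql. SQLite is used only so that the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and placeholder rows, for illustration only.
conn.executescript("""
CREATE TABLE population (
    city_id INTEGER PRIMARY KEY,
    city TEXT,
    state TEXT,
    population INTEGER
);
CREATE TABLE avocado (
    date TEXT,
    average_price REAL,
    total_volume REAL,
    year INTEGER,
    city_id INTEGER REFERENCES population(city_id)
);
INSERT INTO population VALUES
    (1, 'Albany', 'New York', 100),
    (2, 'Atlanta', 'Georgia', 200);
INSERT INTO avocado VALUES
    ('12/27/2015', 1.0, 10.0, 2015, 1),
    ('12/20/2015', 1.0, 30.0, 2015, 1),
    ('12/27/2015', 1.0, 40.0, 2015, 2);
""")

# Total volume per city and year, compared to the city's population.
query = """
SELECT p.city,
       a.year,
       SUM(a.total_volume) AS total_volume,
       SUM(a.total_volume) / p.population AS volume_per_capita
FROM avocado AS a
JOIN population AS p ON p.city_id = a.city_id
GROUP BY p.city, a.year, p.population
ORDER BY p.city;
"""
rows = conn.execute(query).fetchall()
```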
