Independent study — Google Cloud SE pilot for a fictional Wayfair-class retailer

A Google Cloud Solutions Engineer independent study: take a retail star-schema dataset and drive it end-to-end through pre-sales discovery, a two-track workshop sequence (Business Analytics + Data Science), a signed-off KPI contract, a ranked ML experiment backlog, a GCP reference architecture, and a pitch deck.

Read first: ROLE_AND_SCOPE.md — external SE, fictional retailer as customer, GCP only.

The story arc (in the order it was built)

Role & data — ROLE_AND_SCOPE.md, EXPLORATION.md.
Local lab — STEP_BY_STEP.md: Docker MySQL → load star schema → run SQL.
Week-2 BA workshop — workshops/01_BA_WORKSHOP_BRIEF.md → workshops/02_BA_WORKSHOP_FINDINGS.md → signed-off workshops/03_PROBLEM_AND_KPIS.md.
Week-2 DS workshop (parallel track) — workshops/04_DS_WORKSHOP_BRIEF.md → workshops/05_DS_WORKSHOP_FINDINGS.md → committed workshops/06_ML_HYPOTHESES_AND_EXPERIMENTS.md.
GCP reference architecture — docs/CLOUD_ARCHITECTURE.md: Cloud SQL pilot + BigQuery/BQML analytics, with every design choice traced back to a workshop finding.
Pitch — docs/PITCH_DECK.md: 12-slide narrative for the Director of Analytics.
First-pitch one-pager — docs/ONE_PAGER.md: CFO-readable, dollar-led summary that travels through the customer's org without you. See docs/assets/ONE_PAGER.png for the visual.

Suggested first pass: ROLE_AND_SCOPE → skim EXPLORATION → workshops/03_PROBLEM_AND_KPIS → workshops/06_ML_HYPOTHESES_AND_EXPERIMENTS → docs/CLOUD_ARCHITECTURE → docs/PITCH_DECK → docs/ONE_PAGER.

Headline findings (so you know what the pitch is actually selling)

Produced from the CSVs in this repo by scripts/profile_dataset.py and scripts/ds_profile.py — re-runnable any time.

West earns 42% of revenue and 56% of profit; Central earns 17% of revenue but only 2.6% of profit.
Discount cliff at 20%: below 20% discount, 0.6% of lines lose money; at ≥30% discount, 96.3% of lines lose money. 70.7% of all negative-profit lines come from the ≥30% band.
Growth is real: +11.5% → +25.6% → +18.4% YoY sales across 2014–2017.
Seasonality is a 6.9× peak/trough ratio (Nov peak, Feb trough) — monthly forecasting is viable.
19% of customers have negative lifetime profit — a concrete, model-able segment.

Documentation map

Document	What it is for
`ROLE_AND_SCOPE.md`	Canonical role & scope — who you are, who the customer is, pre/post-sales, no ambiguity
`EXPLORATION.md`	Discovery notes, business scenarios (migration + engagement), GCP service anchors, data/SQL reference
`STEP_BY_STEP.md`	Lab steps — Docker → clean/load → run SQL
`workshops/`	Week-2 BA + DS workshop sequence (briefs → findings → signed-off artifacts)
`docs/CLOUD_ARCHITECTURE.md`	Full GCP reference architecture with traceability to workshops
`docs/PITCH_DECK.md`	12-slide pre-sales narrative for the Director of Analytics
`docs/ONE_PAGER.md`	First-pitch one-pager — CFO-readable, dollar-led, leave-behind from the intro meeting (rendered: `docs/assets/ONE_PAGER.png`)
`GITHUB_SETUP.md`	Original repo-creation notes
`Questions`	Original brainstorm

Code / SQL map

Profiling (Python, pandas, no other deps)

scripts/profile_dataset.py — BA-flavored profile; writes workshops/profile_output.json.
scripts/ds_profile.py — DS-flavored profile (seasonality, discount cliff, Pareto, RFM, outliers); writes workshops/ds_profile_output.json.
scripts/clean_and_build_fact.py — CSV clean pass → data/clean/.
scripts/load_mysql.py — loads cleaned CSVs into local Docker MySQL.

SQL — pilot (Cloud SQL for MySQL)

sql/01_schema_mysql.sql — star schema DDL.
sql/02_enriched_fact_view.sql — enriched line-level view.
sql/03_aggregate_monthly_region.sql — primary-KPI aggregate.
sql/04_aggregate_monthly_category.sql — powers top-5 concentration KPI + guardrail.
sql/05_aggregate_monthly_customer_segment.sql — segment-level view.
sql/06_guardrail_neg_profit_lines.sql — weekly guardrail views (total, by sub-category, by region).
sql/07_discount_cliff_rule.sql — E2 single-rule baseline (what the ML classifier must beat).
sql/examples/ — simple query examples.

SQL — BigQuery ML (Phase-1 DS experiments)

sql/bqml/01_forecast_arima_plus.sql — E1 monthly revenue forecast per region.
sql/bqml/02_classifier_boosted_tree.sql — E2 line-level loss classifier.
sql/bqml/03_customer_loss_flag.sql — E3 forward-looking customer loss flag.

Lab

docker-compose.yml — local MySQL 8 on port 3307.
requirements.txt — pip deps for the clean + load scripts.

Quick start

# 1. Clone
git clone https://github.com/anushb1/wayfair-independent-study.git
cd wayfair-independent-study

# 2. Set up Python and profile the data
python3 -m venv .venv
.venv/bin/pip install pandas numpy
.venv/bin/python scripts/profile_dataset.py   # → workshops/profile_output.json
.venv/bin/python scripts/ds_profile.py        # → workshops/ds_profile_output.json

# 3. (Optional) spin up local MySQL and load the star schema
docker compose up -d
.venv/bin/pip install -r requirements.txt
.venv/bin/python scripts/clean_and_build_fact.py
.venv/bin/python scripts/load_mysql.py
mysql -h 127.0.0.1 -P 3307 -u root -p < sql/01_schema_mysql.sql
# then load aggregates: sql/02..07

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
docs		docs
scripts		scripts
sql		sql
workshops		workshops
.gitignore		.gitignore
0. erd.png		0. erd.png
EXPLORATION.md		EXPLORATION.md
GITHUB_SETUP.md		GITHUB_SETUP.md
Questions		Questions
README.md		README.md
ROLE_AND_SCOPE.md		ROLE_AND_SCOPE.md
STEP_BY_STEP.md		STEP_BY_STEP.md
categoryDim.csv		categoryDim.csv
cityDim.csv		cityDim.csv
countryDim.csv		countryDim.csv
customerDim.csv		customerDim.csv
docker-compose.yml		docker-compose.yml
factSales.csv		factSales.csv
geographyDim.csv		geographyDim.csv
orderDim.csv		orderDim.csv
productDim.csv		productDim.csv
regionDim.csv		regionDim.csv
requirements.txt		requirements.txt
segmentDim.csv		segmentDim.csv
shipModeDim.csv		shipModeDim.csv
stateDim.csv		stateDim.csv
subCategoryDim.csv		subCategoryDim.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Independent study — Google Cloud SE pilot for a fictional Wayfair-class retailer

The story arc (in the order it was built)

Headline findings (so you know what the pitch is actually selling)

Documentation map

Code / SQL map

Profiling (Python, pandas, no other deps)

SQL — pilot (Cloud SQL for MySQL)

SQL — BigQuery ML (Phase-1 DS experiments)

Lab

Quick start

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Independent study — Google Cloud SE pilot for a fictional Wayfair-class retailer

The story arc (in the order it was built)

Headline findings (so you know what the pitch is actually selling)

Documentation map

Code / SQL map

Profiling (Python, pandas, no other deps)

SQL — pilot (Cloud SQL for MySQL)

SQL — BigQuery ML (Phase-1 DS experiments)

Lab

Quick start

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages