anushb1/wayfair-independent-study

Independent study — Google Cloud SE pilot for a fictional Wayfair-class retailer

A Google Cloud Solutions Engineer independent study: take a retail star-schema dataset and drive it end-to-end through pre-sales discovery, a two-track workshop sequence (Business Analytics + Data Science), a signed-off KPI contract, a ranked ML experiment backlog, a GCP reference architecture, and a pitch deck.

Read first: ROLE_AND_SCOPE.md — external SE, fictional retailer as customer, GCP only.


The story arc (in the order it was built)

  1. Role & data: ROLE_AND_SCOPE.md, EXPLORATION.md.
  2. Local lab: STEP_BY_STEP.md (Docker MySQL → load star schema → run SQL).
  3. Week-2 BA workshop: workshops/01_BA_WORKSHOP_BRIEF.md → workshops/02_BA_WORKSHOP_FINDINGS.md → signed-off workshops/03_PROBLEM_AND_KPIS.md.
  4. Week-2 DS workshop (parallel track): workshops/04_DS_WORKSHOP_BRIEF.md → workshops/05_DS_WORKSHOP_FINDINGS.md → committed workshops/06_ML_HYPOTHESES_AND_EXPERIMENTS.md.
  5. GCP reference architecture: docs/CLOUD_ARCHITECTURE.md (Cloud SQL pilot + BigQuery/BQML analytics, with every design choice traced back to a workshop finding).
  6. Pitch: docs/PITCH_DECK.md (12-slide narrative for the Director of Analytics).
  7. First-pitch one-pager: docs/ONE_PAGER.md (CFO-readable, dollar-led summary that travels through the customer's org without you). See docs/assets/ONE_PAGER.png for the visual.

Suggested first pass: ROLE_AND_SCOPE → skim EXPLORATION → workshops/03_PROBLEM_AND_KPIS → workshops/06_ML_HYPOTHESES_AND_EXPERIMENTS → docs/CLOUD_ARCHITECTURE → docs/PITCH_DECK → docs/ONE_PAGER.


Headline findings (so you know what the pitch is actually selling)

Produced from the CSVs in this repo by scripts/profile_dataset.py and scripts/ds_profile.py — re-runnable any time.

  • West earns 42% of revenue and 56% of profit; Central earns 17% of revenue but only 2.6% of profit.
  • Discount cliff at 20%: below 20% discount, 0.6% of lines lose money; at ≥30% discount, 96.3% of lines lose money. 70.7% of all negative-profit lines come from the ≥30% band.
  • Growth is real: +11.5% → +25.6% → +18.4% YoY sales across 2014–2017.
  • Seasonality is a 6.9× peak/trough ratio (Nov peak, Feb trough) — monthly forecasting is viable.
  • 19% of customers have negative lifetime profit — a concrete, model-able segment.
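
As an illustration of how a figure like the discount cliff can be derived, here is a minimal sketch in plain Python. It is not the code in scripts/profile_dataset.py. The rows are hypothetical toy data, the function names are made up for this example, and the middle "20-30%" band is an assumption, since the findings above only name the below-20% and ≥30% bands.

```python
from collections import defaultdict

# Hypothetical toy order lines: (discount as a fraction, profit in dollars).
# Illustrative only; these are not rows from the repo's CSVs.
LINES = [
    (0.00, 12.5), (0.10, 4.0), (0.15, -1.0), (0.20, 3.0),
    (0.20, -2.5), (0.30, -8.0), (0.40, -15.0), (0.50, 1.0),
]


def discount_band(discount):
    """Bucket a discount fraction into a band (band edges are assumed)."""
    if discount < 0.20:
        return "<20%"
    if discount < 0.30:
        return "20-30%"
    return ">=30%"


def loss_stats(lines):
    """Return {band: (loss_rate, share_of_all_losing_lines)}."""
    total = defaultdict(int)
    losing = defaultdict(int)
    for discount, profit in lines:
        band = discount_band(discount)
        total[band] += 1
        if profit < 0:
            losing[band] += 1
    all_losing = sum(losing.values())
    return {
        band: (
            losing[band] / total[band],
            losing[band] / all_losing if all_losing else 0.0,
        )
        for band in total
    }


stats = loss_stats(LINES)
for band in ("<20%", "20-30%", ">=30%"):
    rate, share = stats[band]
    print(f"{band:>6}: {rate:.1%} of lines lose money, "
          f"{share:.1%} of all losing lines")
```

The first number per band corresponds to statements like "at ≥30% discount, 96.3% of lines lose money"; the second corresponds to "70.7% of all negative-profit lines come from the ≥30% band". The two use different denominators, which is why both are reported.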

Documentation map

| Document | What it is for |
| --- | --- |
| ROLE_AND_SCOPE.md | Canonical role & scope: who you are, who the customer is, pre/post-sales, no ambiguity |
| EXPLORATION.md | Discovery notes, business scenarios (migration + engagement), GCP service anchors, data/SQL reference |
| STEP_BY_STEP.md | Lab steps: Docker → clean/load → run SQL |
| workshops/ | Week-2 BA + DS workshop sequence (briefs → findings → signed-off artifacts) |
| docs/CLOUD_ARCHITECTURE.md | Full GCP reference architecture with traceability to workshops |
| docs/PITCH_DECK.md | 12-slide pre-sales narrative for the Director of Analytics |
| docs/ONE_PAGER.md | First-pitch one-pager: CFO-readable, dollar-led, leave-behind from the intro meeting (rendered: docs/assets/ONE_PAGER.png) |
| GITHUB_SETUP.md | Original repo-creation notes |
| Questions | Original brainstorm |

Code / SQL map

Profiling (Python, pandas, no other deps)

SQL — pilot (Cloud SQL for MySQL)

SQL — BigQuery ML (Phase-1 DS experiments)

Lab


Quick start

# 1. Clone
git clone https://github.com/anushb1/wayfair-independent-study.git
cd wayfair-independent-study

# 2. Set up Python and profile the data
python3 -m venv .venv
.venv/bin/pip install pandas numpy
.venv/bin/python scripts/profile_dataset.py   # → workshops/profile_output.json
.venv/bin/python scripts/ds_profile.py        # → workshops/ds_profile_output.json

# 3. (Optional) spin up local MySQL and load the star schema
docker compose up -d
.venv/bin/pip install -r requirements.txt
.venv/bin/python scripts/clean_and_build_fact.py
.venv/bin/python scripts/load_mysql.py
mysql -h 127.0.0.1 -P 3307 -u root -p < sql/01_schema_mysql.sql
# then load aggregates: sql/02..07
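
`docker compose up -d` returns before the database inside the container has necessarily finished starting, so the load step can fail if it runs immediately. The sketch below is one possible way to wait for the port first. It is not part of this repo: `wait_for_port` is a hypothetical helper, the default port 3307 is taken from the `mysql` command above, and an open TCP port only suggests, rather than guarantees, that MySQL is ready to accept logins.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=3307, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and returns False if no connection
    succeeds before `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

One way to use it is to call `wait_for_port()` after `docker compose up -d` and proceed to scripts/load_mysql.py only if it returns True.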

About

Pre-sales pilot for a fictional Wayfair-class retailer: BA + DS workshops, signed-off KPI contract, ML experiment backlog, GCP reference architecture (Cloud SQL + BigQuery/BQML + Looker), and a 12-slide pitch deck.
