Copenhagen Bicycle Market Analysis

Data science project analyzing Copenhagen's bicycle market using web-scraped business data, government population statistics, and geospatial analysis. Built regression and SVM models to forecast seasonal demand and identify high-potential areas for bicycle businesses.

⚡ Pipeline

Data collection: scraped bicycle shops from Google Maps with Selenium across 6 Danish search queries, collecting name, rating, review count, category, address, phone, website, and GPS coordinates for each business.
Data cleaning: filled missing ratings and review counts with median imputation, and extracted 4-digit Danish postal codes from address strings using regex.
Data merging: joined business data with Copenhagen population statistics from Statistics Denmark on postal code.
Exploratory analysis: rating distributions, a scatter plot of review count against rating, shops per postal code, category breakdowns, and population by gender for each area.
Geospatial visualization: mapped shop locations across Copenhagen using GeoPandas and Folium interactive heatmaps.
Predictive modeling: regression and SVM models to forecast seasonal demand and market trends.
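
The cleaning and merging steps above can be sketched roughly as follows. This is a minimal illustration, not the code in clean_data.py or merging_data.py: the column names (rating, review_count, postal_code, population), the helper impute_median, and all values are assumptions made up for the example.

```python
import pandas as pd

# Toy stand-ins for the scraped shops and the population table.
# Column names and values are hypothetical, not taken from the project data.
shops = pd.DataFrame({
    "name": ["Shop A", "Shop B", "Shop C", "Shop D"],
    "rating": [4.5, None, 3.9, 4.1],
    "review_count": [120, 8, None, 40],
    "postal_code": ["2200", "2200", "1050", "9999"],
})
population = pd.DataFrame({
    "postal_code": ["1050", "2200"],
    "population": [1000, 5000],
})


def impute_median(df, columns):
    """Return a copy with missing values in each column filled by its median."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


cleaned = impute_median(shops, ["rating", "review_count"])

# A left join keeps every shop. indicator=True adds a _merge column that
# flags shops whose postal code has no matching population row.
merged = cleaned.merge(population, on="postal_code", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["name", "postal_code"]])
```

Keeping postal codes as strings on both sides avoids silent mismatches between integer and text keys, and checking the unmatched rows after the join is a cheap way to spot addresses where extraction went wrong.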

🛠️ Stack

Layer	Technology
Web Scraping	Selenium, BeautifulSoup
Data Processing	pandas, NumPy
Geospatial Analysis	GeoPandas, Shapely, Folium
Machine Learning	scikit-learn (Regression, SVM)
Visualization	Matplotlib, Seaborn
Data Sources	Google Maps, Statistics Denmark, DBA.dk

📁 Project structure

data-science/
├── A 1/
│   └── Assignment 1/
│       ├── pandas_code.py          # initial data exploration
│       ├── process_yelp.ipynb      # Yelp dataset processing
│       └── schema.sql              # database schema
├── A 2/
│   ├── Code/
│   │   ├── google_maps_scraping.py  # Selenium scraper for Google Maps
│   │   └── clean_data.py           # data cleaning pipeline
│   └── Dataset for Safety/         # raw scraped datasets
├── A 2 Milestone 3/
│   ├── cleaning_and_conversion_and_unconverted_dataset/
│   │   ├── google_maps.py          # Google Maps data cleaning
│   │   ├── merging_data.py         # merge business and population data
│   │   ├── json_to_csv.py          # DBA.dk JSON to CSV conversion
│   │   └── extracting_population.py
│   ├── Datasets/
│   │   ├── google_maps.csv
│   │   ├── Copenhagen_Population.xlsx
│   │   └── merged_business_population.csv
│   └── copenhagen_analysis.ipynb   # full analysis notebook
└── README.md

🚀 Running locally

pip install pandas numpy scikit-learn selenium geopandas shapely folium matplotlib seaborn webdriver-manager openpyxl

For the scraper:

python "A 2/Code/google_maps_scraping.py"

For the full analysis:

cd "A 2 Milestone 3"
jupyter notebook copenhagen_analysis.ipynb

💡 What I learned building this

Scraping Google Maps with Selenium is tricky. The page loads results dynamically and class names change between sessions. I handled stale element exceptions with a retry loop and used scroll detection to know when all results were loaded. The scraper ran 6 different Danish search queries to cover all bicycle-related business categories.
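
The retry loop and scroll detection described above might look something like this. It is a sketch under assumptions, not the code in google_maps_scraping.py: retry_on, scroll_until_stable, FakeFeed, and the patience parameter are hypothetical names, and the helpers take plain callables so the example runs without a browser. With Selenium, the exception type would be StaleElementReferenceException and the callables would wrap driver calls.

```python
import time


def retry_on(exc_type, action, attempts=3, delay=0.0):
    """Call action(); if it raises exc_type, wait and try again.

    With Selenium, exc_type would be StaleElementReferenceException and
    action would re-locate the element before reading from it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise  # out of attempts, surface the original error
            time.sleep(delay)


def scroll_until_stable(count_results, scroll_once, patience=2, max_scrolls=100):
    """Scroll until the result count has not grown for `patience` scrolls."""
    last, unchanged = count_results(), 0
    for _ in range(max_scrolls):
        if unchanged >= patience:
            break
        scroll_once()
        current = count_results()
        unchanged = unchanged + 1 if current == last else 0
        last = current
    return last


class FakeFeed:
    """Stand-in for a results panel: loads 5 more items per scroll, up to 12."""

    def __init__(self):
        self.loaded = 5

    def scroll(self):
        self.loaded = min(self.loaded + 5, 12)

    def count(self):
        return self.loaded


feed = FakeFeed()
total = scroll_until_stable(feed.count, feed.scroll)

# Simulate an element that goes stale twice before it can be read.
calls = {"n": 0}


def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("element went stale")
    return "Shop A"


name = retry_on(LookupError, flaky_read, attempts=5)
```

Requiring several unchanged scrolls in a row, rather than stopping at the first one, gives slow network responses a chance to arrive before the scraper decides the list is complete.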

Merging datasets on postal code sounds simple, but Danish addresses store postal codes inconsistently: sometimes embedded in a full address string, sometimes standalone. I used regex to extract the first 4-digit sequence from each address field before merging.
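
A minimal sketch of that extraction is shown below. The function name extract_postal_code and the sample addresses are made up for illustration, and the lookarounds that reject longer digit runs are an added assumption, not necessarily part of the original pattern.

```python
import re

# Exactly four digits that are not part of a longer run of digits.
POSTAL_CODE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def extract_postal_code(address):
    """Return the first standalone 4-digit sequence in address, or None."""
    match = POSTAL_CODE.search(address or "")
    return match.group(0) if match else None


# Hypothetical inputs covering the embedded, standalone, and missing cases.
samples = [
    "Nørrebrogade 45, 2200 København N",
    "2100",
    "Vesterbrogade 12",
    None,
]
codes = [extract_postal_code(s) for s in samples]
```

Taking the first match is simple but would pick up a 4-digit house number if one appeared before the postal code, so a quick check of extracted codes against the known Copenhagen range is a sensible follow-up.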

Geospatial analysis showed clear clustering of bicycle businesses in central Copenhagen postal codes. Population density alone does not predict shop density; proximity to tourist areas and cycling infrastructure matters more.

📬 Contact

Built by Abdullah Khalid
