ak-abdullah/Data-Science

Copenhagen Bicycle Market Analysis

Python Pandas scikit-learn Selenium Jupyter

Data science project analyzing Copenhagen's bicycle market using web-scraped business data, government population statistics, and geospatial analysis. Built regression and SVM models to forecast seasonal demand and identify high-potential areas for bicycle businesses.


⚡ Pipeline

  1. Data collection: scraped bicycle shops from Google Maps with Selenium using 6 Danish search queries, collecting the name, rating, review count, category, address, phone, website, and GPS coordinates of each business.
  2. Data cleaning: imputed missing ratings and review counts with the median, and extracted 4-digit Danish postal codes from address strings using regex.
  3. Data merging: joined business data with Copenhagen population statistics from Statistics Denmark on postal code.
  4. Exploratory analysis: rating distributions, a scatter plot of review count against rating, shops per postal code, category breakdowns, and population by gender per area.
  5. Geospatial visualization: mapped shop locations across Copenhagen using GeoPandas and interactive Folium heatmaps.
  6. Predictive modeling: regression and SVM models to forecast seasonal demand and market trends.
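
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the code from `clean_data.py` or `merging_data.py`: the column names (`rating`, `review_count`, `postal_code`, `population`) and all values are made-up placeholders.

```python
import pandas as pd

# Placeholder business data; column names and values are assumptions.
shops = pd.DataFrame({
    "name": ["Shop A", "Shop B", "Shop C"],
    "rating": [4.5, None, 3.9],
    "review_count": [120, 8, None],
    "postal_code": ["2200", "2100", "2200"],
})

# Placeholder population table keyed by postal code (not real figures).
population = pd.DataFrame({
    "postal_code": ["2100", "2200"],
    "population": [1000, 2000],
})

# Median imputation: fill each missing value with its column's median.
for column in ["rating", "review_count"]:
    shops[column] = shops[column].fillna(shops[column].median())

# Left join keeps every shop; validate raises if a postal code
# appears more than once in the population table.
merged = shops.merge(
    population, on="postal_code", how="left", validate="many_to_one"
)
```

Keeping `postal_code` as a string on both sides avoids silent mismatches between text and integer keys during the join.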

🛠️ Stack

| Layer | Technology |
| --- | --- |
| Web Scraping | Selenium, BeautifulSoup |
| Data Processing | pandas, NumPy |
| Geospatial Analysis | GeoPandas, Shapely, Folium |
| Machine Learning | scikit-learn (Regression, SVM) |
| Visualization | Matplotlib, Seaborn |
| Data Sources | Google Maps, Statistics Denmark, DBA.dk |

📁 Project structure

data-science/
├── A 1/
│   └── Assignment 1/
│       ├── pandas_code.py          # initial data exploration
│       ├── process_yelp.ipynb      # Yelp dataset processing
│       └── schema.sql              # database schema
├── A 2/
│   ├── Code/
│   │   ├── google_maps_scraping.py  # Selenium scraper for Google Maps
│   │   └── clean_data.py           # data cleaning pipeline
│   └── Dataset for Safety/         # raw scraped datasets
├── A 2 Milestone 3/
│   ├── cleaning_and_conversion_and_unconverted_dataset/
│   │   ├── google_maps.py          # Google Maps data cleaning
│   │   ├── merging_data.py         # merge business and population data
│   │   ├── json_to_csv.py          # DBA.dk JSON to CSV conversion
│   │   └── extracting_population.py
│   ├── Datasets/
│   │   ├── google_maps.csv
│   │   ├── Copenhagen_Population.xlsx
│   │   └── merged_business_population.csv
│   └── copenhagen_analysis.ipynb   # full analysis notebook
└── README.md

🚀 Running locally

pip install pandas numpy scikit-learn selenium geopandas shapely folium matplotlib seaborn webdriver-manager openpyxl

For the scraper:

python "A 2/Code/google_maps_scraping.py"

For the full analysis:

cd "A 2 Milestone 3"
jupyter notebook copenhagen_analysis.ipynb

💡 What I learned building this

Scraping Google Maps with Selenium is tricky. The page loads results dynamically and class names change between sessions. I handled stale element exceptions with a retry loop and used scroll detection to know when all results were loaded. The scraper ran 6 different Danish search queries to cover all bicycle-related business categories.
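
The retry loop and scroll detection described above might look something like this. It is a library-agnostic sketch rather than the code in `google_maps_scraping.py`: `retry`, `scroll_until_stable`, and their parameters are hypothetical names, and the callables passed in stand in for the real Selenium calls.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Call action(), retrying when it raises one of the retry_on exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


def scroll_until_stable(
    count_results: Callable[[], int],
    scroll_once: Callable[[], None],
    patience: int = 3,
    pause: float = 1.0,
    max_scrolls: int = 200,
) -> int:
    """Scroll until the result count stops growing for `patience` scrolls in a row."""
    last = count_results()
    stalled = 0
    for _ in range(max_scrolls):
        if stalled >= patience:
            break
        scroll_once()
        time.sleep(pause)  # give dynamically loaded results time to appear
        current = count_results()
        if current > last:
            last, stalled = current, 0
        else:
            stalled += 1
    return last
```

With Selenium, `retry_on` would typically be `(StaleElementReferenceException,)` from `selenium.common.exceptions`, and `count_results` would count the result elements currently in the page. Passing these in as arguments keeps the selectors, which change between sessions, out of the control flow.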

Merging datasets on postal code sounds simple, but Danish addresses store postal codes inconsistently: sometimes embedded in a full address string, sometimes standalone. I used regex to extract the first 4-digit sequence from each address field before merging.
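
A minimal sketch of that extraction, under a few assumptions: `extract_postal_code` is a hypothetical helper name, the sample addresses are invented, and the word boundaries in the pattern are my addition so that longer digit runs such as phone numbers are not matched.

```python
import re
from typing import Optional

# Exactly four digits standing alone, not part of a longer number.
POSTAL_CODE = re.compile(r"\b(\d{4})\b")


def extract_postal_code(address: Optional[str]) -> Optional[str]:
    """Return the first standalone 4-digit sequence in an address, or None."""
    if not address:
        return None
    match = POSTAL_CODE.search(address)
    return match.group(1) if match else None


# Invented examples covering the embedded and standalone cases.
print(extract_postal_code("Nørrebrogade 45, 2200 København N"))
print(extract_postal_code("2100"))
print(extract_postal_code("Vesterbrogade 12"))
```

Returning the code as a string rather than an integer keeps it directly usable as a merge key against a population table that also stores postal codes as text.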

Geospatial analysis showed clear clustering of bicycle businesses in central Copenhagen postal codes. Population density alone does not predict shop density; tourist areas and proximity to cycling infrastructure matter more.


📬 Contact

Built by Abdullah Khalid

LinkedIn Email Portfolio
