This repository contains my implementation of an end-to-end ETL pipeline that uses Google Cloud Platform (GCP) to manage and analyze weather data. The project demonstrates how I apply cloud technologies and data engineering practices to real-world data workflows.
- API Interaction: Automated data extraction from OpenWeather API.
- Data Manipulation: Utilized Python and Pandas to clean and transform data.
- Cloud Services: Deployed automated ETL workflows that extract, process, and store data in production using GCP services such as Cloud Functions, Cloud Scheduler, BigQuery, and Cloud Storage.
- Data Visualization: Created an interactive dashboard using Looker Studio.
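
As a rough illustration of the extraction and transformation steps listed above, the sketch below builds a request for OpenWeather's current-weather endpoint and flattens the JSON response into a single flat record. The choice of endpoint, the function names (`build_url`, `fetch_weather`, `flatten_weather`), and the selected fields are assumptions made for illustration; the actual `main.py` may be organized differently.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: OpenWeather "current weather" API.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_url(city, api_key, units="metric"):
    """Build the request URL for one city (hypothetical helper)."""
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return f"{BASE_URL}?{query}"


def fetch_weather(city, api_key):
    """Call the API and return the decoded JSON payload."""
    with urlopen(build_url(city, api_key), timeout=10) as response:
        return json.load(response)


def flatten_weather(payload):
    """Turn the nested API response into one flat, table-friendly record."""
    observed = datetime.fromtimestamp(payload["dt"], tz=timezone.utc)
    return {
        "city": payload["name"],
        "observed_at": observed.isoformat(),
        "temperature": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "description": payload["weather"][0]["description"],
        "wind_speed": payload["wind"]["speed"],
    }
```

A list of such records can be passed to `pandas.DataFrame(records)` for further cleaning before loading.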
- Python 3.11, Pandas
- Google Cloud Platform (GCP)
- BigQuery, Google Cloud Storage, Cloud Functions, Cloud Scheduler
- Looker Studio
- main.py: Contains the full code for data extraction, transformation, and loading.
- settings.py: Used to configure API keys, GCP service account files, project ID, and dataset ID.
- requirements.txt: Lists all the dependencies necessary to run main.py.
- An account with OpenWeather to access the weather API.
- A free trial account on Google Cloud Platform to use cloud services.
- Clone the repository:
      git clone https://github.com/yourgithubusername/weather-data-etl-pipeline.git
- Create a virtual environment:
      python3 -m venv venv
- Activate your virtual environment:
      . venv/bin/activate
- Install the required Python libraries:
      pip install -r requirements.txt
- Set up settings.py:
  - Set your OpenWeather API key.
  - Configure service account files for Cloud Storage and BigQuery access.
  - Set your GCP project ID and dataset ID.
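
One possible layout for `settings.py` is sketched below. Every variable name and environment variable here is hypothetical, chosen only to mirror the items listed above; the real file in this repository may use different names or hard-coded values.

```python
# settings.py (illustrative layout; all names below are hypothetical)
import os

# OpenWeather API key. Reading it from the environment keeps it out of
# version control; an empty string means "not configured".
OPENWEATHER_API_KEY = os.environ.get("OPENWEATHER_API_KEY", "")

# Paths to the service account JSON key files for each service.
GCS_SERVICE_ACCOUNT_FILE = os.environ.get("GCS_SERVICE_ACCOUNT_FILE", "")
BQ_SERVICE_ACCOUNT_FILE = os.environ.get("BQ_SERVICE_ACCOUNT_FILE", "")

# Target GCP project and BigQuery dataset.
GCP_PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "")
BQ_DATASET_ID = os.environ.get("BQ_DATASET_ID", "")


def missing_settings(values):
    """Return the names of settings whose value is empty, in sorted order."""
    return sorted(name for name, value in values.items() if not value)
```

Calling `missing_settings` with a dict of the values above before running the pipeline gives an early, readable error instead of a failed API call.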
To execute the ETL pipeline, follow these steps:
- API Data Extraction: Set up your API key and the service account files in settings.py.
- Data Transformation and Storage: Run main.py to process the extracted data. You need an active GCP account to store data in Cloud Storage and BigQuery.
- Automation: Deploy your script on Cloud Functions and create a Cloud Scheduler job to fully automate and schedule the ETL process.
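
The storage and automation steps above could look roughly like the following sketch. It assumes the `google-cloud-storage` and `google-cloud-bigquery` client libraries are installed and that credentials are available to them; the function names (`to_ndjson`, `load_rows`, `etl_entry_point`) and the bucket, object, and table names are placeholders, not the ones used in this repository.

```python
import json


def to_ndjson(rows):
    """Serialize a list of dicts as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row) for row in rows)


def load_rows(rows, bucket_name, object_name, table_id):
    """Upload rows to Cloud Storage, then append them to a BigQuery table."""
    # Imported lazily so the helper above works without the GCP client libraries.
    from google.cloud import bigquery, storage

    # Stage the data as a newline-delimited JSON object in the bucket.
    blob = storage.Client().bucket(bucket_name).blob(object_name)
    blob.upload_from_string(to_ndjson(rows), content_type="application/json")

    # Load the staged file into BigQuery, appending to any existing rows.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,
    )
    job = bigquery.Client().load_table_from_uri(
        f"gs://{bucket_name}/{object_name}", table_id, job_config=job_config
    )
    job.result()  # block until the load job finishes


def etl_entry_point(request):
    """HTTP entry point sketch; a Cloud Scheduler job would call the function's URL."""
    rows = []  # placeholder: replace with the output of the extract/transform step
    if not rows:
        return "no rows to load"
    # Placeholder bucket, object, and table names.
    load_rows(rows, "your-bucket", "weather/latest.json",
              "your-project.your_dataset.weather")
    return f"loaded {len(rows)} rows"
```

Staging the file in Cloud Storage before loading keeps a raw copy of each run, which makes it possible to reload BigQuery if the table schema changes.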
Contributions to this project are welcome! Please fork this repository and submit a pull request with your proposed changes.
This project is provided by DataProjects.io, a platform that helps data professionals build a portfolio of real-world, end-to-end projects on the cloud.
You can find the complete project along with detailed instructions here!
This project is licensed under the Mozilla Public License 2.0 - see the LICENSE file for details.

