Weather API - ETL Data Pipeline on Google Cloud Platform

Project Overview

This repository showcases my implementation of an end-to-end ETL pipeline, leveraging Google Cloud Platform (GCP) to manage and analyze weather data. This project emphasizes my capabilities in using cloud technologies and data engineering practices to handle real-world data workflows effectively.

Project Architecture

[Architecture diagram]

Skills Demonstrated

  • API Interaction: Automated data extraction from the OpenWeather API.
  • Data Manipulation: Used Python and Pandas to clean and transform the extracted data.
  • Cloud Services: Deployed automated ETL workflows to extract, process, and store data in production, using GCP services such as Cloud Functions, Cloud Scheduler, BigQuery, and Cloud Storage (a minimal end-to-end sketch follows this list).
  • Data Visualization: Built an interactive dashboard in Looker Studio.
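The end-to-end flow is easiest to see in code. The sketch below is a minimal, self-contained version of the pipeline described above, not the actual main.py: the city, column names, and BigQuery table ID are illustrative placeholders.

```python
import pandas as pd
import requests
from google.cloud import bigquery

API_KEY = "YOUR_OPENWEATHER_API_KEY"  # placeholder; keep real keys in settings.py
CITY = "London"                       # illustrative city
TABLE_ID = "my-project.weather_dataset.observations"  # hypothetical table

# Extract: fetch the current weather for one city from the OpenWeather API.
resp = requests.get(
    "https://api.openweathermap.org/data/2.5/weather",
    params={"q": CITY, "appid": API_KEY, "units": "metric"},
    timeout=30,
)
resp.raise_for_status()
raw = resp.json()

# Transform: flatten the nested JSON payload into a one-row DataFrame.
df = pd.DataFrame([{
    "city": raw["name"],
    "temperature_c": raw["main"]["temp"],
    "humidity_pct": raw["main"]["humidity"],
    "description": raw["weather"][0]["description"],
    "measured_at": pd.Timestamp.now(tz="UTC"),
}])

# Load: append the row to a BigQuery table (credentials come from the
# GOOGLE_APPLICATION_CREDENTIALS environment variable or the active gcloud login).
client = bigquery.Client()
client.load_table_from_dataframe(df, TABLE_ID).result()
```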

[Looker Studio dashboard screenshot]

Technologies Used

  • Python 3.11, Pandas
  • Google Cloud Platform (GCP)
  • BigQuery, Google Cloud Storage, Cloud Functions, Cloud Scheduler
  • Looker Studio

Project Files

  • main.py: Contains the full code for data extraction, transformation, and loading.
  • settings.py: Configures the API key, GCP service account file, project ID, and dataset ID (a hypothetical layout is sketched below).
  • requirements.txt: Lists all the dependencies needed to run main.py.
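The repository's actual variable names are not shown here, so the layout below is only an assumption about what settings.py might contain, based on the description above:

```python
# settings.py — hypothetical layout; the real variable names may differ.

# OpenWeather API key used by main.py for data extraction.
OPENWEATHER_API_KEY = "YOUR_API_KEY"

# Path to the GCP service account JSON file used for authentication.
SERVICE_ACCOUNT_FILE = "service_account.json"

# Target GCP project and BigQuery dataset.
PROJECT_ID = "my-gcp-project"
DATASET_ID = "weather_dataset"
```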

Setup and Installation

Prerequisites

  • An OpenWeather account to access the weather API; sign up at https://openweathermap.org
  • A Google Cloud Platform account (the free trial is sufficient); see https://cloud.google.com/free

Configuration

  1. Clone the repository: git clone https://github.com/yourgithubusername/weather-data-etl-pipeline.git

  2. Create a virtual environment: python3 -m venv venv

  3. Activate the virtual environment: . venv/bin/activate

  4. Install the required Python libraries: pip install -r requirements.txt

  5. Fill in settings.py with your API key and GCP details (see Project Files above).
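For convenience, here are steps 1 through 4 as a single shell session (the cd assumes the default clone directory name):

```bash
git clone https://github.com/yourgithubusername/weather-data-etl-pipeline.git
cd weather-data-etl-pipeline
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
```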

Running the Project

To execute the ETL pipeline, follow these steps:

  1. API Data Extraction: Set up your API key and service account file in settings.py.
  2. Data Transformation and Storage: Run main.py to process the extracted data. You need an active GCP account so the script can write to Cloud Storage and BigQuery.
  3. Automation: Deploy the script as a Cloud Function and create a Cloud Scheduler job to fully automate and schedule the ETL process; example gcloud commands are sketched below.
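As a rough sketch of step 3, the commands below deploy an HTTP-triggered Cloud Function and schedule it hourly. The function name, entry point, region, and schedule are placeholders, and FUNCTION_URL must be replaced with the URL printed by the deploy command:

```bash
# Deploy main.py as an HTTP-triggered Cloud Function (names/region are examples).
gcloud functions deploy weather-etl \
    --runtime=python311 \
    --trigger-http \
    --entry-point=main \
    --region=us-central1

# Trigger the function every hour with a Cloud Scheduler job.
gcloud scheduler jobs create http weather-etl-hourly \
    --schedule="0 * * * *" \
    --uri="FUNCTION_URL" \
    --http-method=GET
```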

Contributions

Contributions to this project are welcome! Please fork this repository and submit a pull request with your proposed changes.

Acknowledgments

This project is provided by DataProjects.io, a platform that helps data professionals build a portfolio of real-world, end-to-end projects on the cloud.

The complete project, along with detailed instructions, is available on DataProjects.io.

License

This project is licensed under the Mozilla Public License 2.0 - see the LICENSE file for details.
