The primary goal of this repository is to conduct exploratory data analysis (EDA) on COVID-19 data using SQL code and Snowflake. The code aims to extract valuable insights from the provided dataset, allowing users to better understand trends, patterns, and key metrics related to the pandemic, from its start in 2020 until February 2024.
This SQL code is designed to be executed on Snowflake, a cloud-based data warehousing platform. Snowflake provides scalable and flexible data storage and processing capabilities, making it suitable for handling large datasets with efficiency.
The COVID-19 data used in this analysis is sourced from Our World in Data. The dataset includes comprehensive information on COVID-19 cases, deaths, vaccinations, and related metrics.
The data is initially presented in a single table. However, to demonstrate skills related to SQL data manipulation, it has been split into two main tables for this analysis. The modified schemas are accessible in the schemas.md file.
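The split described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `public.owid_covid_raw` is a hypothetical staging table holding the full Our World in Data CSV, and the column lists use names from the public OWID file. The actual column selection for each table is the one documented in `schemas.md` and may differ.

```sql
-- Assumption: the full OWID CSV was loaded into a hypothetical staging
-- table named public.owid_covid_raw. The column lists below are
-- illustrative; the repository's real schemas are defined in schemas.md.

USE DATABASE COVID_DATABASE;

-- Cases and deaths
CREATE OR REPLACE TABLE public.coviddeaths AS
SELECT
    iso_code,
    continent,
    location,
    date,
    population,
    total_cases,
    new_cases,
    total_deaths,
    new_deaths
FROM public.owid_covid_raw;

-- Testing and vaccinations
CREATE OR REPLACE TABLE public.covidvaccinations AS
SELECT
    iso_code,
    continent,
    location,
    date,
    total_tests,
    new_tests,
    total_vaccinations,
    people_vaccinated,
    new_vaccinations
FROM public.owid_covid_raw;
```

Both tables keep `location` and `date`, so they can be joined back together on that pair of columns when an analysis needs metrics from both.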
The COVID-19 data is organized into a Snowflake database with specific schemas and tables. Here is an overview of the data organization:
- Database Name: `COVID_DATABASE`
- Schemas:
  - `information_schema`: Views describing the contents of schemas in the database.
  - `public`: Contains publicly accessible data and information.
- `public.coviddeaths`: Contains information about COVID-19 cases and deaths.
- `public.covidvaccinations`: Stores data related to COVID-19 testing and vaccinations.
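Once the database is set up, the layout above can be checked from a worksheet. The query below is a small sketch that reads Snowflake's `information_schema.tables` view; it assumes the tables were created with unquoted identifiers, which Snowflake stores in upper case (hence `'PUBLIC'`).

```sql
-- List the tables in the public schema of COVID_DATABASE.
-- Assumption: identifiers were created unquoted, so Snowflake stores
-- the schema name as PUBLIC and the table names in upper case.
SELECT
    table_schema,
    table_name,
    row_count
FROM COVID_DATABASE.information_schema.tables
WHERE table_schema = 'PUBLIC'
ORDER BY table_name;
```

If the setup matches this README, the result should include `COVIDDEATHS` and `COVIDVACCINATIONS`.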
The SQL code is divided into four main parts, each focusing on a specific aspect of the COVID-19 data. Here is a brief explanation of each part along with the key questions addressed:
- Data Overview (`00_COVID_Data_Overview.sql`):
  - Provides a general visualization of the two tables in the COVID-19 dataset.
- Brazilian Data (`01_Brazil_Data_Analysis.sql`):
  - Analyzes temporal trends of COVID-19 cases, testing and vaccination in Brazil.
  - Key Questions:
    - Q1: How has the likelihood of contracting COVID in Brazil evolved?
    - Q2: After contracting COVID, how has the likelihood of death evolved in Brazil?
    - Q3: How has the number of vaccinations evolved in Brazil?
    - Q4: Does vaccination impact the number of deaths per 100 cases in Brazil?
    - Q5: Does vaccination impact the number of new cases reported in Brazil?
- World Data (`03_World_Data_Analysis.sql`):
  - Analyzes temporal trends of COVID-19 cases in the world.
  - Key Questions:
    - Q1: What are the countries with the highest infection rate by population?
    - Q2: What are the countries with the highest death count?
    - Q3: What are the countries with the highest death rate by population?
    - Q4: Which continent has the highest death count?
    - Q5: What is the global death percentage among infected cases?
- Metric Comparison (`03_Vaccination_Data_Analysis.sql`):
  - Provides insights into the global patterns of COVID-19 vaccination.
  - Key Questions:
    - Q1: How were new vaccinations distributed across the world during the pandemic?
    - Q2: How have vaccination numbers evolved in the world?
    - Q3: Which countries have the highest number of vaccines applied per capita?
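To give a feel for the kind of query behind the key questions above, here is a hedged sketch of three of them. It is not the code from the repository's `.sql` files: the column names (`location`, `date`, `continent`, `population`, `total_cases`, `total_deaths`, `new_vaccinations`) are assumed from the public OWID dataset and should be checked against `schemas.md`.

```sql
-- Assumption: column names follow the OWID CSV; adjust to schemas.md.

-- Brazil, Q2: deaths per 100 confirmed cases over time.
-- DIV0 is a Snowflake function that returns 0 instead of failing
-- when the divisor is 0.
SELECT
    location,
    date,
    total_cases,
    total_deaths,
    DIV0(total_deaths, total_cases) * 100 AS deaths_per_100_cases
FROM public.coviddeaths
WHERE location = 'Brazil'
ORDER BY date;

-- World, Q1: countries with the highest share of population infected.
-- Assumption: aggregate rows (World, continents, income groups) have a
-- NULL continent, as in the OWID file, and are filtered out here.
SELECT
    location,
    population,
    MAX(total_cases) AS highest_case_count,
    MAX(DIV0(total_cases, population)) * 100 AS pct_population_infected
FROM public.coviddeaths
WHERE continent IS NOT NULL
GROUP BY location, population
ORDER BY pct_population_infected DESC NULLS LAST
LIMIT 10;

-- Vaccination, Q2: running total of new vaccinations per country.
SELECT
    d.location,
    d.date,
    v.new_vaccinations,
    SUM(v.new_vaccinations) OVER (
        PARTITION BY d.location
        ORDER BY d.date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_vaccinations
FROM public.coviddeaths AS d
JOIN public.covidvaccinations AS v
    ON d.location = v.location
   AND d.date = v.date
WHERE d.continent IS NOT NULL
ORDER BY d.location, d.date;
```

The third query joins the two tables on `location` and `date`, which is the natural key if both tables were derived from the same single-table source.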
The "queries-results.pdf" file provides a comprehensive overview of the findings derived from the COVID-19 data analysis. The document presents the answers to the questions outlined in this README, based on visualizations and charts generated with Snowflake queries and tools.
- Create a free trial Snowflake account if you don't have one.
- Download the COVID data from Our World in Data.
- Create the database inside Snowflake with two tables as described in the `schemas.md` file.
- Execute the SQL code for the desired analysis.
- Review the results and visualizations generated by the queries.
Feel free to customize the queries or adapt the code to meet specific analysis requirements.
This dataset offers plenty of scope for further analysis, for example:
- Investigate the correlation between various socioeconomic factors and health outcomes related to COVID-19.