Data Science Job Salary Analysis

Project Link

Data Science Job Salary Analysis Project

Interactive Project Link

Interactive Session

Presentation Link

Introduction

In the era of big data and technological advancement, the role of data science has become integral to driving innovation, efficiency, and strategic decision-making across diverse industries. As organizations increasingly rely on data-driven insights, the demand for skilled data scientists has surged, leading to a competitive job market where salary structures play a crucial role. This report undertakes a comprehensive exploration into the factors influencing data science salaries, aiming to uncover patterns, trends, and geographical variations that define compensation packages in this dynamic field.

Motivation

The motivation behind this in-depth analysis lies in addressing the growing curiosity and necessity surrounding data science salaries. For aspiring data scientists, understanding the key determinants of compensation is essential in shaping career trajectories and making informed choices regarding skill development. Simultaneously, employers and industry stakeholders seek insights into the factors that attract and retain top-tier data science talent in order to remain competitive and innovative.

The field of data science is not static; it evolves with technological advancements, industry demands, and methodological innovations. This report therefore aims to go beyond a surface-level look at salaries and examine the specific factors that contribute to earning differentials within the profession. Particular emphasis is placed on which states and cities within the United States, as well as which countries globally, offer the highest-paying data science roles. The goal is to give both professionals and employers a clear picture of the regional dynamics that shape data science compensation.

This report builds on my preliminary attempt to predict salaries from job descriptions by examining how various factors influence compensation in data science. That initial exploration sets the stage for a fuller understanding of salary determinants.

Data Sources

Data Source 1: Glassdoor Jobs Dataset 2017

Data URL: Glassdoor Jobs Dataset
Data Type: CSV format
Total Datasize: 741
Dataset Year: 2017

Key Attributes:

Job Title
Salary Estimate
Job Description
Rating
Company Name
Location
Headquarters
Size
Founded
Type of Ownership
Industry
Sector
Revenue
Competitors
Hourly
Employer Provided
Min Salary, Max Salary, Avg Salary
Company Text
Job State
Same State
Age
Python, R, Spark, AWS, Excel (Skills Indicators)
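
The attribute list above includes both a raw Salary Estimate and derived fields (Hourly, Employer Provided, Min Salary, Max Salary, Avg Salary). The following is a minimal sketch of how such fields could be derived from the raw text. It assumes the raw strings look like `$53K-$91K (Glassdoor est.)`, which this README does not confirm; the function name `parse_salary_estimate` and the output keys are hypothetical and may differ from the project's actual cleaning code.

```python
import re

# Assumed raw format, e.g. "$53K-$91K (Glassdoor est.)"; not confirmed by this README.
_RANGE = re.compile(r"\$?(\d+)K?\s*-\s*\$?(\d+)K?", re.IGNORECASE)


def parse_salary_estimate(text):
    """Derive salary fields from one raw Salary Estimate string.

    Returns None when no "low-high" range can be found in the text.
    """
    lowered = text.lower()
    # Flags are stored as 0/1 integers, mirroring indicator-style columns.
    hourly = int("per hour" in lowered)
    employer_provided = int("employer provided salary" in lowered)

    match = _RANGE.search(text)
    if match is None:
        return None

    low, high = int(match.group(1)), int(match.group(2))
    return {
        "hourly": hourly,
        "employer_provided": employer_provided,
        "min_salary": low,
        "max_salary": high,
        "avg_salary": (low + high) / 2,
    }


if __name__ == "__main__":
    print(parse_salary_estimate("$53K-$91K (Glassdoor est.)"))
```

Hourly figures are left in their original units in this sketch; converting them to an annual equivalent would be a separate modelling decision.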

Data Source 2: Data Science Salaries (2020-2023)

Data URL: Data Science Salaries
Data Type: CSV format
Total Datasize: 3755
Dataset Year: 2020-2023

Variables:

Work Year
Experience Level
Employment Type
Job Title
Salary
Salary Currency
Salary in USD
Employee Residence
Remote Ratio
Company Location
Company Size

This comprehensive dataset offers a wealth of information for analysis and exploration, providing valuable insights into trends and patterns in the job market within specified timeframes and regions.
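
As an illustration of the kind of question these variables support, the sketch below groups salaries by one variable and takes the median. The snake_case column names (`work_year`, `experience_level`, `salary_in_usd`) and the level codes are assumptions about the CSV header, and the rows are made-up placeholders, not values from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up placeholder rows; column names are assumed, not taken from the real file.
SAMPLE_CSV = """work_year,experience_level,salary_in_usd
2022,EN,60000
2022,SE,150000
2023,SE,170000
2023,EN,80000
2023,MI,110000
"""


def median_salary_by(rows, key):
    """Return {group value: median salary in USD} for the given grouping column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row["salary_in_usd"]))
    return {group: median(values) for group, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_level = median_salary_by(rows, "experience_level")
by_year = median_salary_by(rows, "work_year")

if __name__ == "__main__":
    print(by_level)
    print(by_year)
```

The median is used here rather than the mean because salary data tends to be skewed by a few very high values; the notebooks may use a different statistic.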

Usage

To use this project, follow these steps:

Clone the Repository:

```
git clone https://github.com/arpita739/made-template.git
cd made-template
```

Run the Pipeline Script:
- Before running the pipeline script, ensure you have the required dependencies installed.
```
pip install -r requirements.txt
```
- Execute the following command to run the pipeline script and download the datasets from Kaggle.
```
bash pipeline.sh
```
The script will handle the download and extraction of datasets.

Explore the Jupyter Notebooks:
- After downloading the datasets, explore the analysis by running the provided Jupyter Notebooks.
```
jupyter notebook
```
Modify and Contribute:
- Feel free to modify the analysis or extend it according to your needs.
- If you make improvements, consider contributing back by submitting a pull request.

Please note that you need to have a Kaggle account and API key configured on your system for the pipeline script to work correctly. Refer to the Kaggle API documentation for more information on setting up your credentials: Kaggle API Documentation.
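
Before running the pipeline, it can help to check that credentials are in place. The sketch below looks for the two mechanisms the Kaggle API client supports: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` file in `~/.kaggle` (or in the directory named by `KAGGLE_CONFIG_DIR`). The helper name `kaggle_credentials_available` is hypothetical and is not part of this repository or of the Kaggle package; it only checks that credentials exist, not that they are valid.

```python
import os
from pathlib import Path


def kaggle_credentials_available(env=None, home=None):
    """Return True if Kaggle credentials appear to be configured.

    env and home are injectable so the check can be exercised without
    touching the real environment or home directory.
    """
    env = os.environ if env is None else env

    # Option 1: both environment variables are set and non-empty.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json file in the config directory.
    config_dir = env.get("KAGGLE_CONFIG_DIR")
    base = Path(config_dir) if config_dir else Path(home or Path.home()) / ".kaggle"
    return (base / "kaggle.json").is_file()


if __name__ == "__main__":
    if kaggle_credentials_available():
        print("Kaggle credentials found.")
    else:
        print("No Kaggle credentials found; see the Kaggle API documentation.")
```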

Special Thanks to Our Tutors

I want to express my heartfelt gratitude to our tutors, Philip Heltweg and Georg Schwarz, for their unwavering guidance and support throughout every phase of this project. Their unparalleled expertise and profound insights have played a pivotal role in shaping my approach and methodologies.

This project stands as a testament to the invaluable mentorship provided by Philip and Georg. Their encouragement, constructive feedback, and dedication have been the driving force behind the successful completion of this endeavor. I am truly thankful for the privilege of learning under their mentorship.

This project would not have been possible without the exceptional contributions of Philip and Georg. Their commitment to fostering learning and excellence has left an indelible mark on this journey, and for that, I am sincerely grateful.
