This project analyzes recent Computer Science (CS) job salary data to figure out the best CS jobs to pursue.
This project is a data analysis walkthrough, presented as a Jupyter Notebook, that analyzes recent CS job salary data (2025) to figure out the best CS jobs to pursue. It was created as a cumulative open-ended project for the 'Intermediate Programming with Data' course offered at Northeastern. I use various Python libraries and data science techniques to analyze the 'AI, ML, Data Science Salary (2020- 2025)' dataset from Kaggle, which contains salary and employment trends in AI, ML, and Data Science from the past 5 years. In this context, I define success based on two criteria: 1) Popularity: how common the job is, and 2) Salary: how much the job pays. At the end of the notebook, I include a summary of my results.
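To make the two success criteria concrete, here is a minimal sketch of how popularity and salary could be combined into a single weighted score. It is not the notebook's actual code: the function names (`min_max`, `rank_jobs`), the equal 0.5/0.5 weights, and the toy rows are all assumptions made for illustration, and it uses only the standard library so that it runs on its own (the notebook itself relies on numpy for its weighted vectors and sums).

```python
from collections import Counter

# Toy rows of (job title, salary in USD). These values are made up for
# illustration and are NOT taken from the Kaggle dataset.
rows = [
    ("Data Scientist", 150000),
    ("Data Scientist", 130000),
    ("Data Scientist", 140000),
    ("ML Engineer", 180000),
    ("ML Engineer", 160000),
    ("Data Analyst", 90000),
]


def min_max(values):
    """Scale a mapping of numbers to the [0, 1] range (hypothetical helper)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    # If every value is identical, fall back to 0.0 to avoid dividing by zero.
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def rank_jobs(rows, w_popularity=0.5, w_salary=0.5):
    """Rank job titles by a weighted sum of popularity and mean salary.

    The default weights are an assumption, not the notebook's values.
    """
    # Popularity: how many rows mention each job title.
    counts = Counter(title for title, _ in rows)

    # Salary: mean salary per job title.
    totals = Counter()
    for title, salary in rows:
        totals[title] += salary
    mean_salary = {title: totals[title] / counts[title] for title in counts}

    # Normalize both criteria so neither dominates because of its units.
    popularity = min_max(counts)
    salary = min_max(mean_salary)

    scores = {
        title: w_popularity * popularity[title] + w_salary * salary[title]
        for title in counts
    }
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_jobs(rows)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

Changing `w_popularity` and `w_salary` shifts the ranking toward whichever criterion matters more to you; for example, `rank_jobs(rows, w_popularity=0.2, w_salary=0.8)` favors higher-paying titles over common ones.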
- Python: The core language of the notebook, used for all data processing and analysis.
- pandas: A data analysis and manipulation library, used to load and work with the salary dataset.
- collections: Keeps count of values.
- math: Provides basic mathematical functions.
- numpy: Creates weighted vectors and sums for calculations.
- matplotlib: Creates visual graph representations.
- sklearn: Provides linear regression and K-means clustering algorithms.
- seaborn: Produces visually appealing scatter plots.
- To view my code and report as a Jupyter Notebook, download Project_Report.ipynb.
- To view my notebook as an HTML file, download Project_Report.html.
- Dataset Link: https://www.kaggle.com/datasets/samithsachidanandan/the-global-ai-ml-data-science-salary-for-2025/data
- MLA citation: Samith Chimminiyan. “The AI, ML, Data Science Salary (2020- 2025).” Kaggle.com, 2020, www.kaggle.com/datasets/samithsachidanandan/the-global-ai-ml-data-science-salary-for-2025/data. Accessed 10 Apr. 2025.
- I pulled inspiration for weighted sums from the following paper: https://www.vldb.org/pvldb/vol16/p2377-chen.pdf
- MLA citation: Chen, Zixuan, et al. “Why Not Yet: Fixing a Top-k Ranking That Is Not Fair to Individuals.” Proceedings of the VLDB Endowment, vol. 16, no. 9, May 2023, pp. 2377–2390, www.vldb.org/pvldb/vol16/p2377-chen.pdf, https://doi.org/10.14778/3598581.3598606. Accessed 10 Apr. 2025.
- Code snippets were adapted from our course labs, particularly Labs 5 and 11 of the 'Intermediate Programming with Data' course offered at Northeastern.