
shanskarBansal/SalaryLens-Showcase






SalaryLens is a salary prediction and exploration platform that uses machine learning to estimate compensation for software engineers in 14 countries. The model is trained on roughly 30,000 cleaned responses from the Stack Overflow Developer Survey 2023.




❓ The Problem

Every developer eventually asks: "Am I being paid fairly?"

The global tech job market is a maze of conflicting salary data. Developers routinely face:

| Pain Point | Impact |
|---|---|
| 💸 No personalized benchmarks | Accepting below-market offers |
| 🌍 Cross-border salary fog | Poor relocation & remote work decisions |
| 🎓 Unclear ROI on education | Over/under-investing in degrees |
| ⏳ Experience ≠ automatic raises | No visibility into career trajectory |

flowchart LR
    A["🧑‍💻 YOU"] --> B["🌍 Country\n🎓 Education\n⏳ Experience"]
    B --> C["🤖 SalaryLens\nML Engine"]
    C --> D["💰 Predicted\nSalary"]

SalaryLens turns that open question into a data-backed salary estimate, instantly.


✨ Features at a Glance

mindmap
  root((SalaryLens))
    🔮 Predict
      14 Countries
      4 Education Levels
      0–50 Years Experience
      Instant USD Estimate
    📊 Explore
      Country Distribution Pie Chart
      Mean Salary by Country Bar Chart
      Salary vs Experience Line Chart
    🤖 ML Engine
      Decision Tree Regressor
      GridSearchCV Tuned
      Label Encoded Features
    🧹 Data Pipeline
      65K Raw Responses
      30K Cleaned Records
      Outlier Capping
      Category Consolidation

🔮 Prediction Engine

  • Pick from 14 countries worldwide
  • Choose your education tier (4 levels)
  • Slide to set years of experience (0–50)
  • Hit "Calculate Salary" → instant result in USD
  • Expandable "Why fill this out?" section with context

📊 Exploration Dashboard

  • Pie chart: developer distribution by country
  • Bar chart: mean salary head-to-head by nation
  • Line chart: how salary scales with experience
  • Powered by Matplotlib + Streamlit for interactivity
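
The three charts above each rest on a simple aggregation. As a minimal sketch of those aggregations (not the project's actual code, which is private and presumably uses pandas), the following uses only the standard library. The record layout and the sample numbers are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned records: (country, years_experience, salary_usd)
records = [
    ("United States", 2, 90_000),
    ("United States", 10, 130_000),
    ("India", 2, 20_000),
    ("Germany", 10, 70_000),
]


def country_share(rows):
    """Percentage of respondents per country (input for the pie chart)."""
    counts = Counter(country for country, _, _ in rows)
    total = sum(counts.values())
    return {country: 100 * n / total for country, n in counts.items()}


def mean_salary_by(rows, key_index):
    """Mean salary grouped by one column (0 = country, 1 = experience)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: sum(values) / len(values) for key, values in groups.items()}


shares = country_share(records)            # pie chart
by_country = mean_salary_by(records, 0)    # bar chart
by_experience = mean_salary_by(records, 1) # line chart
```

With pandas, the same results would typically come from `value_counts()` and `groupby(...).mean()`; the plain-Python version just makes the arithmetic explicit.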

📸 See It in Action

🖥️ Predict Page: Get Your Estimate

[Screenshots: Prediction Interface, Full Application View]

How it works: Select your country, education level, and years of experience → click Calculate Salary → get your predicted compensation instantly.

Interface highlights:

  • 🌍 Dropdown with 14 supported countries
  • 🎓 Education level from "Less than a Bachelors" to "Post grad"
  • 📅 Smooth slider for experience (0–50 years)
  • ℹ️ Collapsible explainer: "Knowing the market rate for your skills can help you negotiate better salaries"

🌍 Explore: Where Are Developers?

Distribution highlights from the Stack Overflow 2023 Survey:

🇺🇸 United States  █████████████████████████████████████  36.9%
🇬🇧 United Kingdom ███████████                            10.8%
🇮🇳 India          ██████████                              9.7%
🇩🇪 Germany        ███████                                 6.5%
🇫🇷 France         ██████                                  6.0%
🇨🇦 Canada         █████                                   5.0%
🇧🇷 Brazil         ████                                    4.1%
🇳🇱 Netherlands    ███                                     3.4%
🇪🇸 Spain          ███                                     3.3%
🇵🇱 Poland         ███                                     3.2%
🇦🇺 Australia      ███                                     2.9%
🇮🇹 Italy          ███                                     2.8%
🇸🇪 Sweden         ██                                      2.6%
🇷🇺 Russia         ██                                      2.6%

💡 Insight: The US alone accounts for more than a third of all respondents, which gives the model very strong signal for American salary predictions.

💰 Explore: Salary by Country

| Tier | Countries | Avg. Salary Range |
|---|---|---|
| 🥇 Top | United States | $120,000+ |
| 🥈 High | Australia, Canada, Germany | $65K – $77K |
| 🥉 Mid | UK, Netherlands, Sweden, France | $50K – $70K |
| 🏅 Emerging | India, Brazil, Poland, Russia | $28K – $40K |

💡 Insight: A developer in the US earns ~4x what the same profile earns in India or Brazil; geography remains the single biggest salary factor.

📈 Explore: Salary by Experience

Career stages decoded:

| Stage | Years | Typical Salary | Growth Rate |
|---|---|---|---|
| 🌱 Entry | 0–3 | $55K – $70K | 🚀 Fastest growth |
| 📈 Growth | 4–10 | $75K – $95K | ⚡ Steep upward curve |
| 🏔️ Plateau | 11–35 | $95K – $120K | 📊 Steady, gradual |
| 👑 Senior | 35–50 | $120K – $180K | 🔝 Peaks with volatility |

💡 Insight: The steepest salary jump happens in the first 10 years. After that, specialization, leadership, and geography matter more than raw experience.


🧠 Under the Hood

Data → Model → Prediction

flowchart LR
    subgraph DATA ["📥 Data Layer"]
        A[Stack Overflow\nSurvey 2023\n~65K responses]
    end
    subgraph PROCESS ["🧹 Processing"]
        B[Filter Full-Time\nEmployees]
        C[Handle Nulls\n& Outliers]
        D[Encode Categories\nLabelEncoder]
    end
    subgraph ML ["🤖 ML Layer"]
        E[Train 3 Models\nLinReg / DTree / RF]
        F[GridSearchCV\nHyperparameter Tuning]
        G[Best Model\nDecision Tree]
    end
    subgraph APP ["🌐 App Layer"]
        H[Streamlit\nWeb Interface]
        I[💰 Prediction]
        J[📊 Exploration]
    end
    A --> B --> C --> D --> E --> F --> G --> H
    H --> I
    H --> J
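
The "Encode Categories" step in the flow above relies on scikit-learn's `LabelEncoder`. A minimal sketch of what that encoder does, using a made-up list of countries (the variable name `le_country` is an assumption, not taken from the private source):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the Country column
countries = ["United States", "India", "Canada", "India"]

le_country = LabelEncoder()
codes = le_country.fit_transform(countries)

# classes_ holds the sorted unique labels; each code is an index into it
print(list(le_country.classes_))  # ['Canada', 'India', 'United States']
print(codes.tolist())             # [2, 1, 0, 1]

# At prediction time the same fitted encoder has to be reused,
# otherwise the integer codes would not line up with the training data
canada_code = le_country.transform(["Canada"]).tolist()
print(canada_code)                # [0]
```

Label encoding assigns integers in alphabetical order, so the numeric order carries no meaning. Tree-based models can usually split around that, while a linear model would read the codes as an ordered scale.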

Pipeline Breakdown

| # | Stage | What Happens | Key Details |
|---|---|---|---|
| 1 | Ingest | Load raw CSV from Stack Overflow | 65,000 survey responses |
| 2 | Filter | Keep only full-time employed devs | Drops part-time, freelance, unemployed |
| 3 | Clean | Remove nulls, cap salary outliers | Range: $10K – $250K |
| 4 | Consolidate | Group rare countries into "Other" | Threshold: 400+ responses to keep |
| 5 | Engineer | Standardize education into 4 tiers | Map 15+ raw categories → 4 |
| 6 | Encode | LabelEncoder for country & education | Numeric representation for ML |
| 7 | Train | Fit 3 regression algorithms | Compare RMSE across models |
| 8 | Tune | GridSearchCV on Decision Tree | Optimize max_depth parameter |
| 9 | Serialize | Pickle model + encoders | saved_steps.pkl for production |
| 10 | Deploy | Serve via Streamlit | Two-page app: Predict + Explore |
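
Stages 3 and 4 of the table can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, rows are simplified to `(country, salary)` pairs, and "capping" is interpreted here as dropping rows outside the $10K–$250K range rather than clipping them.

```python
from collections import Counter

MIN_SALARY, MAX_SALARY = 10_000, 250_000  # range stated in the table
COUNTRY_CUTOFF = 400                      # minimum responses to keep a country


def consolidate_countries(countries, cutoff=COUNTRY_CUTOFF):
    """Map countries with fewer than `cutoff` responses to 'Other'."""
    counts = Counter(countries)
    return [c if counts[c] >= cutoff else "Other" for c in countries]


def clean_rows(rows, cutoff=COUNTRY_CUTOFF):
    """Drop nulls, rare ('Other') countries, and out-of-range salaries."""
    rows = [(c, s) for c, s in rows if c is not None and s is not None]
    grouped = consolidate_countries([c for c, _ in rows], cutoff)
    return [
        (country, salary)
        for country, (_, salary) in zip(grouped, rows)
        if country != "Other" and MIN_SALARY <= salary <= MAX_SALARY
    ]


# Tiny demonstration with a cutoff of 2 instead of 400
sample = [("India", 30_000), ("India", 5_000), ("Fiji", 40_000), ("India", None)]
cleaned = clean_rows(sample, cutoff=2)
print(cleaned)  # [('India', 30000)]
```

In the demonstration, the null salary is dropped first, "Fiji" falls below the cutoff and becomes "Other", and the $5,000 row falls outside the salary range.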

🏗️ Project Architecture

SalaryLens/
│
├── 🐍 app.py                    ← Entry point: sidebar navigation
│   ├── 🔮 predict_page.py       ← Prediction UI + model inference
│   └── 📊 explore_page.py       ← Data viz: pie, bar, line charts
│
├── 📓 SalaryPrediction.ipynb    ← Full ML experimentation notebook
├── 📦 saved_steps.pkl           ← Serialized model + label encoders
└── 📁 survey_results_public.csv ← Stack Overflow raw dataset

graph TD
    A["🐍 app.py"] -->|Sidebar: Predict| B["🔮 predict_page.py"]
    A -->|Sidebar: Explore| C["📊 explore_page.py"]
    B --> D["📦 saved_steps.pkl"]
    D --> E["DecisionTreeRegressor"]
    D --> F["LabelEncoder × 2"]
    E --> G["💰 Predicted Salary"]
    C --> H["📁 survey_results_public.csv"]
    H --> I["🥧 Pie: Country Distribution"]
    H --> J["📊 Bar: Salary by Country"]
    H --> K["📈 Line: Salary by Experience"]

    style A fill:#58a6ff,stroke:#1f6feb,color:#0d1117
    style G fill:#3fb950,stroke:#238636,color:#0d1117
    style I fill:#d2a8ff,stroke:#8b5cf6,color:#0d1117
    style J fill:#d2a8ff,stroke:#8b5cf6,color:#0d1117
    style K fill:#d2a8ff,stroke:#8b5cf6,color:#0d1117
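
The predict path in the graph above (page → `saved_steps.pkl` → model and two encoders → salary) might look roughly like the sketch below. The bundle is assumed to be a dict, and the key names `"model"`, `"le_country"`, and `"le_education"` are hypothetical; the real layout of the pickle is not documented in this showcase.

```python
import pickle


def load_steps(path="saved_steps.pkl"):
    """Load the pickled bundle. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_salary(steps, country, education, years):
    """Encode the two categorical inputs, then request a single prediction."""
    # Key names are assumptions about how the bundle is organized
    model = steps["model"]
    country_code = steps["le_country"].transform([country])[0]
    education_code = steps["le_education"].transform([education])[0]

    # One row with three features, in the order used during training
    features = [[country_code, education_code, float(years)]]
    return float(model.predict(features)[0])
```

A call such as `predict_salary(load_steps(), "India", "Master's Degree", 3)` would then return the estimate in USD, provided the labels match those seen by the encoders at training time.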

🤖 Model Showdown

Three algorithms went head-to-head on the same data:

| Model | RMSE (USD) | Verdict |
|---|---|---|
| 📏 Linear Regression | ~$30,500 | ⚪ Solid baseline, underfits complex patterns |
| 🌳 Decision Tree | $30,428 | 🟢 Winner: best bias-variance tradeoff after tuning |
| 🌲 Random Forest | $29,487 | 🟡 Lowest training error, but overfitting risk |
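
For readers unfamiliar with the metric in the table, RMSE is the square root of the mean squared difference between actual and predicted salaries, so it is expressed in USD. A small worked sketch with made-up numbers:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (USD here)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up salaries with errors of 10K, 20K, and 30K
actual = [60_000, 90_000, 120_000]
predicted = [70_000, 70_000, 150_000]
error = rmse(actual, predicted)
print(round(error, 2))  # 21602.47
```

The result is larger than the mean absolute error of $20,000 because squaring weights the $30K miss more heavily than the $10K one.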

🏆 Decision Tree Regressor was selected, tuned via GridSearchCV with max_depth ∈ {None, 2, 4, 6, 8, 10, 12} using neg_mean_squared_error scoring.
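
A minimal sketch of that tuning step, using the same `max_depth` grid and scoring string. The data is synthetic and stands in for the encoded features; the number of folds (`cv=5`) and the random seeds are assumptions, since the source does not state them.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded features: country code, education code, years
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(0, 14, size=n),
    rng.integers(0, 4, size=n),
    rng.uniform(0, 50, size=n),
])
y = (30_000 + 5_000 * X[:, 0] + 8_000 * X[:, 1] + 1_500 * X[:, 2]
     + rng.normal(0, 10_000, size=n))

param_grid = {"max_depth": [None, 2, 4, 6, 8, 10, 12]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,  # assumption: the fold count is not given in the source
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
# best_score_ is a negated MSE, so flip the sign before taking the root
cv_rmse = float(np.sqrt(-search.best_score_))
print(best_depth, round(cv_rmse))
```

Because scikit-learn maximizes scores, mean squared error is exposed in negated form; the sign flip recovers a cross-validated RMSE that can be compared with the table above.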

🔧 Why not Random Forest?

While Random Forest achieved a lower RMSE on the training set ($29,487 vs $30,428), the Decision Tree with optimized depth showed better generalization to unseen data. The small RMSE gap (~$1K) didn't justify the added complexity and overfitting risk of the ensemble approach for this feature set of only 3 input variables.
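
The distinction between training error and generalization can be checked by scoring each model on a held-out split. The sketch below shows one way to do that on synthetic data; it does not reproduce the project's numbers, and the split ratio, `max_depth=6`, and `n_estimators=50` are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for (country code, education code, years) -> salary
rng = np.random.default_rng(42)
n = 600
X = np.column_stack([
    rng.integers(0, 14, size=n),
    rng.integers(0, 4, size=n),
    rng.uniform(0, 50, size=n),
])
y = (30_000 + 5_000 * X[:, 0] + 8_000 * X[:, 1] + 1_500 * X[:, 2]
     + rng.normal(0, 15_000, size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}


def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": rmse(y_train, model.predict(X_train)),
        "test": rmse(y_test, model.predict(X_test)),
    }

for name, scores in results.items():
    print(f"{name:>6}: train={scores['train']:,.0f}  test={scores['test']:,.0f}")
```

A large gap between the train and test columns for a model is the usual sign of overfitting; a model with a slightly higher training RMSE but a smaller gap may be the safer pick.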


🌍 Coverage Map

🇺🇸 United States 🇮🇳 India 🇬🇧 United Kingdom 🇩🇪 Germany 🇨🇦 Canada
🇧🇷 Brazil 🇫🇷 France 🇪🇸 Spain 🇦🇺 Australia 🇳🇱 Netherlands
🇵🇱 Poland 🇮🇹 Italy 🇷🇺 Russia 🇸🇪 Sweden

Education tiers supported:

| Code | Level | Includes |
|---|---|---|
| L1 | 📚 Less than a Bachelors | High school, associate degree, bootcamp, self-taught |
| L2 | 🎓 Bachelor's Degree | Any undergraduate degree |
| L3 | 🎓 Master's Degree | Graduate-level education |
| L4 | 🎓 Post Grad | Professional or doctoral degree (PhD, MD, JD) |
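
Collapsing the raw survey answers into these four tiers can be done with a small keyword-matching function. The sketch below is an assumption about how such a mapping might be written; the function name and the exact raw strings it matches are illustrative, and the real survey wording should be checked before reuse.

```python
def clean_education(raw):
    """Collapse a raw survey education string into one of the four tiers."""
    text = raw.lower()
    if "bachelor" in text:
        return "Bachelor's Degree"
    if "master" in text:
        return "Master's Degree"
    if "professional degree" in text or "doctoral" in text:
        return "Post Grad"
    # Everything else: high school, associate degree, bootcamp, self-taught
    return "Less than a Bachelors"


# Hypothetical raw answers, loosely modeled on survey-style wording
examples = [
    "Bachelor's degree (B.A., B.S., B.Eng., etc.)",
    "Master's degree (M.A., M.S., M.Eng., MBA, etc.)",
    "Professional degree (JD, MD, Ph.D, Ed.D, etc.)",
    "Associate degree (A.A., A.S., etc.)",
]
tiers = [clean_education(e) for e in examples]
```

Matching on lowercase keywords rather than exact strings keeps the function tolerant of straight versus curly apostrophes in the raw data.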

🛠️ Tech Stack

| Layer | Tech | Role |
|---|---|---|
| UI | Streamlit | Interactive two-page web app |
| ML | Scikit-Learn | Model training, GridSearchCV, LabelEncoder |
| Data | Pandas | DataFrame ops, cleaning, aggregation |
| Viz | Matplotlib | Pie charts, bar charts, line charts |
| Math | NumPy | Array transforms & prediction input |
| Persist | Pickle | Serialize model + encoders to .pkl |
| Notebook | Jupyter | ML experimentation & EDA |

📊 Dataset at a Glance

📈 Full data stats & cleaning steps

| Metric | Value |
|---|---|
| 📥 Raw responses | ~65,000 |
| 🧹 After cleaning | ~30,000 |
| 💰 Salary range | $10,000 – $250,000 (USD) |
| 🌍 Countries | 14 (400+ response threshold) |
| 🎓 Education tiers | 4 (consolidated from 15+) |
| ⏳ Experience range | 0 – 50 years |
| 🏢 Employment filter | Full-time only |
| 📆 Survey year | 2023 |
| 📦 Source | Stack Overflow Annual Developer Survey |

Cleaning pipeline:

  1. ✅ Selected 5 key columns: Country, EdLevel, YearsCodePro, Employment, Salary
  2. ✅ Dropped rows with null salary → kept 34,756 rows
  3. ✅ Removed remaining nulls → 34,000+ clean rows
  4. ✅ Filtered for full-time employment only → 30,019 rows
  5. ✅ Grouped countries with < 400 responses into "Other", then removed "Other"
  6. ✅ Capped salaries to $10K–$250K to remove extreme outliers
  7. ✅ Mapped experience strings ("Less than 1 year", "More than 50 years") to floats
  8. ✅ Consolidated 15+ education categories into 4 clean tiers
⚡ Quick Start

⚠️ Note: This is a showcase repository; the full source code is private. Interested in collaborating? Reach out via GitHub: @shanskarBansal

# Clone (requires private access)
git clone https://github.com/shanskarBansal/SalaryLens.git
cd SalaryLens

# Install dependencies
pip install streamlit pandas scikit-learn matplotlib numpy

# Place the Stack Overflow Developer Survey 2023 CSV in project root
# Download from: https://survey.stackoverflow.co/

# Launch the app
streamlit run app.py

Requirements:

python     >= 3.8
streamlit  >= 1.28
pandas     >= 1.5
scikit-learn >= 1.2
matplotlib >= 3.6
numpy      >= 1.23

🗺️ Roadmap

| Status | Feature | Description |
|---|---|---|
| 🟢 | Salary Prediction | Core ML prediction engine |
| 🟢 | Data Exploration | Interactive charts & visualizations |
| 🟢 | Multi-country Support | 14 countries covered |
| 🟡 | Job Role Filtering | Predict by role: Frontend, Backend, DevOps, etc. |
| 🟡 | Company Size Factor | Startup vs Enterprise salary adjustments |
| 🔴 | REST API | Expose predictions as an API endpoint |
| 🔴 | XGBoost / LightGBM | Upgrade to gradient-boosted models |
| 🔴 | Auto-updating Data | Live integration with latest SO surveys |
| 🔴 | Mobile-First UI | Responsive design for all devices |

🟢 Done   🟡 Planned   🔴 Future


👥 The Team

| Developer |
|---|
| 🧑‍💻 Harsh Bir |
| 🧑‍💻 Priyanshu Dayal |
| 🧑‍💻 Shanskar Bansal |
| 🧑‍💻 Saloni Thakur |

🔒 Source Code & Access

The complete source code for SalaryLens is maintained in a private repository. This showcase repo demonstrates the platform's capabilities, architecture, methodology, and results.

📬 Want access or interested in collaborating? Reach out via GitHub → @shanskarBansal


🙏 Acknowledgments

📊 Stack Overflow: Developer Survey 2023 dataset
🎈 Streamlit: Python web framework
🤖 Scikit-Learn: ML algorithms & tools
🐍 Python: Language & ecosystem


If this project helped you understand your worth, ⭐ star it!




Made with ❤️ by Team SalaryLens

