GitHub - shanskarBansal/SalaryLens-Showcase: SalaryLens AI-Powered Developer Salary Intelligence Platform

SalaryLens is a salary prediction and exploration platform that uses machine learning to deliver data-driven compensation insights for software engineers across 14 countries. It is trained on the Stack Overflow Developer Survey 2023, using 30,000+ cleaned survey responses.

❓ The Problem

Every developer eventually asks: "Am I being paid fairly?"

The global tech job market is a maze of conflicting salary data. Developers routinely face:

Pain Point	Impact
💸 No personalized benchmarks	Accepting below-market offers
🌍 Cross-border salary fog	Poor relocation & remote work decisions
🎓 Unclear ROI on education	Over/under-investing in degrees
⏳ Experience ≠ automatic raises	No visibility into career trajectory

flowchart LR
    A["🧑‍💻 YOU"] --> B["🌍 Country\n🎓 Education\n⏳ Experience"]
    B --> C["🤖 SalaryLens\nML Engine"]
    C --> D["💰 Predicted\nSalary"]

SalaryLens turns that question into a data-backed answer, instantly.

✨ Features at a Glance

mindmap
  root((SalaryLens))
    🔮 Predict
      14 Countries
      4 Education Levels
      0–50 Years Experience
      Instant USD Estimate
    📊 Explore
      Country Distribution Pie Chart
      Mean Salary by Country Bar Chart
      Salary vs Experience Line Chart
    🤖 ML Engine
      Decision Tree Regressor
      GridSearchCV Tuned
      Label Encoded Features
    🧹 Data Pipeline
      65K Raw Responses
      30K Cleaned Records
      Outlier Capping
      Category Consolidation

🔮 Prediction Engine

Pick from 14 countries worldwide
Choose your education tier (4 levels)
Slide to set years of experience (0–50)
Hit "Calculate Salary" → instant result in USD
Expandable "Why fill this out?" section with context

📊 Exploration Dashboard

Pie chart — developer distribution by country
Bar chart — mean salary head-to-head by nation
Line chart — how salary scales with experience
Powered by Matplotlib + Streamlit for interactivity
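
The aggregates behind these three charts can be sketched without any plotting code. This is a minimal illustration, assuming each record is a `(country, years_experience, salary_usd)` tuple. The function names and sample rows are hypothetical, and the private app may well use pandas group-by operations instead.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sample rows: (country, years_experience, salary_usd)
ROWS = [
    ("United States", 5, 120000),
    ("United States", 10, 140000),
    ("India", 5, 30000),
    ("Germany", 10, 70000),
]


def country_share(rows):
    """Percentage of respondents per country (pie chart input)."""
    counts = Counter(country for country, _, _ in rows)
    total = sum(counts.values())
    return {c: 100 * n / total for c, n in counts.items()}


def mean_salary_by(rows, key_index):
    """Mean salary grouped by the column at key_index (bar/line chart input)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {k: mean(v) for k, v in groups.items()}


shares = country_share(ROWS)             # pie chart
by_country = mean_salary_by(ROWS, 0)     # bar chart
by_experience = mean_salary_by(ROWS, 1)  # line chart
```

Each dictionary can then be handed to the plotting layer as labels and values.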

📸 See It in Action

🖥️ Predict Page — Get Your Estimate

Prediction Interface	Full Application View

How it works: Select your country, education level, and years of experience → click Calculate Salary → get your predicted compensation instantly.

Interface highlights:

🌐 Dropdown with 14 supported countries
🎓 Education level from "Less than a Bachelors" to "Post grad"
📅 Smooth slider for experience (0–50 years)
ℹ️ Collapsible explainer: "Knowing the market rate for your skills can help you negotiate better salaries"

🌍 Explore — Where Are Developers?

Distribution highlights from the Stack Overflow 2023 Survey:

🇺🇸 United States  ██████████████████████████████████████  36.9%
🇬🇧 United Kingdom ███████████                            10.8%
🇮🇳 India          ██████████                              9.7%
🇩🇪 Germany        ███████                                 6.5%
🇫🇷 France         ██████                                  6.0%
🇨🇦 Canada         █████                                   5.0%
🇧🇷 Brazil         ████                                    4.1%
🇳🇱 Netherlands    ███                                     3.4%
🇪🇸 Spain          ███                                     3.3%
🇵🇱 Poland         ███                                     3.2%
🇦🇺 Australia      ███                                     2.9%
🇮🇹 Italy          ███                                     2.8%
🇸🇪 Sweden         ██                                      2.6%
🇷🇺 Russia         ██                                      2.6%

💡 Insight: The US alone accounts for more than a third of all respondents, which gives the model very strong signal for American salary predictions.

💰 Explore — Salary by Country

Tier	Countries	Avg. Salary Range
🥇 Top	United States	$120,000+
🥈 High	Australia, Canada, Germany	$65K – $77K
🥉 Mid	UK, Netherlands, Sweden, France	$50K – $70K
🏅 Emerging	India, Brazil, Poland, Russia	$28K – $40K

💡 Insight: A developer in the US earns ~4x what the same profile earns in India or Brazil — geography remains the single biggest salary factor.

📈 Explore — Salary by Experience

Career stages decoded:

Stage	Years	Typical Salary	Growth Rate
🌱 Entry	0–3	$55K – $70K	🚀 Fastest growth
📈 Growth	4–10	$75K – $95K	⚡ Steep upward curve
🏔️ Plateau	11–35	$95K – $120K	📊 Steady, gradual
👑 Senior	36–50	$120K – $180K	🔝 Peaks with volatility

💡 Insight: The steepest salary jump happens in the first 10 years. After that, specialization, leadership, and geography matter more than raw experience.

🧠 Under the Hood

Data → Model → Prediction

flowchart LR
    subgraph DATA ["📥 Data Layer"]
        A[Stack Overflow\nSurvey 2023\n~65K responses]
    end
    subgraph PROCESS ["🧹 Processing"]
        B[Filter Full-Time\nEmployees]
        C[Handle Nulls\n& Outliers]
        D[Encode Categories\nLabelEncoder]
    end
    subgraph ML ["🤖 ML Layer"]
        E[Train 3 Models\nLinReg / DTree / RF]
        F[GridSearchCV\nHyperparameter Tuning]
        G[Best Model\nDecision Tree]
    end
    subgraph APP ["🌐 App Layer"]
        H[Streamlit\nWeb Interface]
        I[💰 Prediction]
        J[📊 Exploration]
    end
    A --> B --> C --> D --> E --> F --> G --> H
    H --> I
    H --> J

Pipeline Breakdown

#	Stage	What Happens	Key Details
1️⃣	Ingest	Load raw CSV from Stack Overflow	65,000 survey responses
2️⃣	Filter	Keep only full-time employed devs	Drops part-time, freelance, unemployed
3️⃣	Clean	Remove nulls, cap salary outliers	Range: $10K – $250K
4️⃣	Consolidate	Group rare countries into "Other"	Threshold: 400+ responses to keep
5️⃣	Engineer	Standardize education into 4 tiers	Map 15+ raw categories → 4
6️⃣	Encode	LabelEncoder for country & education	Numeric representation for ML
7️⃣	Train	Fit 3 regression algorithms	Compare RMSE across models
8️⃣	Tune	GridSearchCV on Decision Tree	Optimize `max_depth` parameter
9️⃣	Serialize	Pickle model + encoders	`saved_steps.pkl` for production
🔟	Deploy	Serve via Streamlit	Two-page app: Predict + Explore
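
Stage 4 of the table (consolidating rare countries) could look roughly like the sketch below. It assumes the 400-response threshold is applied as "keep if count >= cutoff". The function name and toy data are hypothetical and not taken from the private source.

```python
from collections import Counter


def consolidate_countries(countries, cutoff=400):
    """Relabel countries with fewer than `cutoff` rows as "Other"."""
    counts = Counter(countries)
    return [c if counts[c] >= cutoff else "Other" for c in countries]


# Toy data with a small cutoff so the effect is visible.
raw = ["India"] * 3 + ["Germany"] * 2 + ["Malta"]
labels = consolidate_countries(raw, cutoff=2)

# The cleaning steps then drop the "Other" bucket entirely.
kept = [c for c in labels if c != "Other"]
```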

🏗️ Project Architecture

SalaryLens/
│
├── 🐍 app.py                    ← Entry point — sidebar navigation
│   ├── 🔮 predict_page.py       ← Prediction UI + model inference
│   └── 📊 explore_page.py       ← Data viz: pie, bar, line charts
│
├── 📓 SalaryPrediction.ipynb    ← Full ML experimentation notebook
├── 📦 saved_steps.pkl           ← Serialized model + label encoders
└── 📁 survey_results_public.csv ← Stack Overflow raw dataset

graph TD
    A["🐍 app.py"] -->|Sidebar: Predict| B["🔮 predict_page.py"]
    A -->|Sidebar: Explore| C["📊 explore_page.py"]
    B --> D["📦 saved_steps.pkl"]
    D --> E["DecisionTreeRegressor"]
    D --> F["LabelEncoder × 2"]
    E --> G["💰 Predicted Salary"]
    C --> H["📁 survey_results_public.csv"]
    H --> I["🥧 Pie: Country Distribution"]
    H --> J["📊 Bar: Salary by Country"]
    H --> K["📈 Line: Salary by Experience"]

    style A fill:#58a6ff,stroke:#1f6feb,color:#0d1117
    style G fill:#3fb950,stroke:#238636,color:#0d1117
    style I fill:#d2a8ff,stroke:#8b5cf6,color:#0d1117
    style J fill:#d2a8ff,stroke:#8b5cf6,color:#0d1117
    style K fill:#d2a8ff,stroke:#8b5cf6,color:#0d1117
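
The predict path in the diagram (pickle → model + two encoders → salary) might be sketched as follows. This is an illustration under stated assumptions, not the private implementation: the dictionary keys (`model`, `le_country`, `le_education`), the `predict_salary` helper, and the toy training rows are all hypothetical, and an in-memory buffer stands in for `saved_steps.pkl`.

```python
import io
import pickle

import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy training data standing in for the cleaned survey (values are made up).
countries = ["United States", "India", "Germany", "India"]
education = ["Bachelor's Degree", "Master's Degree", "Post Grad", "Bachelor's Degree"]
experience = [5.0, 3.0, 10.0, 8.0]
salaries = [120000.0, 30000.0, 75000.0, 35000.0]

le_country = LabelEncoder().fit(countries)
le_education = LabelEncoder().fit(education)
X = np.column_stack([
    le_country.transform(countries),
    le_education.transform(education),
    experience,
])
model = DecisionTreeRegressor(random_state=0).fit(X, salaries)

# Serialize model + encoders together; the dict keys are hypothetical.
buffer = io.BytesIO()
pickle.dump(
    {"model": model, "le_country": le_country, "le_education": le_education},
    buffer,
)
buffer.seek(0)


def predict_salary(steps, country, edu, years):
    """Encode one user selection and return the predicted salary in USD."""
    row = np.array([[
        steps["le_country"].transform([country])[0],
        steps["le_education"].transform([edu])[0],
        years,
    ]], dtype=float)
    return float(steps["model"].predict(row)[0])


steps = pickle.load(buffer)
estimate = predict_salary(steps, "India", "Master's Degree", 3.0)
```

Storing the encoders next to the model matters: the same string-to-integer mapping used at training time has to be reused at inference, and `LabelEncoder.transform` raises a `ValueError` for a label it has never seen.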

🤖 Model Showdown

Three algorithms went head-to-head on the same data:

Model	RMSE (USD)	Verdict
📐 Linear Regression	~$30,500	⚪ Solid baseline, underfits complex patterns
🌳 Decision Tree	$30,428	🟢 Winner — best bias-variance tradeoff after tuning
🌲 Random Forest	$29,487	🟡 Lowest training error, but overfitting risk
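
For reference, the RMSE column is the square root of the mean squared difference between actual and predicted salaries, so it is expressed in USD. A minimal sketch with made-up numbers (the `rmse` helper is illustrative, not the project's code):

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the inputs (USD here)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up salaries and predictions, for illustration only.
error = rmse([100000, 60000], [90000, 70000])
```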

🏆 Decision Tree Regressor was selected — tuned via GridSearchCV with max_depth ∈ {None, 2, 4, 6, 8, 10, 12} using neg_mean_squared_error scoring.
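
A tuning run with that grid and scoring could be set up roughly as below. The synthetic data, the random seed, and the 5-fold setting are assumptions made for the sake of a runnable sketch; the README states only the `max_depth` grid and the scoring metric.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded features (country, education, experience).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 14, size=200),  # encoded country
    rng.integers(0, 4, size=200),   # encoded education tier
    rng.uniform(0, 50, size=200),   # years of experience
])
y = 40000 + 5000 * X[:, 0] + 1500 * X[:, 2] + rng.normal(0, 5000, size=200)

param_grid = {"max_depth": [None, 2, 4, 6, 8, 10, 12]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,  # fold count is an assumption; the README does not state it
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
# The score is a negated MSE, so flip the sign before taking the root.
best_rmse = float(np.sqrt(-search.best_score_))
```

Because scikit-learn maximizes scores, the metric is the negated mean squared error; converting it back to an RMSE in USD makes it comparable with the table above.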

🔧 Why not Random Forest?

While Random Forest achieved a lower RMSE on the training set ($29,487 vs $30,428), the Decision Tree with optimized depth showed better generalization to unseen data. The small RMSE gap (~$1K) didn't justify the added complexity and overfitting risk of the ensemble approach for this feature set of only 3 input variables.

🌍 Coverage Map


🇺🇸 United States	🇮🇳 India	🇬🇧 United Kingdom	🇩🇪 Germany	🇨🇦 Canada
🇧🇷 Brazil	🇫🇷 France	🇪🇸 Spain	🇦🇺 Australia	🇳🇱 Netherlands
🇵🇱 Poland	🇮🇹 Italy	🇷🇺 Russia	🇸🇪 Sweden

Education tiers supported:

Code	Level	Includes
`L1`	📚 Less than a Bachelors	High school, associate degree, bootcamp, self-taught
`L2`	🎓 Bachelor's Degree	Any undergraduate degree
`L3`	🎓 Master's Degree	Graduate-level education
`L4`	🎓 Post Grad	Professional or doctoral degree (PhD, MD, JD)
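
The tier consolidation can be sketched as a substring match on the raw `EdLevel` answer. The raw strings below are assumed to resemble the survey's wording and the `clean_education` helper is hypothetical. Anything that is not a bachelor's, master's, or professional/doctoral degree falls into the lowest tier.

```python
def clean_education(raw_level):
    """Map a raw survey EdLevel string to one of the four tiers."""
    if "Bachelor" in raw_level:
        return "Bachelor's Degree"
    if "Master" in raw_level:
        return "Master's Degree"
    if "Professional degree" in raw_level or "doctoral" in raw_level.lower():
        return "Post Grad"
    return "Less than a Bachelors"


# Assumed examples of raw survey answers.
tiers = [
    clean_education("Bachelor’s degree (B.A., B.S., B.Eng., etc.)"),
    clean_education("Master’s degree (M.A., M.S., M.Eng., MBA, etc.)"),
    clean_education("Professional degree (JD, MD, Ph.D, Ed.D, etc.)"),
    clean_education("Some college/university study without earning a degree"),
]
```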

🛠️ Tech Stack

Layer	Tech	Role
UI	Streamlit	Interactive two-page web app
ML	scikit-learn	Model training, GridSearchCV, LabelEncoder
Data	pandas	DataFrame ops, cleaning, aggregation
Viz	Matplotlib	Pie charts, bar charts, line charts
Math	NumPy	Array transforms & prediction input
Persist	pickle	Serialize model + encoders to `.pkl`
Notebook	Jupyter	ML experimentation & EDA

📊 Dataset at a Glance


Metric	Value
📥 Raw responses	~65,000
🧹 After cleaning	~30,000
💰 Salary range	$10,000 – $250,000 (USD)
🌍 Countries	14 (400+ response threshold)
🎓 Education tiers	4 (consolidated from 15+)
⏳ Experience range	0 – 50 years
🏢 Employment filter	Full-time only
📆 Survey year	2023
📦 Source	Stack Overflow Annual Developer Survey

Cleaning pipeline:

✅ Selected 5 key columns: Country, EdLevel, YearsCodePro, Employment, Salary
✅ Dropped rows with null salary → kept 34,756 rows
✅ Removed remaining nulls → 34,000+ clean rows
✅ Filtered for full-time employment only → 30,019 rows
✅ Grouped countries with < 400 responses into "Other", then removed "Other"
✅ Capped salaries to $10K–$250K to remove extreme outliers
✅ Mapped experience strings ("Less than 1 year", "More than 50 years") to floats
✅ Consolidated 15+ education categories into 4 clean tiers
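
Two of the steps above, the experience-string mapping and the salary window, might look like this. The replacement values (`0.5` and `50.0`) and the reading of "capped" as "rows outside the range are dropped" are assumptions, since the README gives neither. Both helper names are hypothetical.

```python
def clean_experience(raw_years):
    """Convert a YearsCodePro survey answer to a float."""
    if raw_years == "Less than 1 year":
        return 0.5   # assumed value; the README does not state it
    if raw_years == "More than 50 years":
        return 50.0  # assumed to match the slider's upper bound
    return float(raw_years)


def within_salary_range(salary, low=10_000, high=250_000):
    """True if a salary falls inside the $10K to $250K window kept for training."""
    return low <= salary <= high


years = [clean_experience(v) for v in ["Less than 1 year", "7", "More than 50 years"]]
kept = [s for s in [5_000, 85_000, 400_000] if within_salary_range(s)]
```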

⚡ Quick Start

⚠️ Note: This is a showcase repository — the full source code is private. Interested in collaborating? Reach out →

# Clone (requires private access)
git clone https://github.com/shanskarBansal/SalaryLens.git
cd SalaryLens

# Install dependencies
pip install streamlit pandas scikit-learn matplotlib numpy

# Place the Stack Overflow Developer Survey 2023 CSV in project root
# Download from: https://survey.stackoverflow.co/

# Launch the app
streamlit run app.py

Requirements:

python     >= 3.8
streamlit  >= 1.28
pandas     >= 1.5
scikit-learn >= 1.2
matplotlib >= 3.6
numpy      >= 1.23

🗺️ Roadmap

Status	Feature	Description
🟢	Salary Prediction	Core ML prediction engine
🟢	Data Exploration	Interactive charts & visualizations
🟢	Multi-country Support	14 countries covered
🟡	Job Role Filtering	Predict by role: Frontend, Backend, DevOps, etc.
🟡	Company Size Factor	Startup vs Enterprise salary adjustments
🔴	REST API	Expose predictions as an API endpoint
🔴	XGBoost / LightGBM	Upgrade to gradient-boosted models
🔴	Auto-updating Data	Live integration with latest SO surveys
🔴	Mobile-First UI	Responsive design for all devices

Legend: 🟢 Done · 🟡 Planned · 🔴 Future

👥 The Team

	Developer
🧑‍💻	Harsh Bir
🧑‍💻	Priyanshu Dayal
🧑‍💻	Shanskar Bansal
🧑‍💻	Saloni Thakur

🔒 Source Code & Access

The complete source code for SalaryLens is maintained in a private repository. This showcase repo demonstrates the platform's capabilities, architecture, methodology, and results.

📬 Want access or interested in collaborating? Reach out via GitHub → @shanskarBansal

🙏 Acknowledgments


📊	Stack Overflow — Developer Survey 2023 dataset
🎈	Streamlit — Python web framework
🤖	Scikit-Learn — ML algorithms & tools
🐍	Python — Language & ecosystem

If this project helped you understand your worth — ⭐ star it!

Made with ❤️ by Team SalaryLens
