A Streamlit web application that predicts salaries for data science professionals with machine learning models and lets you explore salary data through interactive visualizations.
- Interactive Dashboard: Comprehensive salary analysis across multiple dimensions
- Advanced Filtering: Real-time filters for experience level, job titles, locations, and more
- Beautiful Visualizations: Professional charts and graphs using Plotly
- Key Metrics: Statistical insights with formatted displays
- Multiple ML Models: Random Forest, XGBoost, Gradient Boosting, and Linear Regression
- High Accuracy: Models trained on real salary data, with a typical R² of 0.85+
- Feature Engineering: Comprehensive preprocessing and encoding
- Model Comparison: Performance metrics for all models
- Feature Importance: SHAP-based explanations for predictions
- Confidence Intervals: Uncertainty quantification for predictions
- Market Comparison: Compare predictions with similar profiles
- Responsive Design: Works on desktop, tablet, and mobile
- Dark/Light Theme: Automatic theme detection and switching
- Professional Styling: Modern CSS with smooth animations
- Intuitive Navigation: Tab-based interface for easy exploration
- Python 3.8 or higher
- pip package manager
- Clone or download the project, then change into its directory:
  cd "EDA on Data Science Salaries"
- Install dependencies:
  pip install -r requirements.txt
- Run the application:
  python -m streamlit run app.py
- Open your browser and navigate to:
  http://localhost:8501
- streamlit: Web application framework
- pandas: Data manipulation and analysis
- plotly: Interactive visualizations
- scikit-learn: Machine learning algorithms
- xgboost: Gradient boosting framework
- shap: Model explainability
- matplotlib & seaborn: Additional plotting libraries
- numpy: Numerical computing
- joblib: Model serialization
- Data Cleaning: Outlier removal and data validation
- Feature Engineering: Categorical encoding and scaling
- Data Caching: Streamlit caching for optimal performance
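
The cleaning and encoding steps listed above could look roughly like the sketch below. It is not taken from the app's source: the column names `salary_in_usd` and `experience_level`, the helper `remove_salary_outliers`, and the IQR rule with `k = 1.5` are all assumptions chosen for illustration.

```python
import pandas as pd


def remove_salary_outliers(
    df: pd.DataFrame, column: str = "salary_in_usd", k: float = 1.5
) -> pd.DataFrame:
    """Keep only rows whose value lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df.loc[mask].reset_index(drop=True)


# Tiny made-up sample; the real app would read its CSV data file instead.
sample = pd.DataFrame(
    {
        "experience_level": ["Entry", "Mid", "Senior", "Senior", "Executive", "Mid"],
        "salary_in_usd": [60_000, 90_000, 140_000, 150_000, 210_000, 2_000_000],
    }
)

cleaned = remove_salary_outliers(sample)  # drops the 2,000,000 row
encoded = pd.get_dummies(cleaned, columns=["experience_level"])  # one-hot encoding
print(encoded)
```

In a Streamlit app, a loading-and-cleaning function like this is commonly decorated with `st.cache_data` so it runs once per input rather than on every rerun.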
- Data Preparation: Feature selection and preprocessing
- Model Training: Multiple algorithms with cross-validation
- Model Evaluation: Comprehensive metrics (MAE, RMSE, R²)
- Prediction: Real-time salary estimation
- Explanation: Feature importance and SHAP values
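
As a rough illustration of the preparation, training, evaluation, and prediction steps above, the following sketch builds a scikit-learn pipeline and scores it with 5-fold cross-validation. The data is synthetic, and the column names, base salaries, and hyperparameters are assumptions, not values from the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the salary dataset (all values are made up).
rng = np.random.default_rng(0)
n = 200
levels = rng.choice(["Entry", "Mid", "Senior", "Executive"], size=n)
sizes = rng.choice(["Small", "Medium", "Large"], size=n)
remote = rng.choice([0, 50, 100], size=n)
base = {"Entry": 70_000, "Mid": 100_000, "Senior": 140_000, "Executive": 190_000}
y = np.array([base[level] for level in levels]) + rng.normal(0, 10_000, size=n)
X = pd.DataFrame(
    {"experience_level": levels, "company_size": sizes, "remote_ratio": remote}
)

# Data preparation: one-hot encode categoricals, pass numeric columns through.
preprocess = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["experience_level", "company_size"]),
    ],
    remainder="passthrough",
)
model = Pipeline(
    [
        ("preprocess", preprocess),
        ("regressor", RandomForestRegressor(n_estimators=100, random_state=0)),
    ]
)

# Model evaluation: cross-validated mean absolute error (sklearn returns it negated).
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
mae_per_fold = -scores
print("MAE per fold:", mae_per_fold.round(0))

# Prediction: fit on all data, then estimate a single profile.
model.fit(X, y)
profile = pd.DataFrame(
    {"experience_level": ["Senior"], "company_size": ["Medium"], "remote_ratio": [100]}
)
prediction = model.predict(profile)[0]
print("Predicted salary:", round(prediction))
```

Keeping the encoder inside the pipeline means each cross-validation fold fits its preprocessing only on that fold's training rows, which avoids leaking information from the held-out rows.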
- Random Forest Regressor: Ensemble method with feature importance
- XGBoost Regressor: Gradient boosting with high performance
- Gradient Boosting Regressor: Sequential ensemble learning
- Linear Regression: Baseline model with feature scaling
- Key salary statistics and metrics
- Dataset information and filtering status
- Quick insights and trends
- Salary distribution analysis
- Statistical breakdowns by various factors
- Trend analysis over time
- Salary comparison across different job titles
- Role-specific insights and trends
- Career progression analysis
- Global salary distribution maps
- Country-wise compensation analysis
- Remote work impact on salaries
- Salary progression by experience level
- Experience distribution analysis
- Career growth insights
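
The salary-by-experience view above comes down to a grouped aggregation. A minimal sketch with made-up numbers follows; the column names `experience_level` and `salary_in_usd` are assumed, not confirmed by the project.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame(
    {
        "experience_level": ["Entry", "Senior", "Mid", "Entry", "Executive", "Senior", "Mid"],
        "salary_in_usd": [60_000, 150_000, 95_000, 70_000, 220_000, 130_000, 105_000],
    }
)

# An ordered categorical keeps the levels in career order instead of alphabetical order.
order = ["Entry", "Mid", "Senior", "Executive"]
df["experience_level"] = pd.Categorical(
    df["experience_level"], categories=order, ordered=True
)

summary = df.groupby("experience_level", observed=True)["salary_in_usd"].agg(
    ["count", "median", "mean"]
)
print(summary)
```

A table like `summary` can be passed directly to a Plotly bar or box chart to show progression across levels.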
- Input Form: Easy-to-use feature input interface
- Model Selection: Choose from multiple ML algorithms
- Real-time Predictions: Instant salary estimates
- Confidence Intervals: Prediction uncertainty ranges
- Feature Importance: Understand what drives salary predictions
- Market Comparison: Compare with similar profiles in dataset
- Career Tips: Actionable insights for salary optimization
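
The README does not say how the confidence intervals are computed. One common approach for a Random Forest, shown here purely as an assumption, is to take percentiles of the individual trees' predictions. The data, the single `years` feature, and the helper `predict_with_interval` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: salary grows with years of experience plus noise.
rng = np.random.default_rng(42)
years = rng.uniform(0, 20, size=300)
salary = 60_000 + 6_000 * years + rng.normal(0, 12_000, size=300)
X = years.reshape(-1, 1)

forest = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, random_state=42
).fit(X, salary)


def predict_with_interval(model, x, lower=5.0, upper=95.0):
    """Return (mean, lower percentile, upper percentile) of per-tree predictions."""
    per_tree = np.array([tree.predict(x)[0] for tree in model.estimators_])
    return per_tree.mean(), np.percentile(per_tree, lower), np.percentile(per_tree, upper)


point, low, high = predict_with_interval(forest, np.array([[8.0]]))
print(f"Estimate: {point:,.0f} USD (range {low:,.0f} to {high:,.0f})")
```

The spread across trees reflects how much the model itself disagrees, not the full noise in real salaries, so a range built this way tends to be narrower than a true prediction interval.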
The salary predictor considers the following factors:
- Work Year: The year of the prediction, used to reflect market conditions
- Experience Level: Entry, Mid, Senior, or Executive
- Employment Type: Full-time, Part-time, Contract, or Freelance
- Job Title: Specific role (Data Scientist, ML Engineer, etc.)
- Company Location: Geographic location of the company
- Company Size: Small, Medium, or Large organization
- Remote Ratio: On-site, Hybrid, or Fully Remote work
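
To make the input factors concrete, here is a small sketch of how a profile might be validated before it reaches a model. The class `SalaryProfile`, its field names, and the numeric remote-ratio encoding (0, 50, 100) are assumptions for illustration; the allowed labels are the ones listed above.

```python
from dataclasses import asdict, dataclass

EXPERIENCE_LEVELS = ("Entry", "Mid", "Senior", "Executive")
EMPLOYMENT_TYPES = ("Full-time", "Part-time", "Contract", "Freelance")
COMPANY_SIZES = ("Small", "Medium", "Large")
REMOTE_RATIOS = {"On-site": 0, "Hybrid": 50, "Fully Remote": 100}  # assumed encoding


@dataclass(frozen=True)
class SalaryProfile:
    work_year: int
    experience_level: str
    employment_type: str
    job_title: str
    company_location: str
    company_size: str
    remote_work: str

    def __post_init__(self):
        # Reject values outside the documented choices.
        allowed = {
            "experience_level": EXPERIENCE_LEVELS,
            "employment_type": EMPLOYMENT_TYPES,
            "company_size": COMPANY_SIZES,
            "remote_work": tuple(REMOTE_RATIOS),
        }
        for field_name, choices in allowed.items():
            value = getattr(self, field_name)
            if value not in choices:
                raise ValueError(f"{field_name}={value!r} is not one of {choices}")

    def to_row(self) -> dict:
        """Convert to a flat dict, replacing the remote label with a numeric ratio."""
        row = asdict(self)
        row["remote_ratio"] = REMOTE_RATIOS[row.pop("remote_work")]
        return row


profile = SalaryProfile(
    work_year=2025,
    experience_level="Senior",
    employment_type="Full-time",
    job_title="Data Scientist",
    company_location="US",
    company_size="Medium",
    remote_work="Hybrid",
)
row = profile.to_row()
print(row)
```

Validating up front keeps an unknown category from silently turning into an all-zero encoding further down the pipeline.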
Typical performance of the models on the salary dataset:
- R² Score: 0.85+ (explains 85%+ of salary variance)
- Mean Absolute Error: $15,000-25,000 USD
- Root Mean Square Error: $20,000-35,000 USD
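
For reference, the three metrics above can be computed as in the sketch below. The salaries are made up to show the arithmetic and are not results from the project's models.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Illustrative values in USD.
actual = [80_000, 120_000, 150_000, 200_000]
predicted = [90_000, 110_000, 165_000, 190_000]

metrics = regression_metrics(actual, predicted)
print(metrics)  # mae = 11250.0, rmse ≈ 11456.44, r2 ≈ 0.9316
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, which is why the RMSE range quoted above sits higher than the MAE range.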
- Use the predictor to estimate fair salary ranges
- Compare your profile with market standards
- Identify factors that could increase your salary
- Explore different locations and company sizes
- Benchmark salary offerings against market rates
- Understand compensation factors in your industry
- Plan competitive salary packages
- Analyze geographic and remote work impacts
- Explore comprehensive salary trends
- Analyze factors affecting data science compensation
- Study geographic and temporal patterns
- Understand the impact of experience and skills
- Skills-based Prediction: Include specific technical skills
- Industry Analysis: Sector-specific salary insights
- Real-time Data: Integration with live job market data
- Advanced ML: Deep learning models for better accuracy
- API Integration: RESTful API for external applications
- Export Features: PDF reports and data downloads
We welcome contributions! Here's how you can help:
- Data Enhancement: Add more recent salary data
- Model Improvement: Implement new ML algorithms
- Feature Addition: Add new analysis dimensions
- UI/UX Enhancement: Improve user interface
- Bug Fixes: Report and fix issues
This project is open source and available under the MIT License.
- Data science community for salary transparency
- Streamlit team for the amazing framework
- Scikit-learn and XGBoost developers
- Plotly team for interactive visualizations
If you encounter any issues or have questions:
- Check the terminal output for error messages
- Ensure all dependencies are properly installed
- Verify that the CSV data file is in the correct location
- Try refreshing the browser if the app seems unresponsive
Built using Streamlit, scikit-learn, and modern web technologies
Last Updated: November 2025