GitHub - Shadinbm/Employee-Retention-Analysis: This project uses logistic regression to predict employee attrition based on satisfaction, workload, and other HR metrics. Built as part of a data science course using a Kaggle dataset, it includes data visualization, feature analysis, and model evaluation using scikit-learn.

# 📊 Employee Retention Analysis using Logistic Regression

This project analyzes employee attrition using **logistic regression**, a binary classification model. Created as part of a data science course, it uses an HR dataset from **Kaggle** to explore factors that influence why employees stay or leave an organization.

---

## 🔍 Objectives

- Predict whether an employee is likely to leave
- Identify key features contributing to attrition
- Visualize patterns and relationships within the data

---

## 🧾 Dataset Columns

- `satisfaction_level`
- `last_evaluation`
- `number_project`
- `average_montly_hours`
- `time_spend_company`
- `Work_accident`
- `promotion_last_5years`
- `Department`
- `salary`
- `left` *(target variable)*

---

## 🛠️ Tools & Libraries

- `pandas` for data handling
- `matplotlib` & `seaborn` for visualization
- `scikit-learn` for machine learning (Logistic Regression)

---

## 📈 Exploratory Data Analysis

Visualizations include:

- Box plots for satisfaction level vs attrition
- Count plots of departments vs left
- Correlation heatmaps

Example:

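```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the HR dataset (hr.csv in this repository)
df = pd.read_csv("hr.csv")

# Compare satisfaction levels of employees who stayed (0) and left (1)
sns.boxplot(x='left', y='satisfaction_level', data=df)
plt.title("Satisfaction Level vs Attrition")
plt.show()
```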
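
The other two plot types in the list above can be drawn in a similar way. The following is a minimal sketch, not the notebook's actual code: `plot_attrition_overview` is a hypothetical helper, and the titles, figure size, and color map are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_attrition_overview(df: pd.DataFrame):
    """Draw a department count plot and a correlation heatmap side by side."""
    fig, (ax_count, ax_heat) = plt.subplots(1, 2, figsize=(14, 5))

    # Number of employees per department, split by whether they left
    sns.countplot(x="Department", hue="left", data=df, ax=ax_count)
    ax_count.set_title("Department vs Attrition")
    ax_count.tick_params(axis="x", rotation=45)

    # Pairwise correlations between the numeric columns only
    corr = df.select_dtypes(include="number").corr()
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", ax=ax_heat)
    ax_heat.set_title("Correlation Heatmap")

    fig.tight_layout()
    return fig
```

With the dataset loaded into `df`, calling `plot_attrition_overview(df)` followed by `plt.show()` displays both plots. Text columns such as `Department` and `salary` are left out of the heatmap because correlations are only defined for numeric columns.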

---

## 🤖 Model Building

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Feature selection
X = df[['satisfaction_level', 'number_project', 'average_montly_hours', 'Work_accident']]
y = df['left']

# Train-test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = LogisticRegression()
model.fit(X_train, y_train)
```
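
The held-out test split can be used to check how well the fitted model generalizes. This README does not show which metrics the notebook reports, so the sketch below assumes accuracy, a confusion matrix, and a per-class report. `evaluate_classifier` is a hypothetical helper name.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate_classifier(model, X_test, y_test):
    """Return accuracy, confusion matrix, and a text report for a fitted classifier."""
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are true classes, columns are predicted classes (0 = stayed, 1 = left)
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
        "report": classification_report(
            y_test, y_pred, labels=[0, 1],
            target_names=["Stayed", "Left"], zero_division=0,
        ),
    }
```

With the variables defined above, this would be called as `evaluate_classifier(model, X_test, y_test)`. Accuracy alone can mislead when far fewer employees leave than stay, which is why the per-class precision and recall in the report are worth reading as well.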

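---

## 🔮 Prediction

```python
import pandas as pd

# Sample input: satisfaction, project count, monthly hours, work accident.
# A DataFrame with the training columns avoids scikit-learn's feature-name warning.
sample = pd.DataFrame([[0.4, 4, 160, 0]], columns=X.columns)
prediction = model.predict(sample)
print("Prediction:", "Left" if prediction[0] == 1 else "Stayed")
```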
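
A hard label hides how confident the model is. Logistic regression also exposes class probabilities through `predict_proba`, which the sketch below uses. `attrition_probability` is a hypothetical helper, and it assumes the model was trained with `1` meaning "left", as in the example above.

```python
import pandas as pd


def attrition_probability(model, sample: dict) -> float:
    """Return the model's estimated probability that one employee leaves (class 1)."""
    row = pd.DataFrame([sample])  # single-row frame keyed by feature name

    # Reorder columns to match training order when the model recorded it
    cols = getattr(model, "feature_names_in_", None)
    if cols is not None:
        row = row[list(cols)]

    # predict_proba returns one column per class, ordered as in model.classes_
    class_index = list(model.classes_).index(1)
    return float(model.predict_proba(row)[0, class_index])
```

For the sample employee this could be called as `attrition_probability(model, {'satisfaction_level': 0.4, 'number_project': 4, 'average_montly_hours': 160, 'Work_accident': 0})`.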

---

## ✅ Conclusion

This logistic regression model gives HR teams insight into which factors are most associated with employee turnover. With visual exploration and a simple predictive model, this project demonstrates the practical value of machine learning in workforce analytics.
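
One way to see which factors the model associates with turnover is to inspect its fitted coefficients. The sketch below is an illustration rather than the notebook's method, and `coefficient_table` is a hypothetical helper. Because the features are on different scales (for example, hours versus a 0 to 1 satisfaction score), coefficient magnitudes are only directly comparable if the features are standardized first.

```python
import pandas as pd


def coefficient_table(model, feature_names):
    """List each feature's logistic regression coefficient, largest magnitude first."""
    coefs = pd.Series(model.coef_[0], index=list(feature_names), name="coefficient")
    # Positive values raise the predicted log-odds of leaving, negative values lower them
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.reindex(order)
```

With the model trained above, `coefficient_table(model, X.columns)` returns one row per feature.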

---

## 📂 Dataset Source

HR Analytics: Job Change of Data Scientists – Kaggle
