Shadinbm/Employee-Retention-Analysis

# 📊 Employee Retention Analysis using Logistic Regression

This project analyzes employee attrition using **logistic regression**, a binary classification model. Created as part of a data science course, it uses an HR dataset from **Kaggle** to explore the factors that influence whether employees stay with or leave an organization.
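As a rough illustration of what the model computes: logistic regression passes a weighted sum of the features through the sigmoid function and reads the result as the probability of the positive class (here, `left = 1`). The sketch below uses made-up weights and a made-up bias purely for illustration; they are not values fitted on this dataset.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one row of features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights for (satisfaction_level, number_project); not fitted values.
weights = [-4.0, 0.3]
bias = 0.5

p_left = predict_proba([0.4, 4], weights, bias)
label = 1 if p_left >= 0.5 else 0  # 0.5 is the usual default threshold
print(f"P(left) = {p_left:.3f}, predicted label = {label}")
```

Training a real model amounts to finding the weights and bias that make these probabilities match the observed `left` values as closely as possible.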

---

## 🔍 Objectives

- Predict whether an employee is likely to leave
- Identify key features contributing to attrition
- Visualize patterns and relationships within the data

---

## 🧾 Dataset Columns

- `satisfaction_level`
- `last_evaluation`
- `number_project`
- `average_montly_hours`
- `time_spend_company`
- `Work_accident`
- `promotion_last_5years`
- `Department`
- `salary`
- `left` *(target variable)*
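
`Department` and `salary` hold text labels, so they would need a numeric encoding before a scikit-learn model could use them. The sketch below shows one possible approach on a tiny hand-made frame; the department names and the `low`/`medium`/`high` salary levels are assumptions for illustration and should be checked against the actual data.

```python
import pandas as pd

# Tiny hand-made sample; values are illustrative, not taken from the dataset.
df_demo = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11],
    "Department": ["sales", "technical", "sales"],
    "salary": ["low", "medium", "high"],
    "left": [1, 0, 1],
})

# Ordinal encoding: salary levels have a natural order (assumed labels).
salary_order = {"low": 0, "medium": 1, "high": 2}
df_demo["salary"] = df_demo["salary"].map(salary_order)

# One-hot encoding: departments have no natural order.
encoded = pd.get_dummies(df_demo, columns=["Department"])

print(encoded.columns.tolist())
```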

---

## 🛠️ Tools & Libraries

- `pandas` for data handling
- `matplotlib` & `seaborn` for visualization
- `scikit-learn` for machine learning (Logistic Regression)

---

## 📈 Exploratory Data Analysis

Visualizations include:

- Box plots for satisfaction level vs attrition
- Count plots of departments vs left
- Correlation heatmaps
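
The department count plot listed above is essentially a picture of a group-wise tally. As a hedged sketch on a small hand-made frame (the rows are invented, not real data), the same numbers can be computed directly with pandas:

```python
import pandas as pd

# Invented rows for illustration only.
df_demo = pd.DataFrame({
    "Department": ["sales", "sales", "sales", "technical", "technical", "hr"],
    "left":       [1,       0,       1,       0,           0,           1],
})

# Rows per department, split by attrition outcome (what a count plot draws).
counts = pd.crosstab(df_demo["Department"], df_demo["left"])

# Share of employees who left, per department.
attrition_rate = df_demo.groupby("Department")["left"].mean()

print(counts)
print(attrition_rate)
```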

Example:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# df is assumed to be the HR dataset already loaded into a pandas DataFrame
sns.boxplot(x='left', y='satisfaction_level', data=df)
plt.title("Satisfaction Level vs Attrition")
plt.show()
```

---

## 🤖 Model Building

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Feature selection
X = df[['satisfaction_level', 'number_project', 'average_montly_hours', 'Work_accident']]
y = df['left']

# Train-test split: hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training (max_iter raised so the solver can converge on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```

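---

## 🔮 Prediction

```python
import pandas as pd

# Sample input, in the same column order as the training features:
# satisfaction, project count, monthly hours, work accident
sample = pd.DataFrame([[0.4, 4, 160, 0]], columns=X.columns)
prediction = model.predict(sample)
print("Prediction:", "Left" if prediction[0] == 1 else "Stayed")
```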
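
A single prediction says little about how well the model performs overall. One common way to check is to compare predictions on the held-out split with the true labels. The sketch below uses short hand-written label lists so that it runs on its own; in this project `y_true` would presumably be `y_test` and `y_pred` would be `model.predict(X_test)`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hand-written labels for illustration only (1 = left, 0 = stayed).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # of predicted leavers, how many left
recall = recall_score(y_true, y_pred)        # of actual leavers, how many were caught

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(cm)
```

Employees who leave are typically the minority class, so accuracy alone can look good while recall for `left = 1` stays low. It is worth reading the metrics together.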

---

## ✅ Conclusion

This logistic regression model gives HR teams insight into which factors are most associated with employee turnover. With visual exploration and a simple predictive model, this project demonstrates the practical value of machine learning in workforce analytics.
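
One way to read off which factors the model associates with turnover is to inspect the fitted coefficients. The sketch below is an illustration under stated assumptions: it trains on synthetic data generated on the spot (not the Kaggle dataset), standardizes the features so that coefficient magnitudes are comparable, and uses only two of the columns.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: low satisfaction drives attrition, hours are noise.
rng = np.random.default_rng(0)
satisfaction = rng.uniform(0.0, 1.0, size=500)
hours = rng.uniform(100.0, 300.0, size=500)
X_demo = np.column_stack([satisfaction, hours])
y_demo = (satisfaction < 0.4).astype(int)  # 1 = left

# Standardizing first puts both features on the same scale.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_demo, y_demo)

coefs = pipeline.named_steps["logisticregression"].coef_[0]
for name, coef in zip(["satisfaction_level", "average_montly_hours"], coefs):
    # Negative: higher values lower the predicted odds of leaving.
    print(f"{name}: {coef:+.2f}")
```

A coefficient describes an association within this particular model, not a cause of attrition.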


---

## 📂 Dataset Source

HR Analytics: Job Change of Data Scientists – Kaggle



About

This project uses logistic regression to predict employee attrition based on satisfaction, workload, and other HR metrics. Built as part of a data science course using a Kaggle dataset, it includes data visualization, feature analysis, and model evaluation using scikit-learn.
