
amitpaul2004/Data_Analytics


🚀 Data Science (Analytics) & Machine Learning


Python Pandas NumPy Seaborn Matplotlib Scikit-Learn


📊 Project Overview

This repository is a comprehensive collection of Data Science and Machine Learning projects.

It demonstrates:

✨ Data Analysis
🤖 Machine Learning
📊 Statistical Testing (ANOVA)
📈 Advanced Visualization
🎬 Animated Graphs
🌍 Real-world datasets


📁 Repository Structure

📦 Project Root

  • 📂 codes → Jupyter notebooks (ML, EDA, visualization)
  • 📂 data → Real-world datasets (CSV, Excel)
  • 📜 README.md


🔥 Featured Projects

🚢 Titanic Survival Analysis

✔ ANOVA testing
✔ Survival prediction
✔ Confusion matrix + ROC curve
✔ Advanced visualization
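A minimal sketch of the survival-prediction and evaluation steps, assuming a scikit-learn logistic regression trained on four columns of the Seaborn Titanic dataset (`pclass`, `sex`, `age`, `fare`). The feature set, the 75/25 split and the helper name `fit_survival_model` are illustrative assumptions, not taken from the notebooks.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split


def fit_survival_model(df: pd.DataFrame, random_state: int = 42):
    """Hypothetical helper: fit a logistic regression on an assumed feature set and evaluate it."""
    # Keep only the assumed feature columns plus the target, dropping rows with gaps.
    data = df[["survived", "pclass", "sex", "age", "fare"]].dropna()
    # Encode sex as 0/1 so the model receives purely numeric input.
    X = data[["pclass", "age", "fare"]].assign(is_male=(data["sex"] == "male").astype(int))
    y = data["survived"]

    # Stratify so both classes appear in the test split in their original proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Hard predictions feed the confusion matrix; probabilities feed the ROC curve.
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    cm = confusion_matrix(y_test, y_pred)
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    auc = roc_auc_score(y_test, y_prob)
    return model, cm, (fpr, tpr), auc
```

For example, `model, cm, (fpr, tpr), auc = fit_survival_model(sns.load_dataset("titanic"))` returns the confusion matrix as a 2×2 array and the ROC points ready for `plt.plot(fpr, tpr)`.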


📱 Social Media Analytics

✔ Engagement analysis
✔ Trend detection
✔ Histogram & pie charts
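A sketch of the engagement step, assuming per-post columns `platform`, `likes`, `comments`, `shares` and `followers` (hypothetical names) and defining engagement rate as interactions divided by followers, in percent, which is one common convention rather than necessarily the notebook's.

```python
import matplotlib.pyplot as plt
import pandas as pd


def add_engagement_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with `interactions` and `engagement_rate` (percent of followers) columns."""
    out = df.copy()
    # Column names below are assumptions; rename to match the actual dataset.
    out["interactions"] = out["likes"] + out["comments"] + out["shares"]
    out["engagement_rate"] = out["interactions"] / out["followers"] * 100
    return out


def plot_engagement(df: pd.DataFrame):
    """Histogram of per-post engagement rates and a pie chart of interactions per platform."""
    fig, (ax_hist, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))
    ax_hist.hist(df["engagement_rate"], bins=20)
    ax_hist.set_xlabel("Engagement rate (%)")
    ax_hist.set_ylabel("Posts")
    # Total interactions per platform become the pie slices.
    totals = df.groupby("platform")["interactions"].sum()
    ax_pie.pie(totals, labels=totals.index, autopct="%1.1f%%")
    ax_pie.set_title("Share of interactions by platform")
    fig.tight_layout()
    return fig
```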


🏏 IPL Data Analysis

✔ Player performance
✔ Career graph
✔ Team prediction model
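A career graph can be built by summing a player's runs per season and taking a running total. The sketch assumes columns named `batter`, `season` and `runs` (hypothetical names, for example from ball-by-ball data); `career_runs` and `plot_career` are illustrative helpers.

```python
import matplotlib.pyplot as plt
import pandas as pd


def career_runs(df: pd.DataFrame, player: str) -> pd.DataFrame:
    """Runs per season and the running career total for one player (assumed column names)."""
    seasons = (
        df.loc[df["batter"] == player]
        .groupby("season", as_index=False)["runs"]
        .sum()
        .sort_values("season")
    )
    # Cumulative sum over seasons gives the career total after each season.
    seasons["career_runs"] = seasons["runs"].cumsum()
    return seasons


def plot_career(seasons: pd.DataFrame, player: str):
    """Line chart of the cumulative career total with per-season bars underneath."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(seasons["season"], seasons["runs"], alpha=0.4, label="Runs in season")
    ax.plot(seasons["season"], seasons["career_runs"], marker="o", label="Career total")
    ax.set_xlabel("Season")
    ax.set_ylabel("Runs")
    ax.set_title(f"{player}: career graph")
    ax.legend()
    return fig
```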


🎬 Netflix Series Analysis

✔ Monthly trending
✔ Genre popularity
✔ Viewer insights
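Monthly trending and genre popularity both come down to counting. A sketch assuming the CSV has a `date_added` column and a comma-separated `genre` column; both names are assumptions, so rename them to match the actual file.

```python
import pandas as pd


def monthly_trend(df: pd.DataFrame, date_col: str = "date_added") -> pd.Series:
    """Number of titles per calendar month; unparseable dates are dropped."""
    dates = pd.to_datetime(df[date_col], errors="coerce").dropna()
    return dates.dt.to_period("M").value_counts().sort_index()


def genre_popularity(df: pd.DataFrame, genre_col: str = "genre", top_n: int = 5) -> pd.Series:
    """Most frequent genres, after splitting comma-separated genre lists into one row each."""
    genres = df[genre_col].dropna().str.split(",").explode().str.strip()
    return genres.value_counts().head(top_n)
```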


🌍 Earthquake Analysis

✔ Magnitude distribution
✔ Time trend analysis
✔ Geographic insights
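A sketch of the magnitude-distribution and time-trend counts, assuming USGS-style column names `time` (ISO 8601 timestamps) and `mag`; the notebook's dataset may name them differently, and the helper names are hypothetical.

```python
import pandas as pd


def magnitude_distribution(df: pd.DataFrame) -> pd.Series:
    """Count events in whole-magnitude bands [0, 1), [1, 2), ... [9, 10)."""
    # Magnitudes outside 0 to 10 fall outside every band and are not counted.
    bands = pd.cut(df["mag"], bins=list(range(0, 11)), right=False)
    return bands.value_counts().sort_index()


def yearly_counts(df: pd.DataFrame) -> pd.Series:
    """Number of events per calendar year, for time-trend plots."""
    years = pd.to_datetime(df["time"], utc=True).dt.year
    return years.value_counts().sort_index()
```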


📈 Economic Growth Prediction

✔ Future trend modeling
✔ Dataset generation
✔ Forecast visualization
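One simple way to model a future trend is a least-squares straight line extrapolated forward. This sketch uses `numpy.polyfit` and assumes columns named `year` and `gdp_growth` (hypothetical); it is not necessarily the model used in `ecomic_gwrth.ipynb`.

```python
import numpy as np
import pandas as pd


def linear_forecast(df: pd.DataFrame, years_ahead: int = 5) -> pd.DataFrame:
    """Extend the series with a straight-line trend fitted by least squares (assumed columns)."""
    slope, intercept = np.polyfit(df["year"], df["gdp_growth"], deg=1)
    last_year = int(df["year"].max())
    future_years = np.arange(last_year + 1, last_year + 1 + years_ahead)
    return pd.DataFrame({
        "year": future_years,
        "gdp_growth": slope * future_years + intercept,
        "forecast": True,  # flag so projected rows can be styled differently in a plot
    })
```

A straight line is the most transparent baseline; `pd.concat([history.assign(forecast=False), forecast])` stacks observed and projected rows for a single plot.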


🤖 Machine Learning Models

✔ Logistic Regression
✔ Linear Regression
✔ Classification & Prediction
✔ Model evaluation
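For the regression side, a hedged sketch of a typical evaluation loop: hold out part of the data, fit scikit-learn's `LinearRegression`, and report R² and mean absolute error. The function name `evaluate_linear_regression` and the 80/20 split are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_linear_regression(X: pd.DataFrame, y: pd.Series, random_state: int = 0) -> dict:
    """Hold out 20% of rows, fit ordinary least squares, and report test-set metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "r2": r2_score(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        # Map each feature name to its fitted coefficient for easy inspection.
        "coefficients": dict(zip(X.columns, model.coef_)),
    }
```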


🎨 Visualization Beyond Basics

  • Pairplots
  • Heatmaps
  • Violin plots
  • KDE plots
  • 3D Visualizations
  • Animated graphs

The Titanic analysis loads its dataset directly from Seaborn:

import seaborn as sns
df = sns.load_dataset("titanic")

📈 Analysis Performed

1️⃣ Data Cleaning

  • Removed missing values
  • Selected relevant columns
  • Prepared dataset for statistical testing
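The cleaning steps above can be sketched as follows, assuming the ANOVA only needs `survived`, `sex` and `class`. Selecting those columns before calling `dropna()` means gaps in unrelated columns such as `deck` or `age` do not remove rows. `prepare_for_anova` is a hypothetical name.

```python
import pandas as pd


def prepare_for_anova(df: pd.DataFrame, columns=("survived", "sex", "class")) -> pd.DataFrame:
    """Select the analysis columns first, then drop rows missing any of them."""
    # Selecting before dropna means gaps in unrelated columns (e.g. deck) don't cost rows.
    cleaned = df[list(columns)].dropna().reset_index(drop=True)
    return cleaned
```

With `df` from `sns.load_dataset("titanic")`, `clean = prepare_for_anova(df)` gives a frame ready for the model below.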

2️⃣ Data Visualization

The project includes multiple visualizations:

  • 📊 Bar plots
  • 📉 Histograms
  • 📦 Boxplots
  • 🎬 Animated survival visualization

Example:

sns.barplot(x="class", y="survived", hue="sex", data=df)
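`sns.barplot` plots the mean of `y` in each group by default, so each bar is a survival rate. A small sketch that computes the same numbers as a table (the helper name `survival_rates` is hypothetical):

```python
import pandas as pd


def survival_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Share of survivors in each class x sex cell, matching the default bar heights."""
    # observed=True skips empty combinations when `class` is a categorical column.
    return df.pivot_table(
        index="class", columns="sex", values="survived", aggfunc="mean", observed=True
    )
```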

3️⃣ Statistical Analysis (Two-Way ANOVA)

This analysis tests:

  • Effect of Gender
  • Effect of Passenger Class
  • Interaction between Gender × Class

Example model:

from statsmodels.formula.api import ols

model = ols('survived ~ C(sex) * C(Q("class"))', data=df).fit()

🎬 Animated Visualization

This project also includes animated plots using Matplotlib.

import matplotlib.animation as animation

Animated plots dynamically display survival rates across classes.
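A minimal sketch of such an animation using Matplotlib's `FuncAnimation`, which reveals the survival rate of one class per frame. This is one possible approach, not necessarily the notebook's; `class_survival_rates` and `animate_survival` are hypothetical names.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.animation import FuncAnimation


def class_survival_rates(df: pd.DataFrame) -> pd.Series:
    """Mean of `survived` per passenger class."""
    return df.groupby("class", observed=True)["survived"].mean()


def animate_survival(df: pd.DataFrame, interval_ms: int = 800) -> FuncAnimation:
    """Grow one bar per frame until every class shows its survival rate."""
    rates = class_survival_rates(df)
    labels = [str(c) for c in rates.index]
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(labels, [0] * len(labels))  # start with all bars at height 0
    ax.set_ylim(0, 1)
    ax.set_ylabel("Survival rate")

    def update(frame: int):
        # Frame i sets bar i to its real height; earlier bars keep theirs.
        bars[frame].set_height(rates.iloc[frame])
        ax.set_title(f"Survival rate: {labels[frame]}")
        return list(bars)

    return FuncAnimation(fig, update, frames=len(labels), interval=interval_ms, repeat=False)
```

Keep a reference to the returned object (for example `anim = animate_survival(df)`), otherwise the animation can be garbage-collected before it plays. In Jupyter, `IPython.display.HTML(anim.to_jshtml())` renders it inline, and `anim.save("survival.gif", writer="pillow")` writes a GIF if Pillow is installed.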


⚙️ Tech Stack

Python | Pandas | NumPy | Seaborn | Matplotlib | Plotly | Scikit-learn | Statsmodels | Jupyter Notebook

📷 File Structure

├── 📁 .ipynb_checkpoints
│   ├── 📄 accident_predict-checkpoint.ipynb
│   ├── 📄 gender-checkpoint.ipynb
│   ├── 📄 ios_android-checkpoint.ipynb
│   ├── 📄 kolkata-checkpoint.ipynb
│   ├── 📄 mock_test-checkpoint.ipynb
│   ├── 📄 testing-checkpoint.ipynb
│   └── 📄 train-checkpoint.ipynb
├── 📁 codes
│   ├── 📁 .ipynb_checkpoints
│   │   ├── 📄 advanced.ipynb
│   │   ├── 📄 climate.ipynb
│   │   ├── 📄 earth_quake.ipynb
│   │   ├── 📄 phone-pay_razar_paypal-checkpoint.ipynb
│   │   ├── 📄 test-checkpoint.ipynb
│   │   ├── 📄 titanic-checkpoint.ipynb
│   │   ├── 📄 train-checkpoint.ipynb
│   │   └── 📄 visual.ipynb
│   ├── 📄 JIS.ipynb
│   ├── 📄 accident_predict.ipynb
│   ├── 📄 adavance_pd.ipynb
│   ├── 📄 assigment1.ipynb
│   ├── 📄 breast_cancer.ipynb
│   ├── 📄 earth_quake_2.ipynb
│   ├── 📄 ecomic_gwrth.ipynb
│   ├── 📄 finalproject.ipynb
│   ├── 📄 gender.ipynb
│   ├── 📄 ios_android.ipynb
│   ├── 📄 ipl.ipynb
│   ├── 📄 jis_university.db
│   ├── 📄 jis_university_students.xlsx
│   ├── 📄 kolkata.ipynb
│   ├── 📄 match.ipynb
│   ├── 📄 mock_test.ipynb
│   ├── 📄 netflix.ipynb
│   ├── 📄 new_titanic.ipynb
│   ├── 📄 personl_data.ipynb
│   ├── 📄 phone-pay_razar_paypal.ipynb
│   ├── 📄 socia1l.ipynb
│   ├── 📄 social.ipynb
│   ├── 📄 social_media_usage.xlsx
│   ├── 📄 social_use.ipynb
│   ├── 📄 student_fruit.ipynb
│   ├── 📄 testing.ipynb
│   ├── 📄 testing3.ipynb
│   ├── 📄 titanic.ipynb
│   ├── 📄 titanic2.ipynb
│   ├── 📄 titanic3.ipynb
│   ├── 📄 titanic4ANOVA.ipynb
│   └── 📄 train.ipynb
├── 📁 data
│   └── (datasets used by the notebooks)
└── 📝 README.md

🚀 How to Run the Project

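Clone the repository:

git clone https://github.com/amitpaul2004/Data_Analytics.git
cd Data_Analytics

Install the required libraries:

pip install pandas numpy seaborn matplotlib plotly scikit-learn statsmodels openpyxl notebook

Start Jupyter Notebook and open any notebook in the codes folder:

jupyter notebook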

👨‍💻 Author

Amit Paul

💻 Data Science | Python | Machine Learning


⭐ Support

If you like this project:

  • ⭐ Star the repository
  • 🍴 Fork the project
  • 🚀 Share it with others

About

A collection of Data Science and Machine Learning projects featuring real-world datasets, statistical analysis, and advanced visualizations. Demonstrates skills in Python, data analysis, model building, and visualization beyond basics.
