📊 Data Science Learning Journey

Master Data Science from Fundamentals to Machine Learning

Python NumPy Pandas Matplotlib Seaborn

Jupyter MySQL BeautifulSoup Scikit-Learn


🚀 Get Started · 📚 Curriculum · 🎯 Projects · 💡 Skills · 🤝 Connect


🌟 About This Course

This repository is a comprehensive, hands-on data science curriculum designed to take you from absolute beginner to proficient data scientist. With 100+ Jupyter notebooks, real-world projects, and structured learning paths, you'll build a strong foundation in:

  • Python Programming - Master the language of data science
  • Data Analysis & Manipulation - Work with NumPy and Pandas
  • Data Visualization - Create stunning charts and insights
  • Web Scraping - Collect data from websites with Requests and BeautifulSoup
  • SQL Databases - Query and manage data efficiently
  • Statistics & Probability - Build ML foundations
  • Machine Learning - Train your first ML models

📈 Status: 🟢 Active & Growing - New content added regularly!


📚 Curriculum

📖 Complete Course Modules

| No. | Module | Topics Covered | Content | Skills |
|-----|--------|----------------|---------|--------|
| 01 | 🎓 Data Science Intro | Tools, Environment Setup, Career Paths, DS Lifecycle | 1 PDF Guide | Foundation setup |
| 02 | 🐍 Python Fundamentals | Variables, Data Types, Operators, Control Flow, Loops, Data Structures, OOP, Lambda | 18 Notebooks | Complete Python |
| 03 | 🚀 Project: Social Network | Recommendation Algorithms, Graph Theory, JSON Processing | 3 Notebooks | Real-world application |
| 04 | 🔢 NumPy Mastery | Arrays, Indexing, Slicing, Broadcasting, Vectorization | 5 Notebooks | Numerical computing |
| 05 | 🐼 Pandas Deep Dive | DataFrames, Series, Grouping, Merging, Time Series | 2 Notebooks | Data manipulation |
| 06 | 📊 Data Visualization | Line, Bar, Pie, Scatter, Histogram, Heatmaps, Seaborn | 8 Notebooks | Visual storytelling |
| 07 | 🕷️ Web Scraping | HTTP Requests, HTML Parsing, BeautifulSoup, Data Extraction | 2 Notebooks + 49 HTML samples | Web data collection |
| 08 | 🗄️ SQL & Databases | CRUD Operations, Joins, Subqueries, Views, Stored Procedures | 20 Tutorials | Database management |
| 09 | 📈 Probability & Stats | Conditional Probability, Bayes' Theorem, Distributions | 3 Tutorials + Practice | Statistical thinking |
| 10 | 🤖 ML Introduction | How Machines Learn, ML History, Traditional vs ML | PPT + Notes | ML fundamentals |
| 11 | 🔧 Sklearn Basics | First ML Models, Training, Prediction, Model Selection | 3 Notebooks | Scikit-learn |
| 12 | 📋 ML Algorithm Types | Supervised vs Unsupervised Learning, Use Cases | 3 Guides | Algorithm selection |
| 13 | 🎯 ML Practice | Iris Classification, Model Evaluation, RMSE, MAE, Test Sets | 5+ Notebooks | End-to-end ML |

🚀 Quick Start

Prerequisites

  • 💻 Basic computer skills
  • 🧠 Curiosity and willingness to learn
  • ⏰ 8-10 hours per week commitment
  • No prior programming experience needed!

Installation

Step 1: Clone the repository

```bash
git clone https://github.com/ggauravky/Data-Science-Learning.git
cd Data-Science-Learning
```

Step 2: Set up Python environment

```bash
# Option A: Using Conda (Recommended)
conda create -n datasci python=3.11 -y
conda activate datasci
conda install numpy pandas matplotlib seaborn jupyter scikit-learn -y
pip install beautifulsoup4 requests

# Option B: Using pip
pip install numpy pandas matplotlib seaborn jupyter beautifulsoup4 requests scikit-learn
```
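
Once the packages from Step 2 are installed, a short check along these lines can confirm that they are importable before you open the notebooks. This is a minimal sketch, not part of the repository: the `REQUIRED` list and the `missing_packages` helper are hypothetical names, and the list uses import names (`bs4` for beautifulsoup4, `sklearn` for scikit-learn).

```python
from importlib.util import find_spec

# Import names for the packages installed in Step 2.
# beautifulsoup4 is imported as "bs4" and scikit-learn as "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "bs4", "requests", "sklearn"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

Run it inside the activated environment; any name it reports can be installed with the commands from Step 2.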

Step 3: Launch Jupyter

```bash
jupyter notebook
```

Step 4: Start learning! 🎉

Navigate to 002 Python refresher/01_python_basic.ipynb and begin your journey!


📖 Learning Path

🎯 Recommended 12-Week Roadmap

```mermaid
graph LR
    A[Week 1-2: Python] --> B[Week 3-4: NumPy & Pandas]
    B --> C[Week 5-6: Visualization]
    C --> D[Week 7: Web Scraping]
    D --> E[Week 8: SQL]
    E --> F[Week 9-10: Probability]
    F --> G[Week 11-12: Machine Learning]
```

📅 Week-by-Week Breakdown

🌱 Phase 1: Foundation (Weeks 1-4)

Week 1-2: Python Programming

  • Complete all 18 Python notebooks
  • Focus: Variables, loops, functions, OOP
  • Practice: Daily coding exercises
  • Milestone: Build a simple calculator app

Week 3: NumPy

  • Master array operations
  • Learn vectorization techniques
  • Practice: Matrix manipulations
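
To make the Week 3 goals concrete, here is a small sketch of what vectorization and broadcasting look like in NumPy. The arrays and the 10% discount are made-up illustration data, not taken from the course notebooks.

```python
import numpy as np

# Hypothetical data: prices of four items, to be discounted by 10%.
prices = np.array([100.0, 250.0, 80.0, 40.0])

# Loop version: one Python-level multiplication per element.
discounted_loop = np.array([p * 0.9 for p in prices])

# Vectorized version: one expression applied to the whole array.
discounted_vec = prices * 0.9

# Broadcasting: subtract each column's mean from a 2x3 matrix.
matrix = np.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0]])
centered = matrix - matrix.mean(axis=0)  # shape (2, 3) minus shape (3,)

print(discounted_vec)
print(centered)
```

Both versions of the discount give the same values; the vectorized form is shorter and avoids the per-element Python loop, which usually matters on larger arrays.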

Week 4: Pandas & First Project

  • DataFrame operations
  • Data cleaning techniques
  • Project: Coders of Delhi recommendation system

🌿 Phase 2: Intermediate (Weeks 5-8)

Week 5-6: Data Visualization

  • All chart types in Matplotlib
  • Statistical plots with Seaborn
  • Practice: Visualize real datasets

Week 7: Web Scraping

  • HTTP requests and responses
  • HTML parsing with BeautifulSoup
  • Project: Book scraper

Week 8: SQL Databases

  • CRUD operations
  • Complex joins and queries
  • Practice: Build a movie database
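
As a rough preview of the Week 8 practice, the sketch below walks through create, read, update, and delete plus one join on a tiny movie database. It uses Python's built-in `sqlite3` module as a stand-in so it runs without a MySQL server; the course itself uses MySQL, and the table names, columns, and rows here are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: two hypothetical tables for a small movie database.
cur.execute("CREATE TABLE directors (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE movies ("
    "id INTEGER PRIMARY KEY, title TEXT, year INTEGER, director_id INTEGER)"
)
cur.executemany(
    "INSERT INTO directors (id, name) VALUES (?, ?)",
    [(1, "Director A"), (2, "Director B")],
)
cur.executemany(
    "INSERT INTO movies (title, year, director_id) VALUES (?, ?, ?)",
    [("Movie One", 2001, 1), ("Movie Two", 2010, 1), ("Movie Three", 2015, 2)],
)

# Update and delete: the remaining write operations in CRUD.
cur.execute("UPDATE movies SET year = 2011 WHERE title = 'Movie Two'")
cur.execute("DELETE FROM movies WHERE title = 'Movie Three'")

# Read with a join: each remaining movie alongside its director.
rows = cur.execute(
    "SELECT m.title, m.year, d.name "
    "FROM movies AS m JOIN directors AS d ON m.director_id = d.id "
    "ORDER BY m.year"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The SQL statements are deliberately basic and should carry over to MySQL with little or no change, though data types and connection setup differ.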

🌳 Phase 3: Advanced (Weeks 9-12)

Week 9-10: Statistics & Advanced SQL

  • Probability distributions
  • Bayes' theorem applications
  • Stored procedures and optimization
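
For the Bayes' theorem topic above, a worked example may help. The sketch below computes P(A | B) = P(B | A) · P(A) / P(B) for a diagnostic-test scenario; the `posterior` function name and the numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are assumptions chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) via Bayes' theorem, where B is a positive test result.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    # Total probability of a positive result, P(B).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the posterior is only about 16% because the condition is rare, which is the kind of result the conditional-probability exercises are meant to build intuition for.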

Week 11-12: Machine Learning

  • ML fundamentals
  • First models with Scikit-learn
  • Project: Iris classification
  • Model evaluation and metrics
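
The curriculum lists MAE and RMSE among the evaluation metrics. As a minimal sketch, both can be computed by hand as follows; the `mae` and `rmse` helpers and the sample values are hypothetical, and scikit-learn provides equivalents in `sklearn.metrics` (for example `mean_absolute_error`).

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical true values and model predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

print(mae(y_true, y_pred))   # 0.75
print(rmse(y_true, y_pred))  # about 1.118
```

RMSE is larger than MAE here because the single error of 2 is squared before averaging.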

🎯 Projects

Featured Real-World Projects

🌐 Coders of Delhi

Social Network Recommendation System

Build algorithms similar to Facebook's "People You May Know" feature.

Tech Stack: Python, JSON, Graph Algorithms
Complexity: Intermediate
Skills: Data structures, algorithms, recommendation engines

Files:

  • data_read.ipynb
  • people_you_may_know.ipynb
  • pages_you_might_like.ipynb
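
One common way to approach a "People You May Know" feature is to rank non-friends by how many mutual friends they share with the user. The sketch below shows that idea only; it is not necessarily the approach used in the project notebooks, and the function name, user names, and friendship graph are hypothetical.

```python
from collections import Counter


def people_you_may_know(user, friends):
    """Rank non-friends of `user` by number of mutual friends (most first)."""
    direct = friends[user]
    counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for stable output.
    return sorted(counts, key=lambda c: (-counts[c], c))


# Hypothetical friendship graph as an adjacency mapping.
friends = {
    "amit": {"bina", "chetan"},
    "bina": {"amit", "deepa", "esha"},
    "chetan": {"amit", "deepa"},
    "deepa": {"bina", "chetan"},
    "esha": {"bina"},
}

suggestions = people_you_may_know("amit", friends)
print(suggestions)  # ['deepa', 'esha']
```

Here "deepa" ranks first because she shares two mutual friends with "amit", while "esha" shares one.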

📚 Book Data Scraper

Web Scraping Pipeline

Scrape 49 pages of book data from an online bookstore.

Tech Stack: Requests, BeautifulSoup, Pandas
Complexity: Beginner-Intermediate
Skills: HTTP, HTML parsing, data extraction

Output: Structured CSV with titles, prices, ratings
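
The parse-then-export flow of this project can be sketched as follows. To keep the example runnable without extra installs, it uses the standard-library `html.parser` as a stand-in for BeautifulSoup and parses an inline HTML string instead of making HTTP requests. The markup, class names, and `BookParser` class are hypothetical; the real pages may be structured differently.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup; the project's actual pages may differ.
SAMPLE_HTML = """
<article class="book"><h3 class="title">Book A</h3>
  <p class="price">51.77</p><p class="rating">3</p></article>
<article class="book"><h3 class="title">Book B</h3>
  <p class="price">12.50</p><p class="rating">5</p></article>
"""

FIELDS = ("title", "price", "rating")


class BookParser(HTMLParser):
    """Collect one dict per <article class="book"> element."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._field = None  # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "article" and css_class == "book":
            self.books.append({})
        elif css_class in FIELDS and self.books:
            self._field = css_class

    def handle_data(self, data):
        if self._field and data.strip():
            self.books[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


parser = BookParser()
parser.feed(SAMPLE_HTML)

# Write the extracted records as CSV (to an in-memory buffer for this sketch).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(parser.books)
csv_text = buffer.getvalue()
print(csv_text)
```

With BeautifulSoup, the parsing class would typically be replaced by `BeautifulSoup(html, "html.parser")` and a few `select` calls; the CSV step stays the same.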

🌸 Iris Classification

Machine Learning Project

Train and evaluate ML models on the classic Iris dataset.

Tech Stack: Scikit-learn, NumPy, Pandas
Complexity: Intermediate
Skills: Model training, evaluation, accuracy metrics

Notebooks:

  • Quick training
  • Accuracy measurement
  • Data analysis
  • Test set creation
  • Stratified sampling
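
Stratified sampling keeps the class proportions the same in the training and test sets. The hand-rolled sketch below illustrates the idea on Iris-style labels (50 samples per species); the `stratified_split` helper is hypothetical and is not the notebooks' code. In practice, scikit-learn's `train_test_split` accepts a `stratify` argument that does this for you.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Iris has 50 samples for each of its 3 species.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))         # 120 30
print(Counter(labels[i] for i in test_idx))  # 10 samples of each species
```

A plain random split could, by chance, leave one species under-represented in the test set; splitting per class avoids that.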

📊 Data Analysis Suite

Pandas Practice Projects

Analyze real-world datasets with advanced techniques.

Tech Stack: Pandas, Matplotlib, Seaborn
Complexity: Beginner-Intermediate
Skills: Grouping, merging, aggregation, visualization

Features:

  • Data cleaning pipelines
  • Statistical analysis
  • Trend visualization

💡 Skills You'll Gain

🐍 Programming

  • ✅ Python syntax & semantics
  • ✅ Object-oriented programming
  • ✅ Functional programming
  • ✅ List comprehensions
  • ✅ Lambda expressions
  • ✅ File I/O operations
  • ✅ JSON data handling
  • ✅ Error handling

📊 Data Science

  • ✅ NumPy array operations
  • ✅ Pandas DataFrames
  • ✅ Data cleaning & preprocessing
  • ✅ Statistical analysis
  • ✅ Data visualization
  • ✅ Exploratory data analysis
  • ✅ Feature engineering
  • ✅ Data transformation

🤖 Machine Learning

  • ✅ ML fundamentals
  • ✅ Supervised learning
  • ✅ Unsupervised learning
  • ✅ Model training
  • ✅ Model evaluation
  • ✅ Scikit-learn library
  • ✅ Algorithm selection
  • ✅ Performance metrics

🗄️ Databases

  • ✅ SQL queries (SELECT, JOIN)
  • ✅ Database design
  • ✅ CRUD operations
  • ✅ Aggregations & grouping
  • ✅ Subqueries
  • ✅ Views & indexes
  • ✅ Stored procedures
  • ✅ Query optimization

🕷️ Web Scraping

  • ✅ HTTP protocol
  • ✅ HTML structure
  • ✅ CSS selectors
  • ✅ BeautifulSoup parsing
  • ✅ Requests library
  • ✅ Data extraction
  • ✅ Ethical scraping
  • ✅ Pipeline building

📈 Statistics

  • ✅ Probability theory
  • ✅ Distributions
  • ✅ Conditional probability
  • ✅ Bayes' theorem
  • ✅ Hypothesis testing
  • ✅ Statistical inference
  • ✅ Sampling techniques
  • ✅ Error metrics

🛠️ Technology Stack

Core Technologies

| Category | Tools |
|----------|-------|
| 💻 Language | Python 3.11+ |
| 📊 Data Analysis | NumPy, Pandas |
| 📈 Visualization | Matplotlib, Seaborn |
| 🕸️ Web Scraping | Requests, BeautifulSoup4 |
| 🗄️ Database | MySQL |
| 🤖 Machine Learning | Scikit-learn |
| 📓 IDE | Jupyter Notebook, VS Code |

📈 Progress Tracker

Use this checklist to track your learning journey:

Core Modules

  • 🎓 Introduction to Data Science
  • 🐍 Python Fundamentals (18 notebooks)
  • 🔢 NumPy Mastery (5 notebooks)
  • 🐼 Pandas Deep Dive (2 notebooks)
  • 📊 Data Visualization (8 notebooks)
  • 🕷️ Web Scraping (2 notebooks)
  • 🗄️ SQL & Databases (20 tutorials)
  • 📈 Probability & Statistics
  • 🤖 Machine Learning Introduction
  • 🔧 Scikit-learn Basics
  • 📋 ML Algorithm Types
  • 🎯 ML Practice (5+ notebooks)

Projects

  • 🌐 Coders of Delhi - Social Network
  • 📚 Book Data Scraper
  • 🌸 Iris Classification
  • 📊 Data Analysis Projects

Milestones

  • 🎖️ Completed first 50 notebooks
  • 🏆 Built 3 portfolio projects
  • 🚀 Trained first ML model
  • ⭐ Contributed to the repo

🤝 Connect

Let's Learn Together!

LinkedIn GitHub Instagram

Questions? Suggestions? Want to collaborate?
Feel free to open an issue or reach out directly!


🤝 Contributing

We welcome contributions from the community! Here's how you can help:

Ways to Contribute

  • 🐛 Report Bugs: Found an error? Let us know!
  • 💡 Suggest Features: Have ideas for new content?
  • 📝 Improve Documentation: Help make explanations clearer
  • 🎨 Add Examples: Share your own projects and solutions
  • 🌐 Translate: Help make content accessible in other languages

How to Contribute

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

TL;DR: You can use, modify, and distribute this content freely, as long as the copyright and license notice are kept with it. Attribution appreciated! 🙏


⭐ Show Your Support

If this repository helped you in your data science journey:

  • Star this repository
  • 🍴 Fork it for your own learning
  • 📢 Share with fellow learners
  • 💬 Spread the word on social media


🙏 Acknowledgments

  • 🎓 Inspired by various data science courses and bootcamps
  • 📚 Built with passion for the data science community
  • 🌟 Thanks to all contributors and learners

Made with ❤️ for Data Science Learners Worldwide

Happy Learning! 🚀
