📚 Data Collection using Web Scraping

📖 Overview

This project demonstrates data collection through web scraping with Python. The data comes from Books to Scrape (books.toscrape.com), a practice site built for scraping exercises: Requests downloads the pages and BeautifulSoup parses them.

The project covers the complete workflow of:

Sending HTTP requests
Downloading web pages
Parsing HTML content
Extracting structured information
Saving data into CSV format

🎯 Objectives

Learn web scraping fundamentals
Understand HTML page structure
Extract book information automatically
Store collected data in CSV format
Build a reusable scraping workflow

🛠️ Technologies Used

Python
Requests
BeautifulSoup4
Jupyter Notebook
CSV

📂 Repository Structure

Data_Collection/
│
├── Web_Scrape/
│   ├── requests.ipynb
│   ├── Beautifulsoup.ipynb
│   ├── page1.html
│   ├── page2.html
│   ├── page3.html
│   ├── page4.html
│   ├── page5.html
│   └── HTML-Books.csv
│
└── README.md

🚀 Workflow

Step 1: Fetch Web Pages

Use the Requests library to download the HTML of each catalogue page and keep a local copy (page1.html to page5.html in the repository).
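A minimal sketch of this step, assuming the catalogue pages live at https://books.toscrape.com/catalogue/page-N.html and that five pages are saved as page1.html to page5.html. The helper names page_url and download_pages are hypothetical and are not taken from requests.ipynb.

```python
from pathlib import Path

import requests

# Assumption: Books to Scrape numbers its catalogue pages with this pattern.
BASE_URL = "https://books.toscrape.com/catalogue/page-{}.html"


def page_url(page_number: int) -> str:
    """Build the URL for one catalogue page (hypothetical helper)."""
    return BASE_URL.format(page_number)


def download_pages(count: int = 5, out_dir: str = ".") -> list:
    """Fetch pages 1..count and save each one as pageN.html in out_dir."""
    saved = []
    # A Session reuses the connection across requests to the same host.
    with requests.Session() as session:
        for n in range(1, count + 1):
            response = session.get(page_url(n), timeout=10)
            # Fail loudly on 4xx/5xx instead of saving an error page.
            response.raise_for_status()
            path = Path(out_dir) / f"page{n}.html"
            # Write the raw bytes so the file keeps the server's original encoding.
            path.write_bytes(response.content)
            saved.append(path)
    return saved
```

Calling download_pages() from a notebook cell would fetch the five pages and return the paths it wrote. Saving raw bytes rather than response.text avoids Requests guessing the wrong text encoding, which can otherwise turn "£" into "Â£".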

Step 2: Parse HTML

Use BeautifulSoup to parse the saved HTML and locate the elements that hold each book's details.
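A minimal parsing sketch. The sample markup is an assumption modelled on one book card from Books to Scrape, where each book appears to sit in an article element with class product_pod; in the project the input would be the text of a saved page such as page1.html. find_book_cards and load_saved_page are hypothetical helper names.

```python
from bs4 import BeautifulSoup

# Assumed markup: a trimmed copy of one book card from a catalogue page.
SAMPLE_HTML = """
<ol class="row">
  <li>
    <article class="product_pod">
      <h3><a href="a-light-in-the-attic_1000/index.html"
             title="A Light in the Attic">A Light in the ...</a></h3>
      <p class="star-rating Three"></p>
      <div class="product_price">
        <p class="price_color">£51.77</p>
        <p class="instock availability">In stock</p>
      </div>
    </article>
  </li>
</ol>
"""


def find_book_cards(html: str) -> list:
    """Parse one page and return every book card on it."""
    # "html.parser" ships with Python, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: <article> elements carrying the class product_pod.
    return soup.select("article.product_pod")


def load_saved_page(path: str) -> str:
    """Read a page saved earlier, e.g. "page1.html"."""
    with open(path, encoding="utf-8") as f:
        return f.read()


cards = find_book_cards(SAMPLE_HTML)
print(len(cards))  # 1
```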

Step 3: Extract Data

Collect information such as:

Book Title
Price
Availability
Rating
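The four fields above can be pulled out of each book card roughly as follows. This sketch assumes Books to Scrape markup like the sample (the full title in the link's title attribute, the rating stored as a class word such as Three); extract_book, extract_books and RATING_WORDS are hypothetical names, and the dictionary keys are an assumed column layout.

```python
import re

from bs4 import BeautifulSoup

# Assumption: ratings are encoded as a second class on <p class="star-rating ...">.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

# Assumed markup for one book card.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="#" title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="star-rating Three"></p>
  <p class="price_color">£51.77</p>
  <p class="instock availability">
      In stock
  </p>
</article>
"""


def extract_book(card) -> dict:
    """Turn one product_pod element into a flat record."""
    # The visible link text is truncated, so read the full title attribute.
    title = card.h3.a["title"]
    # Keep only digits and the decimal point; this also drops a mis-decoded "Â£".
    price_text = card.select_one("p.price_color").get_text(strip=True)
    price = float(re.sub(r"[^\d.]", "", price_text))
    availability = card.select_one("p.availability").get_text(strip=True)
    # class is multi-valued, so BeautifulSoup returns a list like ["star-rating", "Three"].
    classes = card.select_one("p.star-rating")["class"]
    rating = next((RATING_WORDS[c] for c in classes if c in RATING_WORDS), None)
    return {"title": title, "price": price, "availability": availability, "rating": rating}


def extract_books(html: str) -> list:
    """Extract one record per book card found in a page."""
    soup = BeautifulSoup(html, "html.parser")
    return [extract_book(card) for card in soup.select("article.product_pod")]


books = extract_books(SAMPLE_HTML)
print(books)
```

Converting the price to a float and the rating word to an integer at extraction time means the saved data is ready for numeric analysis without a separate cleaning pass.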

Step 4: Store Data

Write the extracted records to HTML-Books.csv for further analysis.
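A minimal sketch of this step with Python's built-in csv module; the project may equally have used another tool such as pandas. The column names title, price, availability and rating are assumed, and write_books and save_books_csv are hypothetical helpers.

```python
import csv
import io

# Assumed column order for HTML-Books.csv.
FIELDNAMES = ["title", "price", "availability", "rating"]


def write_books(books, stream) -> None:
    """Write a header row plus one row per book dict to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(books)


def save_books_csv(books, path: str = "HTML-Books.csv") -> None:
    """Write the records to disk; newline="" lets the csv module control line endings."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_books(books, f)


# Preview the CSV in memory instead of touching the disk.
buffer = io.StringIO()
write_books([{"title": "A Light in the Attic", "price": 51.77,
              "availability": "In stock", "rating": 3}], buffer)
print(buffer.getvalue())
```

Separating write_books from save_books_csv keeps the formatting logic testable against an in-memory buffer, while the file-writing wrapper only handles opening the file.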

📊 Output

The extracted data is stored in:

HTML-Books.csv

This dataset can be used for:

Data Analysis
Machine Learning Practice
Data Cleaning Exercises
Visualization Projects
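As a starting point for analysis, the CSV can be read back with the standard library. This sketch assumes the columns title, price, availability and rating with plain numeric price and rating values; the real HTML-Books.csv may use different headers or keep the currency symbol in the price. load_books and summarize are hypothetical names, and the rows below are made-up sample data.

```python
import csv
import io
from statistics import mean


def load_books(stream) -> list:
    """Read CSV rows and convert the (assumed) numeric columns."""
    rows = list(csv.DictReader(stream))
    for row in rows:
        row["price"] = float(row["price"])
        # An empty rating cell becomes None instead of raising ValueError.
        row["rating"] = int(row["rating"]) if row["rating"] else None
    return rows


def summarize(rows) -> dict:
    """Basic descriptive statistics for a first look at the data."""
    return {
        "count": len(rows),
        "mean_price": round(mean(r["price"] for r in rows), 2),
    }


# In practice: open("HTML-Books.csv", newline="", encoding="utf-8")
sample = io.StringIO(
    "title,price,availability,rating\n"
    "Book A,10.00,In stock,3\n"
    "Book B,20.00,In stock,5\n"
)
print(summarize(load_books(sample)))  # {'count': 2, 'mean_price': 15.0}
```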

💡 Learning Outcomes

After completing this project, you will understand:

HTTP Requests
HTML Parsing
CSS Selectors
Data Extraction
CSV Handling
Basic Data Collection Pipeline
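To illustrate the CSS selector outcome: BeautifulSoup's select() accepts CSS selectors, and the same elements can usually be reached with find_all() as well. The markup below is a made-up fragment loosely modelled on Books to Scrape price tags.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration.
HTML = """
<div class="product_price"><p class="price_color">£51.77</p></div>
<div class="product_price"><p class="price_color">£53.74</p></div>
<p class="price_color">not inside a product_price div</p>
"""

soup = BeautifulSoup(HTML, "html.parser")

# CSS selector: <p class="price_color"> that is a descendant of div.product_price.
by_css = [p.get_text() for p in soup.select("div.product_price p.price_color")]

# Equivalent traversal with find_all: find each div, then search inside it.
by_find = [
    p.get_text()
    for div in soup.find_all("div", class_="product_price")
    for p in div.find_all("p", class_="price_color")
]

print(by_css)  # ['£51.77', '£53.74']
```

The selector version expresses the nesting in one string, while find_all spells out each level of the search; both skip the third price_color paragraph because it is not inside a product_price div.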

👩‍💻 Author

Manisha Kumari

GitHub: https://github.com/Manisha7530

Aspiring AI/ML Engineer | Open Source Contributor | Python Enthusiast
