This project demonstrates data collection through web scraping using Python. The data was collected from the Books to Scrape practice website (https://books.toscrape.com): pages were downloaded with Requests and parsed with BeautifulSoup.
The project covers the complete workflow of:
- Sending HTTP requests
- Downloading web pages
- Parsing HTML content
- Extracting structured information
- Saving data into CSV format
Objectives:
- Learn web scraping fundamentals
- Understand HTML page structure
- Extract book information automatically
- Store collected data in CSV format
- Build a reusable scraping workflow
Technologies used:
- Python
- Requests
- BeautifulSoup4
- Jupyter Notebook
- CSV
Project structure:
Data_Collection/
│
├── Web_Scrape/
│ ├── requests.ipynb
│ ├── Beautifulsoup.ipynb
│ ├── page1.html
│ ├── page2.html
│ ├── page3.html
│ ├── page4.html
│ ├── page5.html
│ └── HTML-Books.csv
│
└── README.md
Workflow:
Use the Requests library to download HTML pages.
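A minimal sketch of the download step. The URL pattern and the helper names `page_url` and `download_page` are assumptions for illustration and may differ from what the notebook does; only the `page<N>.html` file names come from the project structure.

```python
from pathlib import Path

import requests

# Assumed URL pattern for the catalogue pages of Books to Scrape.
BASE_URL = "https://books.toscrape.com/catalogue/page-{}.html"


def page_url(page_number: int) -> str:
    """Build the catalogue URL for a given page number."""
    return BASE_URL.format(page_number)


def download_page(page_number: int, out_dir: Path = Path(".")) -> Path:
    """Download one catalogue page and save it as page<N>.html."""
    response = requests.get(page_url(page_number), timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    out_path = out_dir / f"page{page_number}.html"
    out_path.write_bytes(response.content)  # keep the raw bytes as served
    return out_path


# Example usage (requires network access):
# for n in range(1, 6):
#     download_page(n)
```

Saving the raw bytes rather than decoded text leaves character-encoding decisions to the parsing step.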
Use BeautifulSoup to parse page content and locate required elements.
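A short sketch of parsing and locating elements with CSS selectors. `SAMPLE_HTML` is a hand-written stand-in for a downloaded page, with made-up book data; the class names are assumed to match the site's markup and should be checked against the saved HTML files.

```python
from bs4 import BeautifulSoup

# Hand-written sample imitating two book entries (hypothetical data).
SAMPLE_HTML = """
<html><body>
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="book_1.html" title="Example Book One">Example Book ...</a></h3>
  <p class="price_color">£10.00</p>
  <p class="instock availability"> In stock </p>
</article>
<article class="product_pod">
  <p class="star-rating One"></p>
  <h3><a href="book_2.html" title="Example Book Two">Example Book ...</a></h3>
  <p class="price_color">£12.50</p>
  <p class="instock availability"> In stock </p>
</article>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS selector: every <article> element carrying the class "product_pod".
books = soup.select("article.product_pod")

# Inside one book, the link under <h3> holds the full title in its
# "title" attribute (the visible link text may be truncated).
first_link = books[0].select_one("h3 a")

print(len(books))
print(first_link["title"])
```

For a saved page, the same parser call works on the file contents, for example `BeautifulSoup(Path("page1.html").read_bytes(), "html.parser")`.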
Collect information such as:
- Book Title
- Price
- Availability
- Rating
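The four fields above could be pulled from a single book entry roughly as follows. This is a sketch under assumptions: `parse_book` is a hypothetical helper, the sample markup is hand-written, and the selectors assume the rating is stored as a second CSS class such as `Three`.

```python
from bs4 import BeautifulSoup

# Hand-written sample imitating one book entry (hypothetical data).
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="book_1.html" title="Example Book One">Example Book ...</a></h3>
  <p class="price_color">£10.00</p>
  <p class="instock availability"> In stock </p>
</article>
"""


def parse_book(article) -> dict:
    """Extract title, price, availability and rating from one book element."""
    title = article.select_one("h3 a")["title"]
    price = article.select_one("p.price_color").get_text(strip=True)
    availability = article.select_one("p.availability").get_text(strip=True)
    # The class attribute is a list, e.g. ["star-rating", "Three"];
    # the entry that is not "star-rating" is the rating word.
    classes = article.select_one("p.star-rating")["class"]
    rating = next((c for c in classes if c != "star-rating"), "")
    return {
        "title": title,
        "price": price,
        "availability": availability,
        "rating": rating,
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
book = parse_book(soup.select_one("article.product_pod"))
print(book)
```

Applying `parse_book` to every element returned by `soup.select("article.product_pod")` yields one dictionary per book.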
Save extracted information into CSV format for further analysis.
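One way to write the rows out, sketched with the standard-library `csv` module. The column names and the helper `write_books` are assumptions, and the rows are made-up examples; the real layout of `HTML-Books.csv` may differ.

```python
import csv
import io

# Assumed column names; adjust to match the actual dataset.
FIELDNAMES = ["title", "price", "availability", "rating"]


def write_books(rows, file_obj) -> None:
    """Write book dictionaries as CSV with a header row."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows standing in for scraped data.
rows = [
    {"title": "Example Book One", "price": "£10.00",
     "availability": "In stock", "rating": "Three"},
    {"title": "Example Book, Volume Two", "price": "£12.50",
     "availability": "In stock", "rating": "Five"},
]

# In-memory demo. To write a real file instead:
# with open("HTML-Books.csv", "w", newline="", encoding="utf-8") as f:
#     write_books(rows, f)
buffer = io.StringIO()
write_books(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `newline=""` when opening a real file avoids blank lines between rows on Windows, and the writer quotes any title that contains a comma.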
The extracted data is stored in:
HTML-Books.csv
This dataset can be used for:
- Data Analysis
- Machine Learning Practice
- Data Cleaning Exercises
- Visualization Projects
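As a small data-cleaning illustration, scraped prices and ratings usually arrive as strings and need converting before analysis. The helpers `clean_price` and `clean_rating` below are hypothetical, and they assume prices look like `£51.77` and ratings are words from `One` to `Five`.

```python
import re

# Assumed mapping from rating words to numbers.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def clean_price(raw: str) -> float:
    """Drop currency symbols and stray characters, then convert to float."""
    return float(re.sub(r"[^0-9.]", "", raw))


def clean_rating(raw: str) -> int:
    """Convert a rating word to an integer; unknown words become 0."""
    return RATING_WORDS.get(raw.strip(), 0)


print(clean_price("£51.77"))
print(clean_rating("Three"))
```

Stripping everything except digits and the decimal point also copes with a mis-decoded currency symbol in front of the number.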
After completing this project, you will understand:
- HTTP Requests
- HTML Parsing
- CSS Selectors
- Data Extraction
- CSV Handling
- Basic Data Collection Pipeline
Author: Manisha Kumari
- GitHub: https://github.com/Manisha7530
Aspiring AI/ML Engineer | Open Source Contributor | Python Enthusiast