# Web Scraping Project - Leather Goods & Handicrafts

A Python web scraping project designed to extract product data from various e-commerce websites specializing in leather goods and handicrafts.

## 📁 Project Structure

```text
Web-Scrabing/
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
│
├── 🐍 SCRAPERS (Main Scripts)
│   ├── curomania.py                  # Scrapes cuiromania.com leather products
│   ├── simple.py                     # General-purpose web scraper
│   ├── MinAjiliki.py                 # Scrapes MinAjiliki handicrafts
│   └── cleaned.py                    # Data cleaning utility
│
├── 📊 DATA & OUTPUTS
│   ├── data/                         # Directory for organized data files
│   ├── EXCEL/                        # Excel export folder
│   ├── cleaned_data.csv              # Cleaned and processed data
│   ├── cuiromania_products.csv       # Raw cuiromania.com products
│   ├── maleatherdesign_products.csv  # Leather design products (empty)
│   ├── maleatherdesign_products_100.csv  # Leather design products (100 items)
│   ├── leather_goods_cuiromania_*.csv    # Dated extracts
│   ├── leather_goods_morocco_*.csv       # Morocco products
│   └── scraped_product_urls.json     # URLs JSON export
│
├── 💻 WEB CONTENT
│   └── product_page.html             # Sample/cached product page HTML
│
└── ⚙️ UTILITIES & CONFIG
    ├── utils/                        # Utility functions directory
    ├── scrapers/                     # Scrapers modules directory
    ├── blog                          # Blog/documentation file
    └── .vscode/                      # VS Code settings
```
## 🎯 What This Project Does

This project scrapes product data from multiple Moroccan and international leather goods/handicrafts websites:

- **Cuiromania** (`curomania.py`) - Premium leather goods
- **MaLeatherDesign** - Leather product catalog
- **Other sources** - Via `simple.py` and `MinAjiliki.py`

Data extracted:

- Product name & description
- Prices & availability
- Product URLs
- Images/media
- Categories & references
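The extraction step can be sketched with the standard library alone. Note that the repo's scrapers use `requests` and BeautifulSoup, and the CSS class names below (`product-name`, `price`) are invented for illustration, not taken from any target site's markup:

```python
# Minimal sketch of per-product field extraction, stdlib only.
# The class names "product-name" and "price" are assumptions.
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collects text from tags whose class attribute matches a field of interest."""
    FIELDS = {"product-name": "name", "price": "price"}

    def __init__(self):
        super().__init__()
        self.product = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        self._current = self.FIELDS.get(cls)

    def handle_data(self, data):
        if self._current and data.strip():
            self.product[self._current] = data.strip()
            self._current = None

sample = '<div class="product-name">Leather Bag</div><span class="price">450 MAD</span>'
parser = ProductParser()
parser.feed(sample)
print(parser.product)  # {'name': 'Leather Bag', 'price': '450 MAD'}
```

In the real scrapers the same idea is a couple of BeautifulSoup `select_one` calls per product card.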

## 🚀 Getting Started

### Prerequisites

- Python 3.x
- Dependencies listed in `requirements.txt`

### Installation

```bash
# Clone the repository
git clone https://github.com/DohaSK/Web-Scrabing.git
cd Web-Scrabing

# Install dependencies
pip install -r requirements.txt
```

### Running Scrapers

```bash
# Scrape Cuiromania products
python curomania.py

# Run general scraper
python simple.py

# Clean data
python cleaned.py
```

## 📋 Data Files Guide

### Raw Data (Input)

| File | Source | Purpose |
| --- | --- | --- |
| `cuiromania_products.csv` | cuiromania.com | Raw product listings |
| `maleatherdesign_products_100.csv` | MaLeatherDesign | 100-product sample |
| `leather_goods_*.csv` | Various | Dated extractions |

### Processed Data (Output)

| File | Purpose |
| --- | --- |
| `cleaned_data.csv` | Deduplicated & cleaned dataset |
| `scraped_product_urls.json` | All product URLs in JSON format |

### Cache Files

| File | Purpose |
| --- | --- |
| `product_page.html` | Sample HTML page for testing/reference |
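As an illustration of the `scraped_product_urls.json` export, here is a hedged sketch; the actual schema in the repo may differ (for instance, the real file may not be deduplicated or sorted):

```python
# Sketch of the step that produces scraped_product_urls.json.
# A flat, sorted JSON array is an assumption about the schema.
import json

def export_urls(urls, path="scraped_product_urls.json"):
    """Write collected product URLs as a JSON array, deduplicated and
    sorted so repeated runs produce stable diffs."""
    unique = sorted(set(urls))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(unique, f, indent=2, ensure_ascii=False)
    return unique

urls = export_urls(
    ["https://cuiromania.com/p/1",
     "https://cuiromania.com/p/2",
     "https://cuiromania.com/p/1"],
)
print(len(urls))  # 2
```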

## 📝 Requirements

See `requirements.txt` for the complete list. Common dependencies:

- `requests` - HTTP requests
- `beautifulsoup4` - HTML parsing
- `selenium` (optional) - JavaScript rendering
- `pandas` - Data processing
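If you need to recreate `requirements.txt` from scratch, a minimal version matching the dependencies above would look like the following (unpinned; the repo's actual file may pin versions or include extras such as `openpyxl` for Excel export):

```text
requests
beautifulsoup4
pandas
selenium
```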

## 🔄 Workflow

```text
1. Run scraper (curomania.py / simple.py)
   ↓
2. Extract product data (name, price, URL, images)
   ↓
3. Save to CSV/JSON
   ↓
4. Clean & deduplicate (cleaned.py)
   ↓
5. Export to Excel (EXCEL/)
```
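The clean & deduplicate step (step 4) might look like the stdlib-only sketch below. The real column names and filtering rules live in `cleaned.py` and are not shown in this README, so `name` and `url` here are assumptions:

```python
# Sketch of the clean & deduplicate step (cleaned.py itself may use pandas).
# Columns "name" and "url" are assumed, not taken from the real CSVs.
def clean_rows(rows):
    """Strip whitespace, drop rows without a name, deduplicate on URL."""
    seen, cleaned = set(), []
    for row in rows:
        row = {k: (v or "").strip() for k, v in row.items()}
        if not row.get("name") or row.get("url") in seen:
            continue
        seen.add(row["url"])
        cleaned.append(row)
    return cleaned

raw = [
    {"name": " Leather Bag ", "url": "https://example.com/1", "price": "450"},
    {"name": "Leather Bag",   "url": "https://example.com/1", "price": "450"},
    {"name": "",              "url": "https://example.com/2", "price": "300"},
]
print(len(clean_rows(raw)))  # 1
```

With pandas the same step collapses to `df.drop_duplicates(subset="url")` plus a `dropna` on the name column.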

## 📌 Important Notes

- **Data updates**: Each scraper run overwrites or appends data with timestamps
- **Dated CSV files**: `leather_goods_*_[timestamp].csv` files are dated backups
- **Empty files**: Some CSVs may be empty if scraping failed or no data was found
- **Rate limiting**: Be respectful to target websites; add delays between requests
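The delay-between-requests advice can be sketched as a small helper; the base delay and jitter values below are arbitrary examples, not tuned for any particular site:

```python
# Polite-delay sketch: pause between requests with a little random
# jitter so scraper traffic doesn't arrive in a fixed rhythm.
import random
import time

def polite_sleep(base=1.0, jitter=0.5):
    """Sleep for base ± jitter seconds (never negative); return the delay used."""
    delay = max(base + random.uniform(-jitter, jitter), 0.0)
    time.sleep(delay)
    return delay

# Typical usage inside a scraping loop (fetch() is a placeholder):
# for url in product_urls:
#     fetch(url)
#     polite_sleep(base=2.0)
```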

## 🛠️ Troubleshooting

### No data being scraped?

- Check the website URLs in the scripts (sites may change their structure)
- Verify your internet connection
- Check for JavaScript-rendered content (may require Selenium)

### Data cleaning issues?

- Review the filtering logic in `cleaned.py`
- Check that the input CSV format matches the expected structure

### Large HTML files?

- `product_page.html` is a cached page; it is safe to delete if needed

## 📂 Recommended Next Steps

To improve organization:

1. Move all scrapers into the `scrapers/` folder
2. Move utility functions into the `utils/` folder
3. Organize raw data under `data/raw/`
4. Organize processed data under `data/processed/`
5. Create a `config.py` for URLs and settings

## 📝 License

Specify your license here (e.g., MIT, GPL, etc.)

## 👤 Author

Doha Skouf - Created April 2026

## 📞 Support

For issues or questions, create an issue on GitHub or contact the project maintainer.
