# Web-Scrabing

A Python web scraping project designed to extract product data from various e-commerce websites specializing in leather goods and handicrafts.
## Project Structure

```text
Web-Scrabing/
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
│
├── 🐍 SCRAPERS (Main Scripts)
│   ├── curomania.py                   # Scrapes cuiromania.com leather products
│   ├── simple.py                      # General-purpose web scraper
│   ├── MinAjiliki.py                  # Scrapes MinAjiliki handicrafts
│   └── cleaned.py                     # Data cleaning utility
│
├── 📊 DATA & OUTPUTS
│   ├── data/                          # Directory for organized data files
│   ├── EXCEL/                         # Excel export folder
│   ├── cleaned_data.csv               # Cleaned and processed data
│   ├── cuiromania_products.csv        # Raw cuiromania.com products
│   ├── maleatherdesign_products.csv   # Leather design products (empty)
│   ├── maleatherdesign_products_100.csv # Leather design products (100 items)
│   ├── leather_goods_cuiromania_*.csv # Dated extracts
│   ├── leather_goods_morocco_*.csv    # Morocco products
│   └── scraped_product_urls.json      # URLs JSON export
│
├── 💻 WEB CONTENT
│   └── product_page.html              # Sample/cached product page HTML
│
└── ⚙️ UTILITIES & CONFIG
    ├── utils/                         # Utility functions directory
    ├── scrapers/                      # Scraper modules directory
    ├── blog                           # Blog/documentation file
    └── .vscode/                       # VS Code settings
```
## Data Sources

This project scrapes product data from multiple Moroccan and international leather goods/handicrafts websites:

- Cuiromania (`curomania.py`) - Premium leather goods
- MaLeatherDesign - Leather product catalog
- Other sources - via `simple.py` and `MinAjiliki.py`
Data extracted:
- Product name & description
- Prices & availability
- Product URLs
- Images/media
- Categories & references
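The extraction step for these fields can be sketched with BeautifulSoup4. This is a minimal, hypothetical example: the CSS selectors (`h1.product-title`, `span.price`, `a.product-link`) and the sample markup are assumptions, not the actual structure of any target site, and must be adapted per scraper.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for a fetched product page;
# in the scrapers the HTML would come from requests.get(url).text
html = """
<html><body>
  <h1 class="product-title">Leather Satchel</h1>
  <span class="price">450 MAD</span>
  <a class="product-link" href="https://example.com/p/satchel">View</a>
</body></html>
"""

def extract_product(page_html: str) -> dict:
    """Pull name, price, and URL from one product page (selectors are assumptions)."""
    soup = BeautifulSoup(page_html, "html.parser")
    name = soup.select_one("h1.product-title")
    price = soup.select_one("span.price")
    link = soup.select_one("a.product-link")
    return {
        "name": name.get_text(strip=True) if name else None,
        "price": price.get_text(strip=True) if price else None,
        "url": link["href"] if link else None,
    }

print(extract_product(html))
```

Guarding each `select_one` result against `None` keeps a single missing field from crashing a whole scrape run.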
## Requirements

- Python 3.x
- Dependencies listed in `requirements.txt`
## Installation

```bash
# Clone the repository
git clone https://github.com/DohaSK/Web-Scrabing.git
cd Web-Scrabing

# Install dependencies
pip install -r requirements.txt
```

## Usage

```bash
# Scrape Cuiromania products
python curomania.py

# Run general scraper
python simple.py

# Clean data
python cleaned.py
```

## Output Files

### Raw Data

| File | Source | Purpose |
|---|---|---|
| `cuiromania_products.csv` | cuiromania.com | Raw product listings |
| `maleatherdesign_products_100.csv` | MaLeatherDesign | 100-product sample |
| `leather_goods_*.csv` | Various | Dated extractions |
### Processed Data

| File | Purpose |
|---|---|
| `cleaned_data.csv` | Deduplicated & cleaned dataset |
| `scraped_product_urls.json` | All product URLs in JSON format |
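Producing `scraped_product_urls.json` can be as simple as dumping the collected URLs with the standard library. The exact schema of the real file isn't documented, so this sketch assumes a flat JSON array of URL strings (the example URLs are placeholders):

```python
import json
from pathlib import Path

# Hypothetical collected URLs; the scrapers would accumulate these per site
urls = [
    "https://cuiromania.com/products/example-1",
    "https://cuiromania.com/products/example-2",
]

# Write them as a JSON array (the actual schema is an assumption here)
out = Path("scraped_product_urls.json")
out.write_text(json.dumps(urls, indent=2), encoding="utf-8")

# Read them back for the next pipeline stage
loaded = json.loads(out.read_text(encoding="utf-8"))
print(f"{len(loaded)} URLs exported")
```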
### Web Content

| File | Purpose |
|---|---|
| `product_page.html` | Sample HTML page for testing/reference |
## Dependencies

See `requirements.txt` for the complete list. Common dependencies:

- `requests` - HTTP requests
- `beautifulsoup4` - HTML parsing
- `selenium` (optional) - JavaScript rendering
- `pandas` - Data processing
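A `requirements.txt` covering the dependencies above might look like this (unpinned here for illustration; pin exact versions for reproducible installs):

```text
requests
beautifulsoup4
pandas
selenium  # optional, only needed for JavaScript-heavy pages
```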
## Workflow

```text
1. Run Scraper (curomania.py / simple.py)
        ↓
2. Extract Product Data (name, price, URL, images)
        ↓
3. Save to CSV/JSON
        ↓
4. Clean & Deduplicate (cleaned.py)
        ↓
5. Export to Excel (data/EXCEL/)
```
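Step 4 (clean & deduplicate) can be sketched with pandas. The column names (`name`, `price`, `url`), the dedup key, and the price format are assumptions, since the actual logic in `cleaned.py` isn't shown here:

```python
import pandas as pd

# Toy rows standing in for a raw scrape (e.g. cuiromania_products.csv)
raw = pd.DataFrame({
    "name": ["Satchel", "Satchel", "Belt", None],
    "price": ["450 MAD", "450 MAD", "120 MAD", "99 MAD"],
    "url": ["https://x/p1", "https://x/p1", "https://x/p2", "https://x/p3"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.dropna(subset=["name"])           # drop rows missing a product name
    out = out.drop_duplicates(subset=["url"])  # dedupe on product URL
    # Parse "450 MAD" -> 450.0 into a numeric column for analysis
    out = out.assign(
        price_mad=out["price"].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
    )
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Deduplicating on the product URL rather than the name avoids collapsing distinct products that happen to share a title.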
## Notes

- Data Updates: Each scraper run overwrites or appends data with timestamps
- CSV Files with Dates: `leather_goods_*_[timestamp].csv` files are dated backups
- Empty Files: Some CSVs may be empty if scraping failed or no data was found
- Rate Limiting: Be respectful to target websites; add delays between requests
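A small helper for the rate-limiting point above — a sketch, not code from the project; the 1–3 second range is an arbitrary assumption and should be tuned per site:

```python
import random
import time

def polite_delay(min_s: float = 1.0, max_s: float = 3.0) -> float:
    """Sleep a random interval between requests; return the delay used."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# Usage inside a scraping loop (urls and the parsing step are hypothetical):
# for url in urls:
#     page = requests.get(url, timeout=10)
#     ...parse page...
#     polite_delay()

d = polite_delay(0.01, 0.02)  # tiny values just to demonstrate
```

Randomizing the delay (rather than sleeping a fixed interval) makes the request pattern less bursty and less bot-like.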
## Troubleshooting

- Check website URLs in scripts (websites may change structure)
- Verify internet connection
- Check for JavaScript-rendered content (may need Selenium)
- Review `cleaned.py` for filtering logic
- Check that the input CSV format matches the expected structure
- `product_page.html` is a cached page; safe to delete if needed
## Suggested Improvements

To improve organization:

- Move all scrapers → `scrapers/` folder
- Move utility functions → `utils/` folder
- Organize raw data → `data/raw/`
- Organize processed data → `data/processed/`
- Create `config.py` for URLs and settings
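A starting point for the suggested `config.py` — every value below is a placeholder, not a confirmed project setting (the MaLeatherDesign domain in particular is made up):

```python
# config.py - central place for target URLs and scraper settings (a sketch)

TARGET_SITES = {
    "cuiromania": "https://cuiromania.com",
    "maleatherdesign": "https://maleatherdesign.example",  # placeholder domain
}

REQUEST_TIMEOUT = 10      # seconds per HTTP request
DELAY_RANGE = (1.0, 3.0)  # polite delay between requests, in seconds

RAW_DATA_DIR = "data/raw"
PROCESSED_DATA_DIR = "data/processed"
```

Centralizing URLs and paths here means a site redesign or folder reshuffle touches one file instead of every scraper.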
## License

Specify your license here (e.g., MIT, GPL, etc.)
## Author

Doha Skouf - Created April 2026

For issues or questions, open an issue on GitHub or contact the project maintainer.