A complete FastAPI application that scrapes perfume data from Fragrantica, stores it in Supabase (PostgreSQL), and serves it through a REST API. Currently hosted at https://perfumapi-frontend.onrender.com/. For testing and educational purposes. One love, Fragrantica.com <3. Made by the one and only SECCAZ.
- Web Scraper: Extracts perfume data from Fragrantica.com
  - General popular-perfumes scraping
  - Brand-specific scraping (e.g., Jean Paul Gaultier, Xerjoff, Creed)
  - Multi-brand batch scraping
  - Direct URL scraping (fastest - single perfume)
- Supabase Integration: PostgreSQL database with auto-migration
- Authentication: Supabase JWT-based auth for protected endpoints
- FastAPI Backend: Fast, modern REST API with automatic documentation
- CORS Enabled: Works with any frontend application
PerfumAPI/
├── api/
│   └── main.py          # FastAPI application with all endpoints
├── scraper/
│   └── scrape.py        # Fragrantica web scraper
├── utils/
│   ├── db.py            # Supabase client and database operations
│   └── auth.py          # Authentication middleware
├── data/
│   └── data.json        # Scraped data cache (auto-generated)
├── requirements.txt     # Python dependencies
├── Procfile             # Render deployment configuration
├── env.example          # Environment variables template
├── .gitignore           # Git ignore rules
└── README.md            # This file
The perfumes table includes:
- id (UUID) - Primary key
- name (TEXT) - Perfume name
- brand (TEXT) - Brand/designer name
- release_year (INTEGER) - Year of release
- gender (TEXT) - Target gender (Men/Women/Unisex)
- notes_top (TEXT[]) - Top notes array
- notes_middle (TEXT[]) - Middle/heart notes array
- notes_base (TEXT[]) - Base notes array
- rating (REAL) - Average rating
- votes (INTEGER) - Number of votes
- description (TEXT) - Perfume description
- longevity (TEXT) - Longevity rating
- sillage (TEXT) - Sillage/projection rating
- image_url (TEXT) - Perfume image URL
- perfume_url (TEXT) - Source URL (unique)
- created_at (TIMESTAMP) - Creation timestamp
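For illustration, a single record from this table (as the API returns it) might look like the sketch below. All values are invented; only the field names come from the schema above.

# Hypothetical example of one row from the perfumes table, as a Python dict
example_perfume = {
    "id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "name": "Example Perfume",
    "brand": "Example Brand",
    "release_year": 2010,
    "gender": "Unisex",
    "notes_top": ["Bergamot", "Lemon"],
    "notes_middle": ["Jasmine", "Rose"],
    "notes_base": ["Musk", "Cedar"],
    "rating": 4.2,
    "votes": 1234,
    "description": "An illustrative description.",
    "longevity": "long lasting",
    "sillage": "moderate",
    "image_url": "https://example.com/example-perfume.jpg",
    "perfume_url": "https://www.fragrantica.com/perfume/Example-Brand/Example-Perfume-12345.html",
    "created_at": "2024-01-01T00:00:00+00:00",
}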
- Python 3.9 or higher
- Supabase account (free tier)
- Git
git clone <your-repo-url>
cd PerfumAPI

# Create virtual environment
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

- Go to supabase.com and create a free account
- Create a new project
- Wait for the database to be provisioned
- Go to Project Settings → API
- Copy your:
  - Project URL (SUPABASE_URL)
  - anon public key (SUPABASE_KEY)
  - service_role key (SUPABASE_SERVICE_KEY) - keep this secret!
# Copy the example env file
cp env.example .env
# Edit .env and add your Supabase credentials
nano .env  # or use any text editor

Your .env file should look like:
SUPABASE_URL=https://xxxxxxxxxxxxx.supabase.co
SUPABASE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
DEBUG=True

The application will attempt to auto-create the table on startup. If that doesn't work, manually create it:
- Go to your Supabase project
- Click SQL Editor
- Run this SQL:
CREATE TABLE IF NOT EXISTS perfumes (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
name TEXT NOT NULL,
brand TEXT,
release_year INTEGER,
gender TEXT,
notes_top TEXT[],
notes_middle TEXT[],
notes_base TEXT[],
rating REAL,
votes INTEGER,
description TEXT,
longevity TEXT,
sillage TEXT,
image_url TEXT,
perfume_url TEXT UNIQUE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT TIMEZONE('utc', NOW())
);
CREATE INDEX IF NOT EXISTS idx_perfumes_url ON perfumes(perfume_url);
CREATE INDEX IF NOT EXISTS idx_perfumes_brand ON perfumes(brand);
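The repository's utils/db.py handles the Supabase client and database operations; it is not reproduced here, but a minimal sketch of talking to this table with the supabase-py client could look like the following, assuming the supabase and python-dotenv packages are installed and the .env file from above is in place:

# Minimal sketch, not the project's actual utils/db.py
import os
from dotenv import load_dotenv
from supabase import create_client

load_dotenv()  # read SUPABASE_URL / SUPABASE_KEY from .env
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

# Upsert one perfume, using the unique perfume_url column to avoid duplicates
supabase.table("perfumes").upsert(
    {
        "name": "Example Perfume",
        "brand": "Example Brand",
        "perfume_url": "https://www.fragrantica.com/perfume/Example-Brand/Example-Perfume-12345.html",
    },
    on_conflict="perfume_url",
).execute()

# Read back a few rows
rows = supabase.table("perfumes").select("*").limit(5).execute()
print(rows.data)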
# From the project root directory
uvicorn api.main:app --reload --port 9000

The API will be available at:
- 🌐 API: http://localhost:9000
- 📚 Interactive Docs: http://localhost:9000/docs
- 📖 ReDoc: http://localhost:9000/redoc
To use protected endpoints (scraping, creating perfumes), you need to authenticate:
- Go to Authentication → Users in Supabase
- Click Add User
- Enter email and password
- Click Create User
curl -X POST 'https://YOUR-PROJECT.supabase.co/auth/v1/signup' \
-H "apikey: YOUR-ANON-KEY" \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"password": "your-password"
}'

curl -X POST 'https://YOUR-PROJECT.supabase.co/auth/v1/token?grant_type=password' \
-H "apikey: YOUR-ANON-KEY" \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"password": "your-password"
}'

Copy the access_token from the response.
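The same sign-in request can be made from Python; this is just a sketch of the password-grant call shown above, with YOUR-PROJECT and YOUR-ANON-KEY as placeholders:

# Sign in against Supabase auth and extract the access token
import requests

resp = requests.post(
    "https://YOUR-PROJECT.supabase.co/auth/v1/token",
    params={"grant_type": "password"},
    headers={"apikey": "YOUR-ANON-KEY"},
    json={"email": "user@example.com", "password": "your-password"},
)
access_token = resp.json()["access_token"]
print(access_token)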
The following endpoints are public and require no authentication:

- GET /perfumes?limit=100&offset=0
- GET /perfumes/{perfume_id}
- GET /perfumes/search/{query}?limit=50
- GET /stats
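For example, hitting the search endpoint from Python (assuming the local dev server from above is running on port 9000):

# Search perfumes by name - no authentication required
import requests

resp = requests.get("http://localhost:9000/perfumes/search/chanel", params={"limit": 5})
print(resp.json())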
For protected endpoints, include the auth token in the headers:

Authorization: Bearer YOUR_ACCESS_TOKEN
POST /scrape
Content-Type: application/json
{
"limit": 2
}

Example with curl:
curl -X POST http://localhost:9000/scrape \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"limit": 2}'Scrape perfumes from a specific brand:
POST /scrape/brand
Content-Type: application/json
{
"brand_name": "Jean Paul Gaultier",
"limit": 10
}

Example with curl:
curl -X POST http://localhost:9000/scrape/brand \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"brand_name": "Jean Paul Gaultier", "limit": 10}'Scrape perfumes from multiple brands at once:
POST /scrape/brands
Content-Type: application/json
{
"brands": ["Jean Paul Gaultier", "Xerjoff", "Creed"],
"limit_per_brand": 10
}

Example with curl:
curl -X POST http://localhost:9000/scrape/brands \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"brands": ["Jean Paul Gaultier", "Xerjoff", "Creed"], "limit_per_brand": 10}'Scrape a specific perfume by its direct Fragrantica URL:
POST /scrape/url
Content-Type: application/json
{
"perfume_url": "https://www.fragrantica.com/perfume/Xerjoff/White-On-White-Three-76333.html"
}

Example with curl:
curl -X POST http://localhost:9000/scrape/url \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"perfume_url": "https://www.fragrantica.com/perfume/Xerjoff/White-On-White-Three-76333.html"}'POST /perfumes
Content-Type: application/json
{
"name": "Bleu de Chanel",
"brand": "Chanel",
"release_year": 2010,
"gender": "Men",
"notes_top": ["Lemon", "Mint", "Pink Pepper"],
"notes_middle": ["Ginger", "Jasmine", "Melon"],
"notes_base": ["Cedar", "Sandalwood", "Amber"]
A complete workflow with curl:

# Scrape 2 perfumes
curl -X POST http://localhost:9000/scrape \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"limit": 2}'# Scrape 10 perfumes
curl -X POST http://localhost:9000/scrape \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"limit": 10}'
# Scrape 100 perfumes (will take time)
curl -X POST http://localhost:9000/scrape \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"limit": 100}'# Check how many perfumes are in the database
curl http://localhost:9000/stats
# List all perfumes
curl http://localhost:9000/perfumes?limit=10

From JavaScript:

// Get all perfumes
const response = await fetch('https://your-api.onrender.com/perfumes?limit=50');
const data = await response.json();
console.log(data.perfumes);
// Trigger scraping (requires auth)
const scrapeResponse = await fetch('https://your-api.onrender.com/scrape', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_TOKEN',
'Content-Type': 'application/json'
},
body: JSON.stringify({ limit: 5 })
});
const scrapeData = await scrapeResponse.json();

From Python:

import requests
# Get perfumes
response = requests.get('https://your-api.onrender.com/perfumes')
perfumes = response.json()
# Scrape with auth
headers = {'Authorization': 'Bearer YOUR_TOKEN'}
scrape_response = requests.post(
'https://your-api.onrender.com/scrape',
headers=headers,
json={'limit': 5}
)
print(scrape_response.json())

- This scraper is for educational purposes only
- Respects rate limiting (2-second delay between requests)
- Scrapes publicly available data only
- Does not overwhelm the target website
- Always check robots.txt and the terms of service
The scraper includes built-in delays to be respectful:
- 2 seconds between page requests
- Handles errors gracefully
- Stops if too many failures occur
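The project's actual scraper lives in scraper/scrape.py and is not shown here, but the polite-fetching pattern described above can be sketched roughly like this, assuming requests and BeautifulSoup (both listed in the tech stack):

# Rough sketch of polite fetching: fixed delay, graceful errors, failure cap
import time
import requests
from bs4 import BeautifulSoup

def fetch_pages(urls, delay=2.0, max_failures=3):
    failures = 0
    for url in urls:
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            yield BeautifulSoup(resp.text, "html.parser")
        except requests.RequestException as exc:
            failures += 1
            print(f"Failed to fetch {url}: {exc}")
            if failures >= max_failures:
                print("Too many failures, stopping.")
                break
        time.sleep(delay)  # 2-second pause between page requests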
- Do not scrape personal information
- Respect copyright and intellectual property
- Only use data for personal/educational projects
- Ensure the .env file exists in the project root
- Check that environment variables are set correctly
- Restart the application after changing .env
- Run the SQL migration manually in Supabase SQL Editor
- Check database connection in Supabase dashboard
- Verify your token is not expired
- Ensure you're using the correct token format: Bearer <token>
- Create a new user if needed
- Check internet connection
- Fragrantica might have changed their HTML structure
- Reduce the limit and try again
- Check console logs for specific errors
# Test the scraper directly
python scraper/scrape.py
# Test API health
curl http://localhost:9000/health

# Local development logs appear in terminal
# Render logs: Dashboard → Logs tab

Use the Supabase dashboard:
- Go to Table Editor
- Select the perfumes table
- View, edit, or delete records
- Backend: FastAPI 0.104+
- Database: Supabase (PostgreSQL)
- Scraper: BeautifulSoup4, Requests
- Auth: Supabase JWT
- Hosting: Render.com (free tier)
- Language: Python 3.9+
This is an educational project. Feel free to fork and modify for your own learning!
This project is for educational purposes. Please respect Fragrantica's terms of service and use responsibly.
For issues:
- Check the troubleshooting section above
- Review Supabase and Render documentation
- Check console/logs for error messages