Perfume Data API

A complete FastAPI application that scrapes perfume data from Fragrantica, stores it in Supabase (PostgreSQL), and serves it through a REST API. It is currently hosted at https://perfumapi-frontend.onrender.com/ and is intended for testing and educational purposes. One love Fragrantica.com <3. Made by the one and only SECCAZ.

Features

Web Scraper: Extracts perfume data from Fragrantica.com
- General popular perfumes scraping
- Brand-specific scraping (e.g., Jean Paul Gaultier, Xerjoff, Creed)
- Multi-brand batch scraping
- Direct URL scraping (fastest - single perfume)
Supabase Integration: PostgreSQL database with auto-migration
Authentication: Supabase JWT-based auth for protected endpoints
FastAPI Backend: Fast, modern REST API with automatic documentation
CORS Enabled: Works with any frontend application

Project Structure

PerfumAPI/
├── api/
│   └── main.py              # FastAPI application with all endpoints
├── scraper/
│   └── scrape.py            # Fragrantica web scraper
├── utils/
│   ├── db.py                # Supabase client and database operations
│   └── auth.py              # Authentication middleware
├── data/
│   └── data.json            # Scraped data cache (auto-generated)
├── requirements.txt         # Python dependencies
├── Procfile                 # Render deployment configuration
├── env.example              # Environment variables template
├── .gitignore               # Git ignore rules
└── README.md                # This file

Database Schema

The perfumes table includes:

id (UUID) - Primary key
name (TEXT) - Perfume name
brand (TEXT) - Brand/designer name
release_year (INTEGER) - Year of release
gender (TEXT) - Target gender (Men/Women/Unisex)
notes_top (TEXT[]) - Top notes array
notes_middle (TEXT[]) - Middle/heart notes array
notes_base (TEXT[]) - Base notes array
rating (REAL) - Average rating
votes (INTEGER) - Number of votes
description (TEXT) - Perfume description
longevity (TEXT) - Longevity rating
sillage (TEXT) - Sillage/projection rating
image_url (TEXT) - Perfume image URL
perfume_url (TEXT) - Source URL (unique)
created_at (TIMESTAMP WITH TIME ZONE) - Creation timestamp, defaults to the current UTC time
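For client code, one row of this table maps naturally onto a typed record. The sketch below is a hypothetical Python dataclass (`Perfume` and `from_row` are illustrative names, not part of the project) that mirrors the columns above: TEXT[] columns become lists of strings, and nullable columns default to None.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Perfume:
    """Hypothetical client-side mirror of one row in the perfumes table."""

    name: str  # the only NOT NULL column besides the generated id
    id: Optional[str] = None  # UUID generated by the database
    brand: Optional[str] = None
    release_year: Optional[int] = None
    gender: Optional[str] = None  # "Men", "Women" or "Unisex"
    notes_top: list[str] = field(default_factory=list)
    notes_middle: list[str] = field(default_factory=list)
    notes_base: list[str] = field(default_factory=list)
    rating: Optional[float] = None
    votes: Optional[int] = None
    description: Optional[str] = None
    longevity: Optional[str] = None
    sillage: Optional[str] = None
    image_url: Optional[str] = None
    perfume_url: Optional[str] = None  # unique per perfume
    created_at: Optional[str] = None  # ISO 8601 string as returned in JSON

    @classmethod
    def from_row(cls, row: dict[str, Any]) -> "Perfume":
        """Build a Perfume from a JSON row, ignoring unknown keys and
        turning NULL (or missing) note arrays into empty lists."""
        known = {k: v for k, v in row.items() if k in cls.__dataclass_fields__}
        for key in ("notes_top", "notes_middle", "notes_base"):
            if known.get(key) is None:
                known[key] = []
        return cls(**known)
```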

Quick Start (Local Development)

Prerequisites

Python 3.9 or higher
Supabase account (free tier)
Git

1. Clone the Repository

git clone <your-repo-url>
cd PerfumAPI

2. Set Up Python Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3. Set Up Supabase

Go to supabase.com and create a free account
Create a new project
Wait for the database to be provisioned
Go to Project Settings → API
Copy your:
- Project URL (SUPABASE_URL)
- anon public key (SUPABASE_KEY)
- service_role key (SUPABASE_SERVICE_KEY) - keep this secret!

4. Configure Environment Variables

# Copy the example env file
cp env.example .env

# Edit .env and add your Supabase credentials
nano .env  # or use any text editor

Your .env file should look like:

SUPABASE_URL=https://xxxxxxxxxxxxx.supabase.co
SUPABASE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
DEBUG=True
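The troubleshooting section lists the error "SUPABASE_URL and SUPABASE_KEY must be set". Below is a minimal sketch of that kind of startup check; it is an illustration, not the project's actual utils/db.py, and `load_settings` is a hypothetical name. It reads only the process environment, so loading .env into the environment (for example with python-dotenv's `load_dotenv()`, if the project uses it) is assumed to happen first.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read Supabase settings, failing fast when required values are missing."""
    env = os.environ if env is None else env
    url = env.get("SUPABASE_URL", "").strip()
    key = env.get("SUPABASE_KEY", "").strip()
    if not url or not key:
        raise RuntimeError("SUPABASE_URL and SUPABASE_KEY must be set")
    return {
        "url": url,
        "key": key,
        # Optional here; only privileged server-side operations need it.
        "service_key": env.get("SUPABASE_SERVICE_KEY") or None,
        # Treat "True", "true" or "1" as enabled.
        "debug": env.get("DEBUG", "False").strip().lower() in {"true", "1"},
    }
```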

5. Create Database Table

On startup, the application attempts to create the table automatically. If that fails, create it manually:

Go to your Supabase project
Click SQL Editor
Run this SQL:

CREATE TABLE IF NOT EXISTS perfumes (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    name TEXT NOT NULL,
    brand TEXT,
    release_year INTEGER,
    gender TEXT,
    notes_top TEXT[],
    notes_middle TEXT[],
    notes_base TEXT[],
    rating REAL,
    votes INTEGER,
    description TEXT,
    longevity TEXT,
    sillage TEXT,
    image_url TEXT,
    perfume_url TEXT UNIQUE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT TIMEZONE('utc', NOW())
);

CREATE INDEX IF NOT EXISTS idx_perfumes_url ON perfumes(perfume_url);
CREATE INDEX IF NOT EXISTS idx_perfumes_brand ON perfumes(brand);
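Because perfume_url is UNIQUE, inserting the same Fragrantica page twice raises a constraint violation. A common way to make re-scraping idempotent is an upsert keyed on that column; whether the project does this is an assumption. The sketch below demonstrates the pattern with Python's built-in sqlite3 as a stand-in for PostgreSQL (the `ON CONFLICT (...) DO UPDATE` / `excluded` syntax is shared; SQLite has no TEXT[] type, so notes are stored as JSON text, and the table is trimmed to four columns). The supabase-py client offers a comparable `upsert(...)` method with an `on_conflict` argument; check its documentation for your client version.

```python
import json
import sqlite3

# In-memory SQLite stand-in for a trimmed perfumes table.
# SQLite has no TEXT[] type, so notes are stored as JSON text here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE perfumes (
        name TEXT NOT NULL,
        rating REAL,
        notes_top TEXT,
        perfume_url TEXT UNIQUE
    )
    """
)

# Same ON CONFLICT ... DO UPDATE / excluded syntax as PostgreSQL (SQLite 3.24+).
UPSERT_SQL = """
    INSERT INTO perfumes (name, rating, notes_top, perfume_url)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (perfume_url) DO UPDATE SET
        name = excluded.name,
        rating = excluded.rating,
        notes_top = excluded.notes_top
"""


def save_perfume(row: dict) -> None:
    """Insert a scraped perfume, or update the existing row with the same URL."""
    params = (
        row["name"],
        row.get("rating"),
        json.dumps(row.get("notes_top", [])),
        row["perfume_url"],
    )
    conn.execute(UPSERT_SQL, params)
    conn.commit()


# Dummy values: saving the same URL twice leaves one (updated) row.
url = "https://www.fragrantica.com/perfume/Xerjoff/White-On-White-Three-76333.html"
save_perfume({"name": "White On White Three", "rating": 1.0, "perfume_url": url})
save_perfume({"name": "White On White Three", "rating": 2.0, "perfume_url": url})
print(conn.execute("SELECT COUNT(*), MAX(rating) FROM perfumes").fetchone())  # (1, 2.0)
```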

6. Run the API

# From the project root directory
uvicorn api.main:app --reload --port 9000

The API will be available at:

🌐 API: http://localhost:9000
📚 Interactive Docs: http://localhost:9000/docs
📖 ReDoc: http://localhost:9000/redoc
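Behind these pages, FastAPI also serves the machine-readable schema at /openapi.json by default. Here is a small standard-library sketch that fetches that schema and lists every route; `list_routes` and `fetch_openapi` are hypothetical helper names, and the default base URL assumes the port used above.

```python
import json
import urllib.request

# Keys of an OpenAPI path item that are operations (others, like "parameters", are skipped).
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_routes(spec: dict) -> list[str]:
    """Return sorted "METHOD /path" strings for every operation in an OpenAPI spec."""
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


def fetch_openapi(base_url: str = "http://localhost:9000") -> dict:
    """Download the schema FastAPI generates (served at /openapi.json by default)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

With the server running, `for route in list_routes(fetch_openapi()): print(route)` prints one line per endpoint.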

Authentication Setup

Protected endpoints (scraping and creating perfumes) require authentication:

Option 1: Create a User in Supabase Dashboard

Go to Authentication → Users in Supabase
Click Add User
Enter email and password
Click Create User

Option 2: Use the Supabase Auth API

curl -X POST 'https://YOUR-PROJECT.supabase.co/auth/v1/signup' \
-H "apikey: YOUR-ANON-KEY" \
-H "Content-Type: application/json" \
-d '{
  "email": "user@example.com",
  "password": "your-password"
}'

Get Authentication Token

curl -X POST 'https://YOUR-PROJECT.supabase.co/auth/v1/token?grant_type=password' \
-H "apikey: YOUR-ANON-KEY" \
-H "Content-Type: application/json" \
-d '{
  "email": "user@example.com",
  "password": "your-password"
}'

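Copy the access_token from the response. If email confirmation is enabled for your Supabase project (the default), a user created through the signup endpoint must confirm their address before this request succeeds.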
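The same password sign-in can be done from Python with only the standard library. This sketch mirrors the token curl call above; `build_token_request` and `get_access_token` are hypothetical helpers, and the project URL and anon key are placeholders you supply. Building the request separately from sending it lets you inspect it without contacting Supabase.

```python
import json
import urllib.request


def build_token_request(
    project_url: str, anon_key: str, email: str, password: str
) -> urllib.request.Request:
    """Build the POST /auth/v1/token?grant_type=password request."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{project_url.rstrip('/')}/auth/v1/token?grant_type=password",
        data=body,
        headers={"apikey": anon_key, "Content-Type": "application/json"},
        method="POST",
    )


def get_access_token(project_url: str, anon_key: str, email: str, password: str) -> str:
    """Sign in and return the access_token field of the JSON response."""
    req = build_token_request(project_url, anon_key, email, password)
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.load(resp)["access_token"]
```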

API Endpoints

Public Endpoints (No Auth Required)

Get All Perfumes

GET /perfumes?limit=100&offset=0

Get Perfume by ID

GET /perfumes/{perfume_id}

Search Perfumes

GET /perfumes/search/{query}?limit=50

Get Statistics

GET /stats
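GET /perfumes pages through the table with limit and offset. Below is a sketch of a client that walks every page. It assumes, as the JavaScript example under Usage Examples does, that the response body holds the rows under a "perfumes" key. `iter_all_perfumes` and the injected `fetch_page` callable are hypothetical; injecting the fetch keeps the paging logic independent of any HTTP library.

```python
from typing import Callable, Iterator


def iter_all_perfumes(
    fetch_page: Callable[[int, int], dict], page_size: int = 100
) -> Iterator[dict]:
    """Yield every perfume by requesting successive limit/offset pages.

    fetch_page(limit, offset) should perform GET /perfumes?limit=...&offset=...
    and return the decoded JSON body.
    """
    offset = 0
    while True:
        rows = fetch_page(page_size, offset).get("perfumes", [])
        yield from rows
        if len(rows) < page_size:  # a short (or empty) page means we reached the end
            return
        offset += page_size
```

With requests, `fetch_page` could be `lambda limit, offset: requests.get(f"{API_URL}/perfumes", params={"limit": limit, "offset": offset}, timeout=30).json()`, where API_URL is your deployment's base URL.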

Protected Endpoints (Auth Required)

Include the access token in the Authorization header:

Authorization: Bearer YOUR_ACCESS_TOKEN

Trigger Scraping (General)

POST /scrape
Content-Type: application/json

{
  "limit": 2
}

Example with curl:

curl -X POST http://localhost:9000/scrape \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"limit": 2}'

Scrape by Brand

Scrape perfumes from a specific brand:

POST /scrape/brand
Content-Type: application/json

{
  "brand_name": "Jean Paul Gaultier",
  "limit": 10
}

Example with curl:

curl -X POST http://localhost:9000/scrape/brand \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"brand_name": "Jean Paul Gaultier", "limit": 10}'

Scrape Multiple Brands

Scrape perfumes from multiple brands at once:

POST /scrape/brands
Content-Type: application/json

{
  "brands": ["Jean Paul Gaultier", "Xerjoff", "Creed"],
  "limit_per_brand": 10
}

Example with curl:

curl -X POST http://localhost:9000/scrape/brands \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"brands": ["Jean Paul Gaultier", "Xerjoff", "Creed"], "limit_per_brand": 10}'

Scrape by URL

Scrape a specific perfume by its direct Fragrantica URL:

POST /scrape/url
Content-Type: application/json

{
  "perfume_url": "https://www.fragrantica.com/perfume/Xerjoff/White-On-White-Three-76333.html"
}

Example with curl:

curl -X POST http://localhost:9000/scrape/url \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"perfume_url": "https://www.fragrantica.com/perfume/Xerjoff/White-On-White-Three-76333.html"}'
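Before sending a URL to /scrape/url, a client can check that it looks like a Fragrantica perfume page. The pattern below is inferred from the single example URL above (a brand segment, then a name segment ending in a numeric ID, then .html). It is an assumption, not a documented format, and `parse_perfume_url` is a hypothetical helper.

```python
import re
from typing import Optional

# Inferred from https://www.fragrantica.com/perfume/Xerjoff/White-On-White-Three-76333.html
PERFUME_URL_RE = re.compile(
    r"^https://www\.fragrantica\.com/perfume/"
    r"(?P<brand>[^/]+)/(?P<name>[^/]+)-(?P<id>\d+)\.html$"
)


def parse_perfume_url(url: str) -> Optional[dict]:
    """Return the brand slug, name slug and numeric ID, or None if the URL doesn't match."""
    match = PERFUME_URL_RE.match(url.strip())
    if match is None:
        return None
    return {"brand": match["brand"], "name": match["name"], "id": int(match["id"])}
```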

Create Perfume Manually

POST /perfumes
Content-Type: application/json

{
  "name": "Bleu de Chanel",
  "brand": "Chanel",
  "release_year": 2010,
  "gender": "Men",
  "notes_top": ["Lemon", "Mint", "Pink Pepper"],
  "notes_middle": ["Ginger", "Jasmine", "Melon"],
  "notes_base": ["Cedar", "Sandalwood", "Amber"]
}

Testing the Scraper

Test with 2 Perfumes (Safe)

curl -X POST http://localhost:9000/scrape \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"limit": 2}'

Scrape More Perfumes

# Scrape 10 perfumes
curl -X POST http://localhost:9000/scrape \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"limit": 10}'

# Scrape 100 perfumes (slow: at 2 seconds per page request, expect several minutes)
curl -X POST http://localhost:9000/scrape \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"limit": 100}'

Verify Data

# Check how many perfumes are in the database
curl http://localhost:9000/stats

# List the first 10 perfumes (quote the URL so the shell doesn't treat ? as a glob)
curl "http://localhost:9000/perfumes?limit=10"

Usage Examples

JavaScript/Fetch

// Get all perfumes
const response = await fetch('https://your-api.onrender.com/perfumes?limit=50');
const data = await response.json();
console.log(data.perfumes);

// Trigger scraping (requires auth)
const scrapeResponse = await fetch('https://your-api.onrender.com/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_TOKEN',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ limit: 5 })
});
const scrapeData = await scrapeResponse.json();

Python/Requests

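import requests

API_URL = 'https://your-api.onrender.com'

# Get perfumes (public endpoint)
response = requests.get(f'{API_URL}/perfumes', params={'limit': 50}, timeout=30)
response.raise_for_status()
data = response.json()
print(data['perfumes'])

# Trigger scraping (requires auth); json= also sets the Content-Type header
headers = {'Authorization': 'Bearer YOUR_TOKEN'}
scrape_response = requests.post(
    f'{API_URL}/scrape',
    headers=headers,
    json={'limit': 5},
    timeout=120,  # scraping is slow (2-second delay between page requests)
)
scrape_response.raise_for_status()
print(scrape_response.json())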

Important Notes

Ethical Scraping

This scraper is for educational purposes only
Rate-limits itself (2-second delay between requests)
Scrapes publicly available data only
Is designed not to overwhelm the target website
Always check robots.txt and the terms of service before scraping

Rate Limiting

The scraper includes these safeguards to avoid overloading Fragrantica:

2 seconds between page requests
Handles errors gracefully
Stops if too many failures occur
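The safeguards above (a fixed 2-second pause between page requests, and giving up after repeated failures) can be sketched as a small loop. This illustrates the pattern; it is not the code in scraper/scrape.py. `polite_fetch_all`, `max_consecutive_failures` and the injected `fetch`/`sleep` callables are hypothetical, and the default threshold of 5 is an arbitrary choice.

```python
import time
from typing import Callable, Iterable


def polite_fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,
    max_consecutive_failures: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch each URL in turn, pausing between requests.

    Failed URLs are skipped; the loop stops early once
    max_consecutive_failures errors happen in a row.
    """
    results: dict[str, str] = {}
    failures = 0
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # fixed pause between page requests
        try:
            results[url] = fetch(url)
            failures = 0  # a success resets the failure streak
        except Exception as exc:  # network, HTTP or parsing errors
            failures += 1
            print(f"failed {url}: {exc}")
            if failures >= max_consecutive_failures:
                print("too many consecutive failures, stopping")
                break
    return results
```

Injecting `sleep` lets tests run instantly; in real use the default `time.sleep` applies the pause.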

Data Privacy

Do not scrape personal information
Respect copyright and intellectual property
Only use data for personal/educational projects

Troubleshooting

"SUPABASE_URL and SUPABASE_KEY must be set"

Ensure the .env file exists in the project root
Check that environment variables are set correctly
Restart the application after changing .env

"Table 'perfumes' does not exist"

Run the SQL from step 5 manually in the Supabase SQL Editor
Check database connection in Supabase dashboard

Authentication Issues

Verify that your token has not expired (Supabase access tokens expire after one hour by default)
Ensure you're using the correct header format: Authorization: Bearer <token>
Create a new user if needed
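To see whether a token has expired, you can read the exp claim from its payload. The sketch below decodes the middle (payload) segment of the JWT with the standard library. It does NOT verify the signature, so use it only for local debugging and never for authorization decisions. `jwt_payload` and `token_expires_in` are hypothetical helper names.

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying its signature."""
    if token.startswith("Bearer "):
        token = token[len("Bearer "):]
    payload_b64 = token.strip().split(".")[1]
    # base64url segments drop their "=" padding; restore it before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def token_expires_in(token: str, now: Optional[float] = None) -> float:
    """Seconds until the token's exp claim (negative if already expired)."""
    now = time.time() if now is None else now
    return jwt_payload(token)["exp"] - now
```

For example, `token_expires_in(access_token) / 60` gives the minutes remaining.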

Scraping Failures

Check internet connection
Fragrantica may have changed its HTML structure
Reduce the limit and try again
Check console logs for specific errors

Development

Run Tests

# Test the scraper directly
python scraper/scrape.py

# Test API health
curl http://localhost:9000/health

View Logs

# Local development logs appear in terminal

# Render logs: Dashboard → Logs tab

Database Inspection

Use Supabase dashboard:

Go to Table Editor
Select perfumes table
View, edit, or delete records

Tech Stack

Backend: FastAPI 0.104+
Database: Supabase (PostgreSQL)
Scraper: BeautifulSoup4, Requests
Auth: Supabase JWT
Hosting: Render.com (free tier)
Language: Python 3.9+

Contributing

This is an educational project. Feel free to fork and modify for your own learning!

License

This project is for educational purposes. Please respect Fragrantica's terms of service and use responsibly.

Support

For issues:

Check the troubleshooting section above
Review Supabase and Render documentation
Check console/logs for error messages
