
bbey-ummerata/Drouot-Scraper


Drouot Scraper

This scraper pulls structured auction data from Drouot.com, one of Europe's most active art marketplaces. It captures catalog details, bidding information, artwork metadata, and seller info from listing or search URLs, which makes it useful for analysts, collectors, dealers, and art-market intelligence platforms.


Created by Bitbash to showcase our approach to scraping and automation.

Introduction

The project automates the extraction of detailed auction listings from Drouot, turning scattered catalog pages into clean, structured JSON.
Collecting consistent pricing ranges, lot descriptions, and auction statuses by hand is slow and error-prone; the scraper does it automatically for anyone studying market trends or building valuation tools.

What You Get From It

  • Complete artwork and lot metadata
  • Pricing ranges, bidding activity, and auction status
  • Images and catalog references
  • Seller and contact information
  • Scalable crawling for long catalog lists

Features

| Feature | Description |
| --- | --- |
| Artwork Metadata Extraction | Retrieves lot names, descriptions, categories, edition notes, signatures, provenance, and more. |
| Auction Data Capture | Extracts estimate ranges, bidding levels, reserve price info, and auction timing. |
| Image Collection | Saves all artwork images including the main catalog image. |
| Seller Information | Pulls seller or auction house contact details. |
| Search-URL Crawling | Works from any listing or search page to fetch multiple lots at once. |
| Structured JSON | Clean output ready for databases, pricing models, or dashboards. |
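
The estimate range is usually displayed as a single text string, so it has to be split into the two numeric fields. The sketch below shows one way this could work. It assumes the text looks like "12 000 - 18 000 €", with whitespace as the thousands separator and a non-space character between the two amounts; `parseEstimateRange` is a hypothetical name and is not taken from the repository.

```javascript
// Hypothetical helper: not taken from the repository's source.
// Assumes estimates appear as text such as "12 000 - 18 000 €", with
// whitespace (including non-breaking spaces) as the thousands separator.
function parseEstimateRange(text) {
  const empty = { estimate_low: null, estimate_high: null };
  if (typeof text !== "string") return empty;

  // Each match is a run of digits that may contain internal whitespace.
  // In JavaScript, \s also matches non-breaking and narrow no-break spaces.
  const matches = text.match(/\d[\d\s]*/g) || [];
  const numbers = matches
    .map((m) => Number(m.replace(/\s/g, "")))
    .filter((n) => Number.isFinite(n));

  if (numbers.length === 0) return empty;

  // Use at most the first two amounts; a single amount fills both fields.
  const pair = numbers.slice(0, 2);
  return {
    estimate_low: Math.min(...pair),
    estimate_high: Math.max(...pair),
  };
}

console.log(parseEstimateRange("12 000 - 18 000 €"));
```

Decimal amounts and ranges written without any separator between the two values are not handled here and would need extra rules.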

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| lot_number | SKU / lot identifier for the artwork. |
| title | Artwork or object name. |
| description | Full catalog description including technique, condition, and notes. |
| category | Classification (e.g., Painting, Sculpture, Photography). |
| images | Array of all extracted image URLs. |
| main_image | Primary catalog image. |
| estimate_low | Lower price estimate. |
| estimate_high | Higher price estimate. |
| current_bid | Current highest bid if available. |
| next_bid | Required next bid amount. |
| reserve_met | Indicates if the reserve price was met. |
| auction_type | Online or live auction. |
| auction_status | Status such as ongoing, closed, or upcoming. |
| start_time | Auction start timestamp. |
| end_time | Auction end timestamp. |
| seller_name | Auction house or seller. |
| seller_contact | Contact details extracted from the catalog page. |

Example Output

[
  {
    "lot_number": "153",
    "title": "Bernard Buffet — Nature morte au vase",
    "description": "Oil on canvas, signed and dated 1963. Good condition. Provenance noted.",
    "category": "Peinture",
    "images": [
      "https://example.com/img1.jpg",
      "https://example.com/img2.jpg"
    ],
    "main_image": "https://example.com/img1.jpg",
    "estimate_low": 12000,
    "estimate_high": 18000,
    "current_bid": 14500,
    "next_bid": 15000,
    "reserve_met": true,
    "auction_type": "online",
    "auction_status": "in progress",
    "start_time": "2024-05-12T09:00:00Z",
    "end_time": "2024-05-19T18:00:00Z",
    "seller_name": "Maison de Ventes Dupont",
    "seller_contact": "+33 1 44 22 00 00"
  }
]
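
Before loading records like the one above into a database, it can help to check them against the field list. The following is a minimal sketch of such a check. The type map is inferred from the example output, and `FIELD_TYPES` and `validateLot` are hypothetical names; the repository's own `validator.js` may work differently.

```javascript
// Hypothetical schema inferred from the example output in this README;
// the repository's actual validator may differ.
const FIELD_TYPES = {
  lot_number: "string",
  title: "string",
  description: "string",
  category: "string",
  images: "array",
  main_image: "string",
  estimate_low: "number",
  estimate_high: "number",
  current_bid: "number",
  next_bid: "number",
  reserve_met: "boolean",
  auction_type: "string",
  auction_status: "string",
  start_time: "string",
  end_time: "string",
  seller_name: "string",
  seller_contact: "string",
};

// Returns a list of problems; an empty list means the record passed.
function validateLot(lot) {
  const errors = [];
  for (const [field, expected] of Object.entries(FIELD_TYPES)) {
    const value = lot[field];
    if (value === undefined) {
      errors.push(`missing field: ${field}`);
      continue;
    }
    if (value === null) continue; // null is treated as "not available"
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (actual !== expected) {
      errors.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  // Cross-field check: the estimate range must not be inverted.
  if (
    typeof lot.estimate_low === "number" &&
    typeof lot.estimate_high === "number" &&
    lot.estimate_low > lot.estimate_high
  ) {
    errors.push("estimate_low is greater than estimate_high");
  }
  return errors;
}
```

Returning a list of messages instead of throwing lets a crawl log bad records and continue with the rest of the catalog.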

Directory Structure Tree

drouot-scraper/
├── src/
│   ├── main.js
│   ├── crawler/
│   │   ├── playwright_engine.js
│   │   ├── pagination.js
│   │   └── extractors.js
│   ├── utils/
│   │   ├── logger.js
│   │   ├── formatting.js
│   │   └── validator.js
│   └── config/
│       └── input_schema.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── Dockerfile
├── package.json
└── README.md

Use Cases

  • Art market analysts track pricing trends, estimate accuracy, and bidding behavior.
  • Auction houses compare their catalogs with competitors and monitor artist popularity.
  • Dealers and collectors evaluate artworks, provenance, and historical bidding patterns.
  • Data platforms enrich valuation tools and price databases with structured auction insights.
  • Researchers study trends across categories, artists, or sale cycles.
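
As an illustration of the estimate-accuracy use case, the sketch below compares a lot's current bid with its estimate range. It only relies on the `estimate_low`, `estimate_high`, and `current_bid` fields from the output; the function name `compareBidToEstimate` and the returned shape are assumptions made for this example.

```javascript
// Hypothetical analysis helper; field names follow the example output.
function compareBidToEstimate(lot) {
  const { estimate_low: low, estimate_high: high, current_bid: bid } = lot;
  const usable = [low, high, bid].every(
    (v) => typeof v === "number" && Number.isFinite(v)
  );
  if (!usable) return null; // not enough pricing data to compare

  let position = "within";
  if (bid < low) position = "below";
  else if (bid > high) position = "above";

  const midpoint = (low + high) / 2;
  return {
    position,
    // Ratio of the bid to the middle of the estimate range.
    ratio_to_midpoint: midpoint > 0 ? bid / midpoint : null,
  };
}

console.log(
  compareBidToEstimate({ estimate_low: 12000, estimate_high: 18000, current_bid: 14500 })
);
```

For the example lot, the bid falls within the range and sits at roughly 0.97 of the midpoint.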

FAQs

Can it scrape entire auction catalogs?
Yes. When given a search or category URL, it crawls all available lots.
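
A crawl of this kind can be sketched as a loop that follows "next page" links until none remain. The example below is an assumption about the general approach, not the repository's code: `crawlAllLots` is a hypothetical name, and `fetchPage` is an assumed callback that loads one listing page (for instance with Playwright) and resolves to `{ lots, nextUrl }`.

```javascript
// Hypothetical crawl loop. `fetchPage` is an assumed callback that loads one
// listing page and resolves to { lots: [...], nextUrl: string | null }.
async function crawlAllLots(startUrl, fetchPage, maxPages = 100) {
  const lots = [];
  const visited = new Set(); // guards against pagination loops
  let url = startUrl;

  while (url && !visited.has(url) && visited.size < maxPages) {
    visited.add(url);
    const page = await fetchPage(url);
    lots.push(...page.lots);
    url = page.nextUrl;
  }
  return lots;
}
```

The `visited` set and the `maxPages` cap keep the loop from running forever if a site links pages in a cycle.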

Does it retrieve high-resolution images?
It extracts all image URLs; quality depends on what Drouot provides.

What if a page has missing pricing info?
Fallback extraction rules keep the JSON structure stable even with incomplete fields.
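
One way to keep the structure stable is to merge whatever was extracted into a record of defaults. The sketch below illustrates that idea only. The README does not show the actual fallback rules, so `LOT_DEFAULTS`, `normalizeLot`, the use of `null` for missing values, and the first-image fallback are all assumptions.

```javascript
// Hypothetical defaults; the actual fallback rules are not shown in this README.
const LOT_DEFAULTS = {
  lot_number: null,
  title: null,
  description: null,
  category: null,
  images: [],
  main_image: null,
  estimate_low: null,
  estimate_high: null,
  current_bid: null,
  next_bid: null,
  reserve_met: null,
  auction_type: null,
  auction_status: null,
  start_time: null,
  end_time: null,
  seller_name: null,
  seller_contact: null,
};

// Returns a record that always has every field, whatever was extracted.
function normalizeLot(partial = {}) {
  const lot = { ...LOT_DEFAULTS, images: [] }; // fresh array per record
  for (const key of Object.keys(LOT_DEFAULTS)) {
    const value = partial[key];
    if (value !== undefined && value !== "") lot[key] = value;
  }
  // Assumed rule: use the first image when no main image was found.
  if (lot.main_image === null && lot.images.length > 0) {
    lot.main_image = lot.images[0];
  }
  return lot;
}

console.log(normalizeLot({ title: "Untitled lot", images: ["https://example.com/img1.jpg"] }));
```

Unknown keys in the input are dropped, so downstream consumers always see the same set of fields.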

Can this be used for real-time bidding analysis?
It captures current bids and status but is not meant for automated bidding systems.


Performance Benchmarks and Results

  • Primary Metric: Efficiently crawls dozens of catalog lots per minute while preserving detailed metadata.
  • Reliability Metric: Consistently handles mixed content formats and varying catalog layouts.
  • Efficiency Metric: Optimized Playwright workflows reduce page-load overhead for large search crawls.
  • Quality Metric: High metadata completeness with robust parsing for images, pricing, and auction states.

