Art Datasets for Machine Learning
A curated list of publicly available art datasets for machine learning research, covering classification, object detection, visual question answering, aesthetics, generative models, sketches, and more. Includes both scientific benchmark datasets and museum open-access collections.
Contributions welcome via pull request.
Availability legend:
✅ freely downloadable (direct download, Zenodo, Kaggle, HuggingFace, GitHub, etc.)
📩 requires application or approval
📄 paper only, no public data link found
🔒 browse only, no bulk download
Sorted by year (newest first) within each category.
Classification, Style & Attribution
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| Stylebreeder | 6.8M images, 1.8M prompts | 2024 | ✅ | paper, huggingface | AI-generated images from Artbreeder with style clusters. NeurIPS 2024, CC0 |
| fruit-SALAD | 10,000 images | 2024 | ✅ | paper, code | Synthetic benchmark for style vs. content similarity in embeddings |
| StyleBabel | 135,000 artworks | 2022 | 📄 | paper (ECCV) | Free-form style tags and captions from art/design students on Behance |
| ArtBench-10 | 60,000 images | 2022 | ✅ | paper, code, kaggle | Class-balanced benchmark, 10 styles, clean annotations |
| WikiArtVectors | 68,094 artworks | 2022 | 📄 | paper | Precomputed CLIP embeddings for WikiArt, 132 styles |
| DELAUNAY | 11,503 images | 2022 | ✅ | paper, code | Abstract and non-figurative art, 53 artists |
| contempArt | 14,398 images, 441 artists | 2020 | ✅ | paper, zenodo, github | Contemporary art from German art schools with social network data (456K edges) and demographics |
| DomainNet | ~600,000 images | 2019 | ✅ | website | 345 classes across 6 domains (clipart, infograph, painting, quickdraw, real, sketch) |
| NoisyArt | 90,000+ images | 2019 | ✅ | paper, code | Webly-supervised artwork recognition with noisy web labels, 3,000+ classes |
| Multitask Painting Collection | ~100,000 images | 2019 | ✅ | paper, data | Multitask learning: artist, style, genre, period |
| Best Artworks of All Time | ~8,000 images | 2019 | ✅ | kaggle | 50 most influential artists, popular starter dataset |
| BAM! | 2.5M+ images | 2017 | 📄 | paper | Behance Artistic Media: content, emotion, and media labels. ICCV 2017 |
| Art 500K | ~500,000 images | 2017 | 📄 | paper, data | Large-scale art retrieval, compiled from WikiArt + WGA + Rijks + Google Arts. Download link dead (404 as of 2026-04) |
| Pandora 18k | 18,038 images | 2016 | 📄 | paper | 18 art styles, expert-labeled, higher annotation quality than WikiArt |
| Painter by Numbers | 103,250 images | 2016 | ✅ | kaggle | Kaggle competition: predict whether two paintings are by the same artist |
| Rijksmuseum Challenge | 112,039 images (3,593 paintings) | 2014 | ✅ | paper, data, code | Artist, material, and type prediction |
| Painting-91 | 4,266 images | 2014 | ✅ | paper, data | 91 painters, style classification |
| PRINTART | 988 images | 2012 | ✅ | paper, data | Print art classification |
| WikiArt | ~250,000 images | varies | ✅ | website, crawler, source 1, source 2, kaggle | 3,000+ artists, most widely used art dataset. API signups currently disabled |
| WikiArt 215K (HuggingFace) | 215,000 images | varies | ✅ | huggingface | Preprocessed WikiArt with image URLs and captions (title, artist, year, genre, style). 150K+ with parseable dates |
| Web Gallery of Art | ~19,000 images | ongoing | ✅ | website | European fine art, encyclopedic scope |
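Painter by Numbers frames attribution as pair verification rather than single-label classification. The sketch below shows one way to build balanced same-artist / different-artist pairs from any of the artist-labeled datasets above. It assumes the metadata has already been reduced to hypothetical `(image_id, artist)` tuples; the function name and sampling strategy are illustrative, not the competition's official pairing procedure.

```python
import itertools
import random
from collections import defaultdict


def make_verification_pairs(records, n_pairs, seed=0):
    """Sample roughly balanced (image_a, image_b, label) pairs; label 1 = same artist.

    records: list of (image_id, artist) tuples (hypothetical input format).
    """
    rng = random.Random(seed)
    by_artist = defaultdict(list)
    for image_id, artist in records:
        by_artist[artist].append(image_id)

    # Positives: all unordered pairs within each artist's works.
    positives = [
        (a, b, 1)
        for images in by_artist.values()
        for a, b in itertools.combinations(images, 2)
    ]
    n_pos = min(n_pairs // 2, len(positives))
    pairs = rng.sample(positives, n_pos)

    # Negatives: rejection-sample cross-artist pairs instead of enumerating them all.
    n_neg = n_pairs - n_pos
    negatives = set()
    attempts = 0
    while len(negatives) < n_neg and attempts < 100 * max(n_pairs, 1):
        attempts += 1
        (a, artist_a), (b, artist_b) = rng.sample(records, 2)
        if artist_a != artist_b:
            negatives.add(tuple(sorted((a, b))))
    pairs += [(a, b, 0) for a, b in sorted(negatives)]

    rng.shuffle(pairs)
    return pairs
```

Negatives are drawn by rejection sampling so the quadratic set of all cross-artist pairs is never materialized; the attempt cap keeps the loop finite when too few distinct negatives exist.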
Object Detection, Pose & Iconography
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| Poses of People in Art | 2,454 images, 10,749 figures | 2024 | ✅ | paper, data | First openly licensed pose estimation dataset for art, 22 depiction styles |
| Human-Art | 50,000 images, 123,000 person instances | 2023 | 📩 | paper, website | Natural and artificial scenes (paintings, cartoons, sculptures). CVPR 2023 |
| DEArt | 15,000+ images | 2022 | ✅ | paper, data | 69 object classes, 12 pose categories, European paintings from the 12th to 18th century |
| ArtDL 2.0 | 42,479 images | 2021 | ✅ | paper, data, code | Iconographic classification, 19 Iconclass classes, Renaissance art |
| Materials In Paintings | 19,325 paintings, 227,810 bboxes | 2021 | ✅ | paper, data | Material perception (fabric, metal, wood, etc.) with fine-grained labels |
| IconArt | 5,955 images | 2018 | ✅ | paper, huggingface, data | Weakly supervised iconographic element detection (angels, saints, etc.) |
| People-Art | images from 41 art movements | 2016 | ✅ | paper, code | Cross-depiction person detection across photos, cartoons, and art |
| Oxford VGG Paintings | 210,000+ paintings (10K annotated) | 2014 | ✅ | paper, data | Object retrieval in paintings, crowdsourced object tags |
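Detection datasets like those above are typically evaluated by matching predicted boxes to ground truth at an intersection-over-union (IoU) threshold. A minimal sketch of that matching step follows, assuming corner-format boxes `(x_min, y_min, x_max, y_max)`; the `coco_to_corners` helper covers datasets that ship COCO-style `[x, y, width, height]` boxes. The 0.5 threshold is the common PASCAL VOC convention, not something every dataset here prescribes, and the function names are hypothetical.

```python
def coco_to_corners(box):
    """Convert a COCO-style [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def iou(box_a, box_b):
    """Intersection-over-union of two corner-format boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds, gts, threshold=0.5):
    """Greedy one-to-one matching, highest-scoring predictions first.

    preds: list of (box, score); gts: list of boxes. Each ground-truth box
    can be claimed by at most one prediction.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= threshold:
            matched.add(best_j)
            tp += 1
    return tp
```

Duplicate detections of an already-matched object count as false positives, which is why the second prediction in a tight cluster usually earns nothing.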
Aesthetics & Emotion
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| ArtELingo-28 | ~200,000 annotations, 28 languages | 2024 | 📄 | paper | Multilingual art emotion annotations. EMNLP 2024 |
| APDDv2 | 10,023 images, 85,191 scores | 2024 | ✅ | paper, code | Expert aesthetic scores and language comments, 10 attributes. NeurIPS 2024 |
| BAID | 60,337 images, 360,000+ votes | 2023 | ✅ | paper, code | BoldBrush artistic image aesthetics with user votes. CVPR 2023 |
| ArtELingo | ~1.24M annotations (EN/AR/ZH/ES) | 2022 | ✅ | paper, website | Multilingual emotion annotations on WikiArt. EMNLP 2022 |
| ArtEmis v2 | 260,000 contrastive instances | 2022 | ✅ | website | Contrastive extension balancing positive/negative emotion pairs |
| TAD66K | 66,000 images, 47 themes | 2022 | ✅ | paper, huggingface | Theme-oriented aesthetics, 1,200+ annotations per image. IJCAI 2022 |
| ArtEmis | 455,000 annotations on 80,000 artworks | 2021 | ✅ | paper, website | Emotion attributions and verbal explanations for WikiArt. CVPR 2021 |
| WikiArt Emotions | 4,105 images, 20 emotion categories | 2018 | ✅ | paper, data | Crowdsourced emotions across four Western art periods |
| AVA | 250,000+ images | 2012 | ✅ | paper, kaggle | Photography aesthetics from DPChallenge, scores + style labels |
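AVA distributes a histogram of votes rather than a single score per image, and a common preprocessing step is to collapse each histogram into a mean score. The sketch assumes the usual whitespace-separated layout of `AVA.txt` (index, image ID, ten vote counts for ratings 1 to 10, then semantic tag IDs and a challenge ID); check it against your copy, since mirrors such as the Kaggle version may repackage the file. The example row in the test is made up.

```python
def parse_ava_line(line):
    """Return (image_id, mean_score, n_votes) for one AVA row.

    Assumed layout: index, image ID, ten vote counts for ratings 1..10,
    then semantic tag IDs and a challenge ID (ignored here).
    """
    fields = line.split()
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 fields, got {len(fields)}")
    image_id = fields[1]
    counts = [int(c) for c in fields[2:12]]
    n_votes = sum(counts)
    if n_votes == 0:
        return image_id, None, 0
    # Weighted mean: rating r (1..10) contributes r * count_r.
    mean = sum(r * c for r, c in enumerate(counts, start=1)) / n_votes
    return image_id, mean, n_votes
```

Many follow-up works then threshold the mean (often at 5) to obtain high/low aesthetic-quality labels; keeping `n_votes` lets you drop images with too few ratings first.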
Faces & Portraits
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| WikiArt Face | 6,095 face images | 2021 | ✅ | data, code | Faces cropped from portraits across art movements |
| AAHQ | ~25,000 images | 2021 | ✅ | code | Artistic portrait faces from ArtStation in various painting styles |
| MetFaces | 1,336 face images (1024×1024) | 2020 | ✅ | code | Faces from Met artworks, aligned and cropped for GAN training |
| Artistic Faces Dataset | from 103,250 artworks | 2019 | ✅ | website | 68 facial landmarks plus artist/style metadata |
Captioning, VQA & Multimodal
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| CognArtive | LLM art analyses | 2025 | ✅ | paper, website | LLM-generated formal art analyses and aesthetic descriptions |
| MELArt | annotations over Wikimedia art | 2024 | ✅ | paper, code | Multimodal entity linking in paintings |
| AQUA | QA pairs over SemArt | 2020 | ✅ | paper, code | Visual and knowledge-based question answering on art |
| Artpedia | 2,930 paintings | 2019 | 📄 | paper | 28,212 text sentences (visual + contextual), cross-modal retrieval |
| OmniArt | 2,050,017 images | 2018 | 📄 | paper, data | Multi-task, multi-label, metadata-rich, compiled from Rijks + Met + WGA. Download links dead (as of 2026-04) |
| SemArt | 21,383 images | 2018 | ✅ | paper, data | Semantic art descriptions paired with images |
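Retrieval-oriented datasets in this table, such as Artpedia and SemArt, are commonly evaluated with Recall@K: the fraction of queries whose correct match appears among the top K candidates. The sketch assumes a square similarity matrix in which query `i`'s single correct match is candidate `i`; Artpedia pairs several sentences with each painting, so a real evaluation there would need a one-to-many ground-truth mapping instead. The function name is hypothetical.

```python
def recall_at_k(similarity, k):
    """Recall@K for a square similarity matrix.

    similarity[i][j] is the score of query i against candidate j, and the
    correct candidate for query i is assumed to be index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidates by descending score; sorted() is stable, so ties
        # keep the lower index first.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(similarity)
```

Reporting R@1, R@5, and R@10 together gives a fuller picture than any single K.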
Sketches & Drawings
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| Creative Birds / Creatures | 10,000 sketches each | 2021 | ✅ | paper, code | Part-annotated creative sketch datasets. ICLR 2021 |
| ImageNet-Sketch | 50,000 images, 1,000 classes | 2019 | ✅ | code, kaggle | Sketch versions of ImageNet classes for domain-shift evaluation |
| OpenSketch | 400+ sketches, 12 objects | 2019 | ✅ | paper, website | Product design sketches from professional designers |
| SketchyScene | 29,056 scene sketches | 2018 | ✅ | paper (ECCV), website | Scene-level sketches with instance annotations |
| Quick, Draw! | 50M drawings, 345 categories | 2017 | ✅ | website, code, huggingface | Google's crowdsourced sketch game, CC BY 4.0 |
| Sketchy Database | 75,471 sketches of 12,500 objects | 2016 | ✅ | paper, website | First large-scale paired sketch-photo dataset. SIGGRAPH 2016 |
| TU-Berlin Sketches | 20,000 sketches, 250 categories | 2012 | ✅ | huggingface | Human sketches for sketch recognition research |
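The simplified Quick, Draw! files are newline-delimited JSON in which each record's `drawing` field is a list of strokes, and each stroke is a pair of coordinate lists `[[x0, x1, ...], [y0, y1, ...]]`. Sequence models such as Sketch-RNN consume an offset-based "stroke-3" encoding `(dx, dy, pen_lifted)` instead. The converter below is a sketch of that transformation; measuring the first offset from the origin is a choice made here, not a fixed convention, and the helper names are hypothetical.

```python
import json


def load_drawings(ndjson_lines):
    """Yield the `drawing` field of each non-empty line of a simplified .ndjson file."""
    for line in ndjson_lines:
        if line.strip():
            yield json.loads(line)["drawing"]


def to_stroke3(drawing):
    """Convert [[xs, ys], ...] strokes into (dx, dy, pen_lifted) rows."""
    rows = []
    prev_x, prev_y = 0, 0  # first offset is measured from the origin
    for xs, ys in drawing:
        last = len(xs) - 1
        for i, (x, y) in enumerate(zip(xs, ys)):
            # pen_lifted = 1 marks the final point of each stroke.
            rows.append((x - prev_x, y - prev_y, 1 if i == last else 0))
            prev_x, prev_y = x, y
    return rows
```

Usage: `with open("cat.ndjson") as f: sequences = [to_stroke3(d) for d in load_drawings(f)]` (the file name is illustrative).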
Forgery & Authentication
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| Van Gogh Authentication | 338+ images | 2024 | ✅ | paper, data | Originals, human forgeries, and AI-generated fakes |
| DeepfakeArt Challenge | 32,000+ image pairs | 2023 | ✅ | paper, kaggle, code | AI art forgery and data poisoning detection benchmark |
Generative AI & Diffusion
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| Art-fm | 650K art images (training set) | 2025 | 📄 | paper, code | Flow matching for art generation, trained on 650K curated images (WikiArt + 7 museum sources, SSCD-deduplicated from 950K). LMU Munich, ICCV 2025 |
| SCFlow | N/A (model only) | 2025 | ✅ | paper, code | Style/content disentanglement via conditional flow matching in CLIP space. Same lab as Art-fm. ICCV 2025 |
| Danbooru2023 | 5M+ images, 162M+ tags | 2023 | ✅ | data (2021), huggingface (2023) | Crowdsourced anime/illustration tags, ~30 tags per image |
| CommonCanvas | ~70M CC-licensed images | 2023 | ✅ | paper, models | Copyright-safe training data for diffusion models. CVPR 2024 |
| TWIGMA | 800,000+ images | 2023 | ✅ | paper, data | AI-generated images from Twitter with metadata. NeurIPS 2023 |
| Pick-a-Pic | 1M+ preference pairs | 2023 | ✅ | paper, huggingface | Human preferences for text-to-image generation, used for RLHF |
| JourneyDB | 4,429,295 images | 2023 | ✅ | paper, data, huggingface | Midjourney images with prompts, captions, and VQA. NeurIPS 2023 |
| AI-ArtBench | 185,015 images | 2023 | ✅ | data (IEEE), kaggle | 60K human-made + 125K AI-generated images for real vs. AI art detection |
| LAION-Aesthetics | ~120M images (score >7) | 2022 | ✅ | info, data | Aesthetic-filtered subset of LAION-5B, used to train Stable Diffusion v1 |
| DiffusionDB | 14M images, 1.8M prompts | 2022 | ✅ | paper, code, huggingface | Stable Diffusion prompt-image pairs from Discord |
| COYO-700M | 747M image-text pairs | 2022 | ✅ | data | Large-scale image-text pairs, CC-BY-4.0 |
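Art-fm's training set was deduplicated with SSCD copy-detection descriptors, and several datasets above are compiled from overlapping sources. A minimal greedy version of embedding-based near-duplicate removal is sketched below, assuming descriptors have already been extracted (by SSCD, CLIP, or similar). The 0.9 cosine threshold is a placeholder, not the value Art-fm used, and the pairwise loop is quadratic, so collections of this size would need an approximate nearest-neighbour index in practice.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def greedy_dedup(embeddings, threshold=0.9):
    """Return indices of items kept after greedy near-duplicate removal.

    An item is kept only if its cosine similarity to every previously kept
    item is below `threshold` (placeholder value, tune per descriptor).
    """
    kept, kept_vecs = [], []
    for idx, vec in enumerate(embeddings):
        unit = normalize(vec)
        if all(sum(a * b for a, b in zip(unit, other)) < threshold
               for other in kept_vecs):
            kept.append(idx)
            kept_vecs.append(unit)
    return kept
```

Because the scan is greedy, the first member of each duplicate cluster survives, so ordering the input by a preferred source (e.g. museum scans before web crawls) decides which copy is kept.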
Cartoon, Manga & Illustration
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| iCartoonFace | 389,678 images, 5,013 characters | 2020 | 📄 | paper | Large-scale cartoon face detection and recognition |
| Manga109 | 109 volumes, 21,142 pages | 2020 | 📩 | paper | Japanese manga with annotated frames, faces, text, and characters |
Cultural Heritage & Archaeology
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| CULTURE3D | 41,006 drone images, 20 sites | 2025 | ✅ | paper, code | 3D reconstruction of cultural landmarks (pyramids, Forbidden City, etc.) |
| MuralDH | 5,000+ images | 2024 | ✅ | paper, code | Dunhuang mural restoration: segmentation, inpainting, super-resolution |
| WikiScenes | landmark photo collections | 2021 | ✅ | code | Architectural landmarks with captions and 3D geometry. ICCV 2021 |
| ArchAIDE | 435 sketches, 381 photos | 2020 | 📄 | paper | Archaeological pottery classification via shape and decoration |
Street Art & Graffiti
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| 17K-Graffiti | ~17,000 images | 2022 | ✅ | code | Graffiti classification. VISAPP 2022 |
Knowledge Graphs & Metadata
| Dataset | Size | Year | Avail. | Links | Notes |
|---|---|---|---|---|---|
| PainterPalette | ~10,000 painters | 2023 | ✅ | code | WikiArt + Art500k + Wikidata painter metadata, network analysis |
| ArtGraph | 135,038 resources, 875,416 facts | 2022 | ✅ | paper, data | Knowledge graph linking WikiArt and DBpedia |
Museum & Gallery Collections
Open-access collections from cultural institutions. Sorted by collection size (largest first) where size is known, then alphabetically for collections without published counts.
Large Collections (100,000+ objects)
| Collection | Size | Avail. | License | Links | Notes |
|---|---|---|---|---|---|
| Smithsonian Open Access | 11M+ records, 2.8M+ images | ✅ | CC0 | github, website, AWS | 19 museums + research centers |
| Victoria and Albert Museum | 1M+ records, 500,000+ images | ✅ | personal/educational | API | Decorative arts, fashion, design. IIIF, OpenAPI spec |
| Te Papa (New Zealand) | 1M+ objects, 200,000+ downloadable | ✅ | CC varies | website | First Australasian large-scale open access museum |
| Rijksmuseum | 800,000+ objects | ✅ | CC0 (public domain works) | data portal | API and bulk downloads, high-resolution images |
| Louvre | 500,000+ works | ✅ | varies | website, JSON docs | Append .json to any artwork URL to get its metadata. CSV export of search results |
| The Metropolitan Museum of Art | 470,000+ artworks | ✅ | CC0 | github | Comprehensive metadata CSV, regularly updated |
| iMet Collection | 375,000 images | ✅ | varies | paper, kaggle | Fine-grained attribute recognition challenge |
| Cooper Hewitt | 215,000+ objects | ✅ | CC0 | github, API | Smithsonian design museum, JSON per object |
| Paris Musées | 150,000+ images | ✅ | CC0 | website, API | 14 Paris city museums. GraphQL API with free account (session cookie auth). ~8K paintings + ~63K drawings with dates and images |
| National Gallery of Art (DC) | 130,000+ artworks | ✅ | CC0 | github | CSV format with Wikidata identifiers |
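The Met's metadata CSV mixes public-domain and rights-restricted objects, so a typical first step is to filter on the public-domain flag. The sketch assumes the `MetObjects.csv` column names `Object ID`, `Is Public Domain` (stored as the strings `True`/`False`), and `Department`; verify these against the header of the current file, which the museum may change. The function name and the sample rows in the test are hypothetical.

```python
import csv


def public_domain_ids(csv_file, department=None):
    """Return integer object IDs flagged public domain, optionally for one department.

    csv_file: any open text file or file-like object containing the CSV.
    Column names are assumptions about the MetObjects.csv header.
    """
    ids = []
    for row in csv.DictReader(csv_file):
        if row.get("Is Public Domain") != "True":
            continue
        if department is not None and row.get("Department") != department:
            continue
        ids.append(int(row["Object ID"]))
    return ids
```

Usage: `with open("MetObjects.csv", newline="", encoding="utf-8") as f: ids = public_domain_ids(f, department="European Paintings")`. Streaming through `csv.DictReader` keeps memory flat even though the full file is large.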
Medium Collections (10,000-100,000 objects)
| Collection | Size | Avail. | License | Links | Notes |
|---|---|---|---|---|---|
| SMK (National Gallery of Denmark) | 88,000+ works | ✅ | CC0 | API, website | Leading OpenGLAM institution |
| National Palace Museum (Taiwan) | 70,000+ digitized images | ✅ | open gov | data | Chinese art, calligraphy, ceramics, bronzes. IIIF compliant |
| Yale Center for British Art | ~70,000 IIIF images | ✅ | public domain | website | Linked Open Data via RDF |
| Cleveland Museum of Art | 61,000+ artworks | ✅ | CC0 | github, API | CSV and JSON data plus public API |
| Museo del Prado | ~40,000 artworks | ✅ | non-commercial | knowledge graph | Linked Open Data, SPARQL-queryable |
| Finnish National Gallery | 36,000+ artworks | ✅ | CC0 | github | Ateneum, Kiasma, Sinebrychoff collections |
| Getty Museum | 30,000+ high-res images | ✅ | no known restrictions | website | Open Content Program, IIIF access |
| Whitney Museum | 17,000+ works | ✅ | CC0 | github, API | CSV updated nightly. 20th- and 21st-century American art |
| Williams College Museum of Art | ~15,600 records | ✅ | CC0 | github | CSV format with thumbnails |
| Walters Art Museum | 10,000+ records | ✅ | CC0 | github | Static data files (API v1 retired 2023) |
API / Full Collection Access (size unspecified)
| Collection | Avail. | License | Links | Notes |
|---|---|---|---|---|
| Art Institute of Chicago | ✅ | CC0 | github, API, website | REST API + bulk JSON dumps on AWS S3 |
| Auckland War Memorial Museum | ✅ | CC varies | API | API plus Linked Open Data |
| British Museum | ✅ | non-commercial | website, github | SPARQL/Linked Data. Export up to 10K records per search |
| Brooklyn Museum | 📩 | varies | API, examples | REST API, registration required |
| ColBase (National Museums of Japan) | ✅ | varies | website | Tokyo, Kyoto, Nara, Kyushu national museums |
| Harvard Art Museums | ✅ | varies | API docs, website | REST API refreshed daily, IIIF-compatible |
| Minneapolis Institute of Art | ✅ | CC0 | github | JSON metadata, updated approximately daily |
| National Gallery (London) | ✅ | CC BY-NC-ND 4.0 | API | ~2,300 paintings. Elasticsearch-based API |
| QAGOMA (Queensland, Australia) | ✅ | CC varies | data | CSV, refreshed monthly. Australian and Asia-Pacific art |
| Science Museum Group (UK) | ✅ | CC varies | API, github | JSON/CSV exports, 37 GitHub repos |
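Several collections in these tables (the Art Institute of Chicago, Harvard Art Museums, and others marked IIIF-compatible) serve images through the IIIF Image API, whose request URLs follow the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The helper below assembles such URLs. The identifier in the test is hypothetical (for the Art Institute of Chicago it would come from an artwork record's `image_id` field), and `843,` is the width used in the Art Institute's own API documentation examples; the function name is illustrative.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    size="max" is valid for IIIF Image API 2.1 and 3.0; servers that only
    implement 2.0 expect size="full" instead.
    """
    # Identifiers must be URI-encoded, including any "/" characters.
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"
```

Requesting a bounded width (e.g. `size="843,"`) rather than `max` is usually kinder to the server and often all a training pipeline needs.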
Metadata Only / Limited Access
| Collection | Avail. | License | Links | Notes |
|---|---|---|---|---|
| ArtUK | 🔒 | restricted | website | UK public art, browse only. An MDS extract API exists, but aggressive bot protection returns 403 on all programmatic access, including Selenium (as of 2026-04) |
| Carnegie Museum of Art | ✅ | CC0 | github | Pittsburgh collection metadata |
| MoMA Collection | ✅ | varies | github | Artist, artwork, and exhibition data. Artworks.csv (stored via Git LFS) includes an ImageURL column with direct JPEG links for ~64K works |
| Nationalmuseum Sweden | ✅ | CC0 | github | Wikidata-linked metadata |
| The Tate Collection | ✅ | CC-BY-NC-ND 3.0 | github | Metadata CSV includes thumbnail URLs. Images downloadable via CDN (swap _8.jpg for _10.jpg for 1536px). ~38K paintings/drawings with dates |
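The Tate entry describes upgrading thumbnail links by changing the trailing size code. A small helper for that rewrite is sketched below, assuming thumbnail URLs end in `_<digits>.jpg` as in the metadata CSV; the example URL in the test is hypothetical, and `10` (about 1536 px, per the note) is the only larger size code this list documents. The function name is illustrative.

```python
import re

# Trailing size code such as "_8.jpg" on a Tate thumbnail URL (assumed pattern).
_SIZE_SUFFIX = re.compile(r"_(\d+)\.jpg$")


def tate_image_url(thumbnail_url, size_code=10):
    """Swap the trailing _<n>.jpg size code, e.g. _8.jpg -> _10.jpg."""
    if not _SIZE_SUFFIX.search(thumbnail_url):
        raise ValueError(f"no size suffix found in {thumbnail_url!r}")
    return _SIZE_SUFFIX.sub(f"_{size_code}.jpg", thumbnail_url, count=1)
```

Raising on unexpected URLs, rather than passing them through, surfaces rows whose thumbnail format differs before a bulk download silently fetches the wrong files.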
Aggregated & Cross-Institution Datasets
| Dataset | Size | Avail. | License | Links | Notes |
|---|---|---|---|---|---|
| Europeana | 50M+ cultural heritage objects | ✅ | varies | data | Aggregator across thousands of European institutions |
| Wikidata: Sum of All Paintings | hundreds of thousands of paintings | ✅ | CC0 | project | Structured data linking paintings across museums. SPARQL-queryable |
| art-museums-pd-440k | ~440,000 images | ✅ | CC-BY-4.0 | huggingface | Public domain museum art with bilingual captions, WebDataset format |
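The Sum of All Paintings data can be queried through the public Wikidata SPARQL endpoint. The sketch builds a GET request URL for items that are paintings (`wd:Q3305213`), held in a given collection (`P195`), and have an image (`P18`). The function name, query shape, and result limit are illustrative choices; the Metropolitan Museum of Art ID `Q160236` used in the test is assumed and should be confirmed on Wikidata, as should any other collection ID.

```python
import re
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """\
SELECT ?painting ?paintingLabel ?image WHERE {{
  ?painting wdt:P31 wd:Q3305213;
            wdt:P195 wd:{collection};
            wdt:P18 ?image.
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


def paintings_query_url(collection_qid, limit=100):
    """Return a GET URL asking for JSON results of the paintings query."""
    # Validate the ID so arbitrary text cannot be spliced into the query.
    if not re.fullmatch(r"Q\d+", collection_qid):
        raise ValueError(f"not a Wikidata item ID: {collection_qid!r}")
    query = QUERY_TEMPLATE.format(collection=collection_qid, limit=int(limit))
    return f"{ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"
```

When fetching the URL, send a descriptive User-Agent header as Wikimedia's API etiquette requests, and page through large collections with `LIMIT`/`OFFSET` rather than one huge query.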