Auto-tag photos and manage image metadata using RAM++, CLIP landmark lookup, PaddleOCR, GPS reverse geocoding, and EXIF analysis. Writes IPTC/XMP keywords with DigiKam hierarchical tag support.
- RAM++ tagging — Recognize Anything Plus Plus for multi-label image content recognition. Mapped to `Objects/` and `Scenes/` keywords with per-category confidence and count limits.
- Landmark lookup — CLIP embeddings + GPS radius filtering against a FAISS-indexed Wikidata landmark database. Emits a single `Landmarks/<Name>` keyword.
- OCR text detection — PaddleOCR extracts visible text and writes it to `photo-tools:OCRText` (custom XMP namespace) with IPTC ImageRegion metadata. Keeps OCR phrases out of the keyword/tag tree.
- GPS reverse geocoding — Nominatim turns coordinates into a single nested `Places/<Country>[/<Region>[/<City>[/<Neighborhood>]]]` keyword and the IPTC-standard structured location fields (`XMP-photoshop:City/State/Country`, `XMP-iptcCore:CountryCode/Location`, plus IPTC IIM mirrors), so the data shows up in the Location panel of every IPTC-aware DAM (Lightroom, Bridge, Capture One, Photo Mechanic, Mylio, ACDSee, …).
- GPS timeline inference — Images without GPS inherit coordinates from nearby photos taken within 30 minutes.
- Visual duplicate detection — Find visually similar images using CLIP embeddings with an interactive dedup session. Supports undoing previous moves via a sidecar manifest.
- Tag management — List, search, delete, rename, clear, and inspect tags across image collections.
- Landmark database builder — Scrape Wikidata for notable landmarks and build a CLIP embedding index.
- Watch mode — Monitor a directory and auto-tag new images as they appear.
- XMP sidecars (optional) — pass `--xmp-sidecars` to redirect every metadata write into a sibling `IMG_1234.jpg.xmp` (the image file itself is left untouched) and merge the sidecar back in on read. Off by default. Reads also fall back to the Lightroom / Capture One sidecar form (`IMG_1234.xmp`) when the canonical form is absent. See `docs/xmp-schema.md` §1.4.
- Python 3.12+
- `uv` — used for dependency management. The package is also pip-installable, but `uv sync` is the supported path.
- exiftool on `PATH` — every metadata read and write goes through it.
- ffmpeg on `PATH` — only required for video frame extraction (the `tag` pipeline samples one frame per video for RAM++/CLIP/OCR).
- `git` on `PATH` — one dependency (`recognize-anything`) is fetched from a Git source at install time.
- ~5 GB free disk for first-run model downloads (RAM++ Swin-Large, OpenCLIP ViT-B-32, PaddleOCR mobile models) plus whatever the Wikidata landmark index ends up at (a few hundred MB for ~5000 landmarks).
- Runs CPU-only by default; CUDA / Apple Silicon MPS are picked up automatically when available.
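Before installing, it can help to confirm that the external tools listed above are reachable. The following is a small standalone check, not part of photo-tools; the helper name `missing_tools` is made up for illustration, and it relies only on the standard library's `shutil.which`.

```python
import shutil
import sys

# Tools the requirements list expects on PATH (ffmpeg is only needed for video).
REQUIRED_TOOLS = ("exiftool", "ffmpeg", "git")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_new_enough(minimum=(3, 12)) -> bool:
    """Check the running interpreter against the documented minimum version."""
    return sys.version_info[:2] >= minimum


absent = missing_tools()
if absent:
    print("Not found on PATH:", ", ".join(absent))
if not python_is_new_enough():
    print("Python 3.12+ is required")
```

The script prints nothing when every prerequisite is present.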
# Install (requires exiftool, ffmpeg, and git on PATH)
uv sync
# Auto-tag a directory of photos (always recurses into subdirectories)
photo-tools tag /path/to/photos
# Dry run — preview tags without writing
photo-tools tag /path/to/photos --dry-run
# Run specific pipelines only
photo-tools tag /path/to/photos --ram # content tags (RAM++)
photo-tools tag /path/to/photos --ocr # text detection (PaddleOCR)
photo-tools tag /path/to/photos --landmarks # landmark lookup (CLIP + FAISS)
# Re-tag — clears previous autogenerated tags first
photo-tools tag /path/to/photos --force
# Wipe ALL keywords + photo-tools namespace before re-tagging (nuclear option)
photo-tools tag /path/to/photos --clear-all
# Watch for new images and auto-tag them
photo-tools tag /path/to/photos --watch
# Fill in missing pipeline outputs per-photo (geocoding 404'd, OCR never ran,
# digiKam stripped tags, etc.). Detects what's missing from the metadata.
photo-tools tag /path/to/photos --fix
photo-tools tag /path/to/photos --fix --ocr # only consider OCR
# Send writes to IMG_1234.jpg.xmp sidecars instead of the image (off by default)
photo-tools tag /path/to/photos --xmp-sidecars
# Find visual duplicates (interactive picker; supports undo via dest sidecar)
photo-tools duplicates /path/to/photos
# Tag management
photo-tools tags list /path/to/photos # list all tags with counts
photo-tools tags search /path/to/photos "Objects/Animal/Cat" # show files with this tag
photo-tools tags delete /path/to/photos "old-tag" # remove a tag everywhere
photo-tools tags rename /path/to/photos "old" "new" # rename across collection
photo-tools tags clear /path/to/photos # wipe tags (keeps People/* + face regions; prompts before writing)
photo-tools tags inspect /path/to/photos # interactively view images with their tags, GPS, and date
# Landmark database tools
photo-tools landmarks generate-db --test # small test set (Rome + Bologna)
photo-tools landmarks generate-db -l 5000 # full build, 5000 landmarks
photo-tools landmarks query /path/to/image.jpg # top-10 nearest landmarks (debug)
# Global flags
photo-tools -v tag /path/to/photos # verbose logging
photo-tools --config overrides.yaml tag /photos # custom config overlay
photo-tools --xmp-sidecars tag /photos # send writes to .xmp sidecars (image untouched)

photo-tools writes industry-standard XMP/IPTC metadata. It is digiKam-shaped
(taxonomy and `digiKam:TagsList` round-trip cleanly), but the keyword data,
location data, and people projection are also surfaced by every major
keyword-aware DAM. See `docs/xmp-schema.md` §4 for the
full interop matrix; the headline summary:
| App | What surfaces |
|---|---|
| digiKam | Excellent — full hierarchy, People view, Location panel |
| Lightroom Classic / Bridge | Keyword hierarchy (via lr:HierarchicalSubject) and Location panel (via XMP-photoshop:City/State/Country) |
| Photo Mechanic / Mylio | Keywords, IPTC Location, PersonInImage |
| Capture One / ACDSee / Excire / XnView | Flat keywords + IPTC Location |
| PhotoPrism / Immich | Tags + (location reading varies) |
| Apple Photos | Flat keywords on first import only — no re-sync |
| Google Photos | None — Google strips metadata on upload |
OCR text (photo-tools:OCRText) is in a tool-private namespace and only
photo-tools / exiftool will surface it. Face regions (XMP-mwg-rs) are
intentionally out-of-scope: photo-tools doesn't do face detection, so it
defers that lane to digiKam / Lightroom / Mylio.
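The difference between hierarchical and flat keyword support in the table comes down to how one tag path is projected into several fields. The sketch below shows the separator conventions: `digiKam:TagsList` uses `/` and `lr:hierarchicalSubject` uses `|`. The function name `project_tag`, the dictionary keys, and the choice of the leaf name as the flat keyword are assumptions for illustration; the fields photo-tools actually writes are specified in `docs/xmp-schema.md`.

```python
def project_tag(path: str) -> dict[str, str]:
    """Project a slash-separated tag path into three common keyword forms."""
    parts = [part for part in path.split("/") if part]
    if not parts:
        raise ValueError("tag path must contain at least one component")
    return {
        # Assumption: apps that only read flat keywords see the leaf name.
        "flat_keyword": parts[-1],
        # digiKam stores the full path with "/" separators.
        "digikam_tags_list": "/".join(parts),
        # Lightroom's hierarchical subject uses "|" separators.
        "lr_hierarchical_subject": "|".join(parts),
    }


projection = project_tag("Places/Italy/Lazio/Rome")
```

For `Places/Italy/Lazio/Rome` this yields the flat keyword `Rome`, the digiKam path unchanged, and `Places|Italy|Lazio|Rome` for Lightroom.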
src/photo_tools/
cli.py CLI entry point — all subcommands
config.py YAML config loader (defaults + user overlay)
default_config.yaml All configurable thresholds and parameters
constants.py Tag roots, file extensions, patterns
taxonomy.py Tag categories and per-category limits
logging_setup.py Logging configuration
autotag.py Core tagging engine (RAM++, GPS, OCR, EXIF, landmarks)
helpers.py ExifTool ops, file discovery, image prep, embedding cache
clip_tagger.py CLIP image embedder (ViT-B-32, OpenCLIP)
ram_tagger.py RAM++ image tagger (Swin-Large)
landmarks.py FAISS-backed landmark lookup with GPS radius filtering
duplicates.py Visual duplicate detection + interactive picker (with undo)
tags_cmd.py `tags` subcommands: list/search/delete/rename/clear/inspect
debug_viewer.py Interactive image inspector with metadata display
build_landmarks.py Wikidata landmark scraper + CLIP index builder
exiftool_phototools.config ExifTool XMP namespace config
data/ram_tag_mapping.yaml RAM++ tag → taxonomy mapping (~4580 entries)
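The landmark lookup in `landmarks.py` is described as an embedding search with GPS radius filtering. The idea can be sketched in pure Python as below, with a brute-force cosine ranking standing in for the FAISS index. The function names, the dictionary layout, the toy 2-dimensional vectors, and the 5 km radius are all hypothetical; real CLIP embeddings have hundreds of dimensions.

```python
import math


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def nearest_landmarks(query_vec, query_gps, landmarks, radius_km=5.0, top_k=10):
    """Keep landmarks within radius_km, then rank them by embedding similarity."""
    nearby = [lm for lm in landmarks if haversine_km(query_gps, lm["gps"]) <= radius_km]
    scored = [(lm["name"], cosine(query_vec, lm["vec"])) for lm in nearby]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


landmarks = [
    {"name": "Colosseum", "gps": (41.8902, 12.4922), "vec": [0.9, 0.1]},
    {"name": "Trevi Fountain", "gps": (41.9009, 12.4833), "vec": [0.1, 0.9]},
    {"name": "Two Towers", "gps": (44.4942, 11.3467), "vec": [1.0, 0.0]},
]
hits = nearest_landmarks([1.0, 0.0], (41.89, 12.49), landmarks)
```

Filtering by distance first keeps a visually similar but geographically distant landmark (here the Bologna entry) from outranking the local candidates.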
All thresholds and parameters live in default_config.yaml and can be overridden with --config path/to/overrides.yaml. The XMP/IPTC schema photo-tools writes — including the photo-tools: custom namespace — is documented in docs/xmp-schema.md.
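An overlay of this kind is commonly implemented as a recursive dictionary merge, so an override file only needs to list the keys it changes. The following is a minimal sketch of that pattern, assuming both YAML files have already been parsed into plain dictionaries; the key names are invented for illustration and may not match `default_config.yaml`, and the actual merge behaviour of `config.py` is not confirmed here.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied, merging nested mappings recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical keys, for illustration only.
defaults = {"gps": {"timeline_window_minutes": 30, "reverse_geocode": True},
            "landmarks": {"top_k": 10}}
overrides = {"gps": {"timeline_window_minutes": 15}}
config = merge_config(defaults, overrides)
```

Only `timeline_window_minutes` changes in the result; `reverse_geocode` and the `landmarks` section keep their default values, and `defaults` itself is left unmodified.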
# Install with dev dependencies
uv sync
# Run tests
uv run pytest
# Lint
uv run ruff check .
uv run ruff format --check .

See CONTRIBUTING.md for the contribution workflow.
Licensed under the Apache License, Version 2.0.
Bundled and downloaded models carry their own licenses:
- RAM++ (`ram_plus_swin_large_14m.pth`) — Apache-2.0 (recognize-anything)
- OpenCLIP ViT-B-32 / laion2b_s34b_b79k — MIT (open_clip)
- PaddleOCR PP-OCRv5 mobile — Apache-2.0 (PaddleOCR)
- Wikidata content (used by the landmark scraper) — CC0; landmark images on Wikimedia Commons carry per-file licenses.