photo-tools

Auto-tag photos and manage image metadata using RAM++, CLIP landmark lookup, PaddleOCR, GPS reverse geocoding, and EXIF analysis. Writes IPTC/XMP keywords with DigiKam hierarchical tag support.

Features

RAM++ tagging — Uses Recognize Anything Plus Plus (RAM++) for multi-label image content recognition. Results are mapped to Objects/ and Scenes/ keywords, with per-category confidence thresholds and count limits.
Landmark lookup — CLIP embeddings + GPS radius filtering against a FAISS-indexed Wikidata landmark database. Emits a single Landmarks/<Name> keyword.
OCR text detection — PaddleOCR extracts visible text and writes it to photo-tools:OCRText (custom XMP namespace) with IPTC ImageRegion metadata. Keeps OCR phrases out of the keyword/tag tree.
GPS reverse geocoding — Nominatim turns coordinates into a single nested Places/<Country>[/<Region>[/<City>[/<Neighborhood>]]] keyword and fills the IPTC-standard structured location fields (XMP-photoshop:City/State/Country, XMP-iptcCore:CountryCode/Location, plus IPTC IIM mirrors). The data therefore shows up in the Location panel of IPTC-aware DAMs such as Lightroom, Bridge, Capture One, Photo Mechanic, Mylio, and ACDSee.
GPS timeline inference — Images without GPS inherit coordinates from nearby photos taken within 30 minutes.
Visual duplicate detection — Find visually similar images using CLIP embeddings with an interactive dedup session. Supports undoing previous moves via a sidecar manifest.
Tag management — List, search, delete, rename, clear, and inspect tags across image collections.
Landmark database builder — Scrape Wikidata for notable landmarks and build a CLIP embedding index.
Watch mode — Monitor a directory and auto-tag new images as they appear.
XMP sidecars (optional) — Pass --xmp-sidecars to redirect every metadata write into a sibling IMG_1234.jpg.xmp, leaving the image file itself untouched, and to merge the sidecar back in on read. Off by default. Reads also fall back to the Lightroom / Capture One sidecar form (IMG_1234.xmp) when the canonical form is absent. See docs/xmp-schema.md §1.4.
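
The GPS timeline inference listed above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the Photo dataclass and infer_gps function are hypothetical names, and borrowing from the single closest-in-time neighbor is an assumption. Only the 30-minute window comes from the feature description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Window taken from the feature description; the real value lives in the config.
MAX_GAP = timedelta(minutes=30)


@dataclass
class Photo:
    """Hypothetical record: capture time plus optional (lat, lon)."""
    name: str
    taken_at: datetime
    gps: Optional[tuple[float, float]] = None


def infer_gps(photos: list[Photo]) -> dict[str, tuple[float, float]]:
    """Return inferred coordinates for photos that lack GPS.

    Each untagged photo borrows the coordinates of the GPS-tagged photo
    closest in time, provided the gap is at most MAX_GAP.
    """
    anchors = [p for p in photos if p.gps is not None]
    inferred: dict[str, tuple[float, float]] = {}
    for photo in photos:
        if photo.gps is not None or not anchors:
            continue
        nearest = min(anchors, key=lambda a: abs(a.taken_at - photo.taken_at))
        if abs(nearest.taken_at - photo.taken_at) <= MAX_GAP:
            inferred[photo.name] = nearest.gps
    return inferred
```

Returning a separate mapping instead of mutating the input keeps inferred coordinates distinguishable from coordinates that were read from the files.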

Requirements

Python 3.12+
uv — used for dependency management. The package is also pip-installable, but uv sync is the supported path.
exiftool on PATH — every metadata read and write goes through it.
ffmpeg on PATH — only required for video frame extraction (the tag pipeline samples one frame per video for RAM++/CLIP/OCR).
git on PATH — one dependency (recognize-anything) is fetched from a Git source at install time.
~5 GB free disk for first-run model downloads (RAM++ Swin-Large, OpenCLIP ViT-B-32, PaddleOCR mobile models), plus the Wikidata landmark index (a few hundred MB for ~5000 landmarks).
Runs CPU-only by default; CUDA / Apple Silicon MPS are picked up automatically when available.
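
Before the first run it can help to confirm that the external tools listed above are reachable. The snippet below is a small standalone check, not part of photo-tools; missing_tools is a hypothetical helper name, and it relies only on the standard library's shutil.which.

```python
import shutil

# External tools named in the requirements list above.
REQUIRED_TOOLS = ("exiftool", "ffmpeg", "git")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```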

Quickstart

# Install (requires exiftool, ffmpeg, and git on PATH)
uv sync

# Auto-tag a directory of photos (always recurses into subdirectories)
photo-tools tag /path/to/photos

# Dry run — preview tags without writing
photo-tools tag /path/to/photos --dry-run

# Run specific pipelines only
photo-tools tag /path/to/photos --ram          # content tags (RAM++)
photo-tools tag /path/to/photos --ocr          # text detection (PaddleOCR)
photo-tools tag /path/to/photos --landmarks    # landmark lookup (CLIP + FAISS)

# Re-tag — clears previous autogenerated tags first
photo-tools tag /path/to/photos --force

# Wipe ALL keywords + photo-tools namespace before re-tagging (nuclear option)
photo-tools tag /path/to/photos --clear-all

# Watch for new images and auto-tag them
photo-tools tag /path/to/photos --watch

# Fill in missing pipeline outputs per-photo (geocoding 404'd, OCR never ran,
# digiKam stripped tags, etc.). Detects what's missing from the metadata.
photo-tools tag /path/to/photos --fix
photo-tools tag /path/to/photos --fix --ocr    # only consider OCR

# Send writes to IMG_1234.jpg.xmp sidecars instead of the image (off by default)
photo-tools tag /path/to/photos --xmp-sidecars

# Find visual duplicates (interactive picker; supports undo via dest sidecar)
photo-tools duplicates /path/to/photos

# Tag management
photo-tools tags list /path/to/photos                   # list all tags with counts
photo-tools tags search /path/to/photos "Objects/Animal/Cat"   # show files with this tag
photo-tools tags delete /path/to/photos "old-tag"       # remove a tag everywhere
photo-tools tags rename /path/to/photos "old" "new"     # rename across collection
photo-tools tags clear /path/to/photos                  # wipe tags (keeps People/* + face regions; prompts before writing)
photo-tools tags inspect /path/to/photos                # interactively view images with their tags, GPS, and date

# Landmark database tools
photo-tools landmarks generate-db --test       # small test set (Rome + Bologna)
photo-tools landmarks generate-db -l 5000      # full build, 5000 landmarks
photo-tools landmarks query /path/to/image.jpg # top-10 nearest landmarks (debug)

# Global flags
photo-tools -v tag /path/to/photos                # verbose logging
photo-tools --config overrides.yaml tag /photos   # custom config overlay
photo-tools --xmp-sidecars tag /photos            # send writes to .xmp sidecars (image untouched)
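
The sidecar naming used by --xmp-sidecars can be sketched in Python as follows. The two file-name forms come from the feature description; the function names are hypothetical, and the lookup order (canonical form first, then the Lightroom / Capture One form) is a reading of that description rather than the project's actual code.

```python
from pathlib import Path
from typing import Optional


def canonical_sidecar(image: Path) -> Path:
    """IMG_1234.jpg -> IMG_1234.jpg.xmp (the form writes are redirected to)."""
    return image.with_name(image.name + ".xmp")


def legacy_sidecar(image: Path) -> Path:
    """IMG_1234.jpg -> IMG_1234.xmp (Lightroom / Capture One form)."""
    return image.with_suffix(".xmp")


def find_sidecar(image: Path) -> Optional[Path]:
    """Prefer the canonical sidecar; fall back to the legacy form for reads."""
    for candidate in (canonical_sidecar(image), legacy_sidecar(image)):
        if candidate.is_file():
            return candidate
    return None
```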

Compatibility with other photo managers

photo-tools writes industry-standard XMP/IPTC metadata. It is digiKam-shaped (taxonomy and digiKam:TagsList round-trip cleanly), but the keyword data, location data, and people projection are also surfaced by every major keyword-aware DAM. See docs/xmp-schema.md §4 for the full interop matrix; the headline summary:

App	What surfaces
digiKam	Excellent — full hierarchy, People view, Location panel
Lightroom Classic / Bridge	Keyword hierarchy (via `lr:HierarchicalSubject`) and Location panel (via `XMP-photoshop:City/State/Country`)
Photo Mechanic / Mylio	Keywords, IPTC Location, `PersonInImage`
Capture One / ACDSee / Excire / XnView	Flat keywords + IPTC Location
PhotoPrism / Immich	Tags; location support varies
Apple Photos	Flat keywords on first import only — no re-sync
Google Photos	None — Google strips metadata on upload

OCR text (photo-tools:OCRText) is in a tool-private namespace, so only photo-tools and exiftool will surface it. Face regions (XMP-mwg-rs) are intentionally out of scope: photo-tools doesn't do face detection, so it leaves that to digiKam / Lightroom / Mylio.
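
To illustrate why one tag can surface differently across these apps, the sketch below projects a single tag path into three common keyword forms. digiKam:TagsList separates levels with "/", lr:HierarchicalSubject separates them with "|", and dc:subject holds flat keywords. The project_keyword helper is hypothetical, and using the leaf component as the flat keyword is an assumption; docs/xmp-schema.md is the authority on what photo-tools actually writes.

```python
def project_keyword(tag_path: str) -> dict[str, str]:
    """Project one '/'-separated tag path into three common keyword forms."""
    parts = [part for part in tag_path.split("/") if part]
    if not parts:
        raise ValueError("empty tag path")
    return {
        "digiKam:TagsList": "/".join(parts),
        "lr:HierarchicalSubject": "|".join(parts),
        "dc:subject": parts[-1],  # assumption: flat keyword = leaf name
    }
```

For example, project_keyword("Places/Italy/Lazio/Rome") yields the "|"-joined path for Lightroom and the flat keyword "Rome" for apps that only read dc:subject.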

Architecture

src/photo_tools/
  cli.py                  CLI entry point — all subcommands
  config.py               YAML config loader (defaults + user overlay)
  default_config.yaml     All configurable thresholds and parameters
  constants.py            Tag roots, file extensions, patterns
  taxonomy.py             Tag categories and per-category limits
  logging_setup.py        Logging configuration

  autotag.py              Core tagging engine (RAM++, GPS, OCR, EXIF, landmarks)
  helpers.py              ExifTool ops, file discovery, image prep, embedding cache
  clip_tagger.py          CLIP image embedder (ViT-B-32, OpenCLIP)
  ram_tagger.py           RAM++ image tagger (Swin-Large)
  landmarks.py            FAISS-backed landmark lookup with GPS radius filtering
  duplicates.py           Visual duplicate detection + interactive picker (with undo)
  tags_cmd.py             `tags` subcommands: list/search/delete/rename/clear/inspect
  debug_viewer.py         Interactive image inspector with metadata display

  build_landmarks.py      Wikidata landmark scraper + CLIP index builder

  exiftool_phototools.config   ExifTool XMP namespace config
  data/ram_tag_mapping.yaml    RAM++ tag → taxonomy mapping (~4580 entries)
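
The idea behind landmarks.py (GPS radius filtering combined with embedding similarity) can be sketched without any dependencies. This is a brute-force stand-in that swaps the FAISS index for a plain loop; the function names, the tuple layout, and the radius_km and min_score defaults are all illustrative assumptions, not project values.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_landmark(query_vec, query_gps, landmarks, radius_km=5.0, min_score=0.5):
    """Return 'Landmarks/<Name>' for the best match within radius_km, or None.

    landmarks: iterable of (name, (lat, lon), embedding) tuples.
    """
    best_name, best_score = None, min_score
    for name, (lat, lon), vec in landmarks:
        if haversine_km(query_gps[0], query_gps[1], lat, lon) > radius_km:
            continue  # GPS radius filter: skip landmarks that are too far away
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_name, best_score = name, score
    return f"Landmarks/{best_name}" if best_name is not None else None
```

Filtering by distance first keeps visually similar but geographically impossible candidates out of the comparison, which is why at most one Landmarks/<Name> keyword results.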

All thresholds and parameters live in default_config.yaml and can be overridden with --config path/to/overrides.yaml. The XMP/IPTC schema photo-tools writes — including the photo-tools: custom namespace — is documented in docs/xmp-schema.md.
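
A defaults-plus-overlay loader of this kind is commonly implemented as a recursive dictionary merge. The sketch below shows that technique on already-parsed dictionaries; merge_config is a hypothetical name, the keys in the usage example are invented for illustration, and the real behavior of config.py may differ.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical keys, for illustration only.
defaults = {"gps": {"timeline_window_minutes": 30, "enabled": True}}
overrides = {"gps": {"timeline_window_minutes": 10}}
config = merge_config(defaults, overrides)
```

With a merge like this, an overrides file only needs to list the values that change; everything else keeps its default.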

Development

# Install with dev dependencies
uv sync

# Run tests
uv run pytest

# Lint
uv run ruff check .
uv run ruff format --check .

See CONTRIBUTING.md for the contribution workflow.

License

Licensed under the Apache License, Version 2.0.

Bundled and downloaded models carry their own licenses:

RAM++ (ram_plus_swin_large_14m.pth) — Apache-2.0 (recognize-anything)
OpenCLIP ViT-B-32 / laion2b_s34b_b79k — MIT (open_clip)
PaddleOCR PP-OCRv5 mobile — Apache-2.0 (PaddleOCR)
Wikidata content (used by the landmark scraper) — CC0; landmark images on Wikimedia Commons carry per-file licenses.
