j23n/photo-tools

photo-tools

Auto-tag photos and manage image metadata using RAM++, CLIP landmark lookup, PaddleOCR, GPS reverse geocoding, and EXIF analysis. Writes IPTC/XMP keywords with DigiKam hierarchical tag support.

Features

  • RAM++ tagging — Recognize Anything Plus Plus for multi-label image content recognition. Mapped to Objects/ and Scenes/ keywords with per-category confidence and count limits.
  • Landmark lookup — CLIP embeddings + GPS radius filtering against a FAISS-indexed Wikidata landmark database. Emits a single Landmarks/<Name> keyword.
  • OCR text detection — PaddleOCR extracts visible text and writes it to photo-tools:OCRText (custom XMP namespace) with IPTC ImageRegion metadata. Keeps OCR phrases out of the keyword/tag tree.
  • GPS reverse geocoding — Nominatim turns coordinates into a single nested Places/<Country>[/<Region>[/<City>[/<Neighborhood>]]] keyword and the IPTC-standard structured location fields (XMP-photoshop:City/State/Country, XMP-iptcCore:CountryCode/Location, plus IPTC IIM mirrors), so the data shows up in the Location panel of every IPTC-aware DAM (Lightroom, Bridge, Capture One, Photo Mechanic, Mylio, ACDSee, …).
  • GPS timeline inference — Images without GPS inherit coordinates from other photos taken within 30 minutes of them.
  • Visual duplicate detection — Find visually similar images using CLIP embeddings with an interactive dedup session. Supports undoing previous moves via a sidecar manifest.
  • Tag management — List, search, delete, rename, clear, and inspect tags across image collections.
  • Landmark database builder — Scrape Wikidata for notable landmarks and build a CLIP embedding index.
  • Watch mode — Monitor a directory and auto-tag new images as they appear.
  • XMP sidecars (optional) — pass --xmp-sidecars to redirect every metadata write into a sibling IMG_1234.jpg.xmp (the image file itself is left untouched) and merge the sidecar back in on read. Off by default. Reads also fall back to the Lightroom / Capture One sidecar form (IMG_1234.xmp) when the canonical form is absent. See docs/xmp-schema.md §1.4.
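
The GPS timeline inference described in the list above can be sketched roughly as follows. This is an illustration, not the project's code: the `Photo` dataclass and `infer_gps` function are hypothetical names, and picking the geotagged photo closest in time is an assumption (the README only states the 30-minute window).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class Photo:
    # Hypothetical record; the real tool reads these values via exiftool.
    name: str
    taken: datetime
    gps: Optional[Tuple[float, float]] = None  # (lat, lon), None if missing


def infer_gps(
    photos: List[Photo], window: timedelta = timedelta(minutes=30)
) -> Dict[str, Tuple[float, float]]:
    """Map each untagged photo to the GPS of its closest-in-time geotagged neighbor."""
    tagged = [p for p in photos if p.gps is not None]
    inferred: Dict[str, Tuple[float, float]] = {}
    for photo in photos:
        if photo.gps is not None or not tagged:
            continue
        # Assumption: the closest capture time wins when several candidates exist.
        nearest = min(tagged, key=lambda t: abs(t.taken - photo.taken))
        if abs(nearest.taken - photo.taken) <= window:
            inferred[photo.name] = nearest.gps
    return inferred
```

Photos with no geotagged neighbor inside the window are simply left out of the result, so they stay without coordinates.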

Requirements

  • Python 3.12+
  • uv — used for dependency management. The package is also pip-installable, but uv sync is the supported path.
  • exiftool on PATH — every metadata read and write goes through it.
  • ffmpeg on PATH — only required for video frame extraction (the tag pipeline samples one frame per video for RAM++/CLIP/OCR).
  • git on PATH — one dependency (recognize-anything) is fetched from a Git source at install time.
  • ~5 GB free disk for first-run model downloads (RAM++ Swin-Large, OpenCLIP ViT-B-32, PaddleOCR mobile models) plus whatever the Wikidata landmark index ends up at (a few hundred MB for ~5000 landmarks).
  • Runs CPU-only by default; CUDA / Apple Silicon MPS are picked up automatically when available.
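
A quick way to check the external tools listed above before installing is a small preflight script. The sketch below uses only the standard library; the `missing_tools` helper and the required/optional split are illustrative assumptions based on the list above (ffmpeg is treated as optional because it is only needed for video).

```python
import shutil
from typing import Callable, Iterable, List, Optional

REQUIRED = ("exiftool", "git")  # needed for metadata I/O and for installation
OPTIONAL = ("ffmpeg",)          # only needed for video frame extraction


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def report() -> None:
    """Print which required and optional tools are missing, if any."""
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_tools(names)
        status = ", ".join(missing) if missing else "all found"
        print(f"{label}: {status}")
```

Calling `report()` prints one line per group; the `which` parameter exists only so the lookup can be replaced in tests.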

Quickstart

# Install (requires exiftool, ffmpeg, and git on PATH)
uv sync

# Auto-tag a directory of photos (always recurses into subdirectories)
photo-tools tag /path/to/photos

# Dry run — preview tags without writing
photo-tools tag /path/to/photos --dry-run

# Run specific pipelines only
photo-tools tag /path/to/photos --ram          # content tags (RAM++)
photo-tools tag /path/to/photos --ocr          # text detection (PaddleOCR)
photo-tools tag /path/to/photos --landmarks    # landmark lookup (CLIP + FAISS)

# Re-tag — clears previous autogenerated tags first
photo-tools tag /path/to/photos --force

# Wipe ALL keywords + photo-tools namespace before re-tagging (nuclear option)
photo-tools tag /path/to/photos --clear-all

# Watch for new images and auto-tag them
photo-tools tag /path/to/photos --watch

# Fill in missing pipeline outputs per-photo (geocoding 404'd, OCR never ran,
# digiKam stripped tags, etc.). Detects what's missing from the metadata.
photo-tools tag /path/to/photos --fix
photo-tools tag /path/to/photos --fix --ocr    # only consider OCR

# Send writes to IMG_1234.jpg.xmp sidecars instead of the image (off by default)
photo-tools tag /path/to/photos --xmp-sidecars

# Find visual duplicates (interactive picker; supports undo via dest sidecar)
photo-tools duplicates /path/to/photos

# Tag management
photo-tools tags list /path/to/photos                   # list all tags with counts
photo-tools tags search /path/to/photos "Objects/Animal/Cat"   # show files with this tag
photo-tools tags delete /path/to/photos "old-tag"       # remove a tag everywhere
photo-tools tags rename /path/to/photos "old" "new"     # rename across collection
photo-tools tags clear /path/to/photos                  # wipe tags (keeps People/* + face regions; prompts before writing)
photo-tools tags inspect /path/to/photos                # interactively view images with their tags, GPS, and date

# Landmark database tools
photo-tools landmarks generate-db --test       # small test set (Rome + Bologna)
photo-tools landmarks generate-db -l 5000      # full build, 5000 landmarks
photo-tools landmarks query /path/to/image.jpg # top-10 nearest landmarks (debug)

# Global flags
photo-tools -v tag /path/to/photos                # verbose logging
photo-tools --config overrides.yaml tag /photos   # custom config overlay
photo-tools --xmp-sidecars tag /photos            # send writes to .xmp sidecars (image untouched)
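
The sidecar naming used by --xmp-sidecars (writes go to IMG_1234.jpg.xmp, reads fall back to IMG_1234.xmp) can be illustrated with a short sketch. The function names are hypothetical and this is not the project's implementation; see docs/xmp-schema.md §1.4 for the actual rules.

```python
from pathlib import Path
from typing import Optional


def sidecar_for_write(image: Path) -> Path:
    """Canonical sidecar: the full image file name plus '.xmp'."""
    return image.with_name(image.name + ".xmp")


def sidecar_for_read(image: Path) -> Optional[Path]:
    """Prefer the canonical sidecar, else the Lightroom / Capture One form."""
    canonical = sidecar_for_write(image)
    if canonical.exists():
        return canonical
    fallback = image.with_suffix(".xmp")  # IMG_1234.jpg -> IMG_1234.xmp
    return fallback if fallback.exists() else None
```

Appending to the full file name keeps IMG_1234.jpg and IMG_1234.png from sharing a sidecar, which the shorter IMG_1234.xmp form cannot guarantee.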

Compatibility with other photo managers

photo-tools writes industry-standard XMP/IPTC metadata. It is digiKam-shaped (taxonomy and digiKam:TagsList round-trip cleanly), but the keyword data, location data, and people projection are also surfaced by every major keyword-aware DAM. See docs/xmp-schema.md §4 for the full interop matrix; the headline summary:

App                                      What surfaces
digiKam                                  Excellent — full hierarchy, People view, Location panel
Lightroom Classic / Bridge               Keyword hierarchy (via lr:HierarchicalSubject) and Location panel (via XMP-photoshop:City/State/Country)
Photo Mechanic / Mylio                   Keywords, IPTC Location, PersonInImage
Capture One / ACDSee / Excire / XnView   Flat keywords + IPTC Location
PhotoPrism / Immich                      Tags + (location reading varies)
Apple Photos                             Flat keywords on first import only — no re-sync
Google Photos                            None — Google strips metadata on upload

OCR text (photo-tools:OCRText) is in a tool-private namespace, so only photo-tools and exiftool will surface it. Face regions (XMP-mwg-rs) are intentionally out of scope: photo-tools doesn't do face detection and leaves that to digiKam / Lightroom / Mylio.
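
To make the difference between hierarchical and flat keywords concrete, here is a rough sketch of how one tag path might be projected into several fields. digiKam's TagsList separates levels with "/" and Lightroom's hierarchical subject uses "|". The `project_keyword` function is hypothetical, and writing only the leaf as the flat keyword is an assumption; docs/xmp-schema.md describes what photo-tools actually writes.

```python
from typing import Dict


def project_keyword(tag_path: str) -> Dict[str, str]:
    """Project a '/'-separated tag path into hierarchical and flat forms."""
    parts = [part for part in tag_path.split("/") if part]
    if not parts:
        raise ValueError("empty tag path")
    return {
        "digiKam:TagsList": "/".join(parts),
        "lr:HierarchicalSubject": "|".join(parts),
        # Assumption: the flat keyword is the leaf only.
        "dc:subject": parts[-1],
    }
```

For example, `project_keyword("Places/Italy/Lazio/Rome")` yields the hierarchy `Places|Italy|Lazio|Rome` for Lightroom and the flat keyword `Rome` for apps that ignore hierarchy.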

Architecture

src/photo_tools/
  cli.py                  CLI entry point — all subcommands
  config.py               YAML config loader (defaults + user overlay)
  default_config.yaml     All configurable thresholds and parameters
  constants.py            Tag roots, file extensions, patterns
  taxonomy.py             Tag categories and per-category limits
  logging_setup.py        Logging configuration

  autotag.py              Core tagging engine (RAM++, GPS, OCR, EXIF, landmarks)
  helpers.py              ExifTool ops, file discovery, image prep, embedding cache
  clip_tagger.py          CLIP image embedder (ViT-B-32, OpenCLIP)
  ram_tagger.py           RAM++ image tagger (Swin-Large)
  landmarks.py            FAISS-backed landmark lookup with GPS radius filtering
  duplicates.py           Visual duplicate detection + interactive picker (with undo)
  tags_cmd.py             `tags` subcommands: list/search/delete/rename/clear/inspect
  debug_viewer.py         Interactive image inspector with metadata display

  build_landmarks.py      Wikidata landmark scraper + CLIP index builder

  exiftool_phototools.config   ExifTool XMP namespace config
  data/ram_tag_mapping.yaml    RAM++ tag → taxonomy mapping (~4580 entries)
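
The landmark lookup in landmarks.py combines CLIP similarity with a GPS radius filter. The idea can be sketched without FAISS or CLIP: the version below does a brute-force scan with plain Python lists standing in for embeddings. All names, the 1 km radius, and the 0.5 similarity threshold are illustrative assumptions, not the project's defaults.

```python
import math
from typing import List, Optional, Sequence, Tuple

Landmark = Tuple[str, Tuple[float, float], Sequence[float]]  # name, (lat, lon), embedding


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_landmark(
    photo_vec: Sequence[float],
    photo_gps: Tuple[float, float],
    landmarks: List[Landmark],
    radius_km: float = 1.0,
    min_sim: float = 0.5,
) -> Optional[str]:
    """Return a single Landmarks/<Name> keyword, or None if nothing qualifies."""
    best_name, best_sim = None, min_sim
    for name, (lat, lon), vec in landmarks:
        if haversine_km(photo_gps[0], photo_gps[1], lat, lon) > radius_km:
            continue  # outside the GPS radius, skip before comparing embeddings
        sim = cosine(photo_vec, vec)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return f"Landmarks/{best_name}" if best_name else None
```

Filtering by distance first keeps visually similar landmarks in other cities from matching, which is presumably why the real lookup pairs the two signals.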

All thresholds and parameters live in default_config.yaml and can be overridden with --config path/to/overrides.yaml. The XMP/IPTC schema photo-tools writes — including the photo-tools: custom namespace — is documented in docs/xmp-schema.md.
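
As an illustration of how a defaults-plus-overlay config can behave, the sketch below merges a user overlay onto defaults recursively, so an override only needs to name the keys it changes. Recursive merging is an assumption about config.py, and the keys shown in the usage note are made up for the example; the real ones are in default_config.yaml.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults with overlay applied; nested dicts are merged key by key."""
    merged = deepcopy(defaults)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = deepcopy(value)
    return merged
```

With hypothetical defaults `{"gps": {"timeline_minutes": 30, "radius_km": 1.0}}` and an overlay `{"gps": {"timeline_minutes": 10}}`, the merged result keeps `radius_km` at 1.0 and changes only `timeline_minutes`.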

Development

# Install with dev dependencies
uv sync

# Run tests
uv run pytest

# Lint
uv run ruff check .
uv run ruff format --check .

See CONTRIBUTING.md for the contribution workflow.

License

Licensed under the Apache License, Version 2.0.

Bundled and downloaded models carry their own licenses:

  • RAM++ (ram_plus_swin_large_14m.pth) — Apache-2.0 (recognize-anything)
  • OpenCLIP ViT-B-32 / laion2b_s34b_b79k — MIT (open_clip)
  • PaddleOCR PP-OCRv5 mobile — Apache-2.0 (PaddleOCR)
  • Wikidata content (used by the landmark scraper) — CC0; landmark images on Wikimedia Commons carry per-file licenses.
