
Configuration Reference

All settings are in scoring_config.json. After modifying, run python facet.py --recompute-average to update scores (no GPU needed).


Users

Optional multi-user support for family NAS scenarios. When the users key is present (with at least one user), multi-user mode is enabled and the single-password auth is replaced with per-user login.

{
  "users": {
    "alice": {
      "password_hash": "salt_hex:dk_hex",
      "display_name": "Alice",
      "role": "superadmin",
      "directories": ["/volume1/Photos/Alice"]
    },
    "bob": {
      "password_hash": "salt_hex:dk_hex",
      "display_name": "Bob",
      "role": "user",
      "directories": ["/volume1/Photos/Bob"]
    },
    "shared_directories": [
      "/volume1/Photos/Family",
      "/volume1/Photos/Vacations"
    ]
  }
}

User fields

Field Type Description
password_hash string PBKDF2-HMAC-SHA256 hash (salt_hex:dk_hex). Generated by --add-user CLI.
display_name string Shown in the UI header
role string user, admin, or superadmin
directories array Private photo directories for this user

Shared directories

The shared_directories key (a sibling of the per-user entries inside users) lists directories visible to all users.

Roles

Role View own + shared Rate/favorite Manage persons/faces Trigger scans
user yes yes no no
admin yes yes yes no
superadmin yes yes yes yes

Adding users

Users are created via CLI only — there is no registration UI or API:

python database.py --add-user alice --role superadmin --display-name "Alice"
# Prompts for password, writes hash to scoring_config.json

After adding a user, edit scoring_config.json to configure their directories.
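For reference, the salt_hex:dk_hex scheme can be reproduced with Python's standard library. This is a sketch, not the project's exact code; the salt length and iteration count here are assumptions:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed; facet may use a different count

def hash_password(password: str) -> str:
    """Return a salt_hex:dk_hex string via PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # 16-byte salt is an assumption
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}:{dk.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Check a candidate password against a stored hash, in constant time."""
    salt_hex, dk_hex = stored.split(":")
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(),
                             bytes.fromhex(salt_hex), ITERATIONS)
    return hmac.compare_digest(dk.hex(), dk_hex)
```

hmac.compare_digest avoids timing side channels when comparing the derived keys.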

Backward compatibility

  • No users key = legacy single-user mode (unchanged behavior)
  • viewer.password and viewer.edition_password are ignored in multi-user mode
  • Existing ratings in the photos table remain for single-user mode; use --migrate-user-preferences to copy them

Scanning

Controls directory scanning behavior.

{
  "scanning": {
    "skip_hidden_directories": true
  }
}
Setting Default Description
skip_hidden_directories true Skip directories starting with . during photo scanning

Categories

Array of category definitions. See Scoring for detailed category documentation.

Each category has:

  • name - Category identifier
  • priority - Lower = higher priority (evaluated first)
  • filters - Conditions for matching
  • weights - Scoring metric weights (must sum to 100)
  • modifiers - Behavior adjustments
  • tags - CLIP vocabulary for tag-based matching

Scoring

{
  "scoring": {
    "score_min": 0.0,
    "score_max": 10.0,
    "score_precision": 2
  }
}
Setting Default Description
score_min 0.0 Minimum possible score
score_max 10.0 Maximum possible score
score_precision 2 Decimal places for scores

Thresholds

Detection thresholds for automatic categorization.

{
  "thresholds": {
    "portrait_face_ratio_percent": 5,
    "blink_penalty_percent": 50,
    "night_luminance_threshold": 0.15,
    "night_iso_threshold": 3200,
    "long_exposure_shutter_threshold": 1.0,
    "astro_shutter_threshold": 10.0
  }
}
Setting Default Description
portrait_face_ratio_percent 5 Face > 5% of frame = portrait
blink_penalty_percent 50 Score multiplier when blink detected (0.5x)
night_luminance_threshold 0.15 Mean luminance below this = night
night_iso_threshold 3200 ISO above this = low-light
long_exposure_shutter_threshold 1.0 Shutter > 1s = long exposure
astro_shutter_threshold 10.0 Shutter > 10s = astrophotography
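Taken together, the shutter and luminance thresholds suggest a classification flow like this sketch (the precedence order is an assumption; the values mirror the defaults above):

```python
DEFAULTS = {
    "night_luminance_threshold": 0.15,
    "night_iso_threshold": 3200,
    "long_exposure_shutter_threshold": 1.0,
    "astro_shutter_threshold": 10.0,
}

def classify_capture(shutter_s, iso, mean_luminance, cfg=DEFAULTS):
    """Map EXIF shutter/ISO plus measured luminance to a coarse category."""
    if shutter_s > cfg["astro_shutter_threshold"]:
        return "astro"          # > 10s exposure
    if shutter_s > cfg["long_exposure_shutter_threshold"]:
        return "long_exposure"  # > 1s exposure
    if (mean_luminance < cfg["night_luminance_threshold"]
            or iso > cfg["night_iso_threshold"]):
        return "night"          # dark frame or high-ISO low light
    return "day"
```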

Composition

Rule-based composition scoring (used when SAMP-Net is not active).

{
  "composition": {
    "power_point_weight": 2.0,
    "line_weight": 1.0
  }
}
Setting Default Description
power_point_weight 2.0 Weight for rule-of-thirds placement
line_weight 1.0 Weight for leading lines

EXIF Adjustments

Automatic scoring adjustments based on camera settings.

{
  "exif_adjustments": {
    "iso_sharpness_compensation": true,
    "aperture_isolation_boost": true
  }
}
Setting Default Description
iso_sharpness_compensation true Reduce sharpness penalty for high-ISO
aperture_isolation_boost true Boost isolation for wide apertures (f/1.4-f/2.8)

Exposure

Controls exposure analysis and clipping detection.

{
  "exposure": {
    "shadow_clip_threshold_percent": 15,
    "highlight_clip_threshold_percent": 10,
    "silhouette_detection": true
  }
}
Setting Default Description
shadow_clip_threshold_percent 15 Flag if > 15% pixels pure black
highlight_clip_threshold_percent 10 Flag if > 10% pixels pure white
silhouette_detection true Detect intentional silhouettes

Penalties

Score penalties for technical issues.

{
  "penalties": {
    "noise_sigma_threshold": 4.0,
    "noise_max_penalty_points": 1.5,
    "noise_penalty_per_sigma": 0.3,
    "bimodality_threshold": 2.5,
    "bimodality_penalty_points": 0.5,
    "leading_lines_blend_percent": 30,
    "oversaturation_threshold": 0.9,
    "oversaturation_pixel_percent": 5,
    "oversaturation_penalty_points": 0.5
  }
}
Setting Default Description
noise_sigma_threshold 4.0 Noise above this triggers penalty
noise_max_penalty_points 1.5 Maximum noise penalty
noise_penalty_per_sigma 0.3 Points per sigma above threshold
bimodality_threshold 2.5 Histogram bimodality coefficient
bimodality_penalty_points 0.5 Penalty for bimodal histograms
leading_lines_blend_percent 30 Blend into comp_score
oversaturation_threshold 0.9 Mean saturation threshold
oversaturation_pixel_percent 5 Reserved for pixel-level detection
oversaturation_penalty_points 0.5 Oversaturation penalty

Noise penalty formula:

penalty = min(noise_max_penalty_points, (noise_sigma - threshold) * noise_penalty_per_sigma)
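In Python, with the defaults above (clamping to zero below the threshold is implied by the table but not stated in the formula):

```python
def noise_penalty(noise_sigma,
                  threshold=4.0,      # noise_sigma_threshold
                  per_sigma=0.3,      # noise_penalty_per_sigma
                  max_points=1.5):    # noise_max_penalty_points
    """Points deducted for noise above the sigma threshold."""
    return min(max_points, max(0.0, (noise_sigma - threshold) * per_sigma))
```

A photo at sigma 6.0 loses 0.6 points; from sigma 9.0 upward the 1.5-point cap applies.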

Normalization

Controls how raw metrics are scaled to 0-10 scores.

{
  "normalization": {
    "method": "percentile",
    "percentile_target": 90,
    "per_category": true,
    "category_min_samples": 50
  }
}
Setting Default Description
method "percentile" Normalization method
percentile_target 90 90th percentile = score of 10.0
per_category true Category-specific normalization
category_min_samples 50 Minimum photos for per-category
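A sketch of percentile normalization under these defaults (the interpolation method and clamping behavior are assumptions):

```python
def percentile(values, pct):
    """Linear-interpolated percentile of a list of raw metric values."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100.0
    lo, frac = int(k), k - int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * frac

def normalize(raw, population, percentile_target=90, score_max=10.0):
    """Scale a raw metric so the target percentile maps to score_max."""
    ref = percentile(population, percentile_target)
    if ref <= 0:
        return 0.0
    return min(score_max, raw / ref * score_max)  # clamp above the 90th pct
```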

Models

Controls which AI models are used based on VRAM.

{
  "models": {
    "vram_profile": "auto",
    "keep_in_ram": "auto",
    "profiles": {
      "legacy": {
        "aesthetic_model": "clip-mlp",
        "clip_config": "clip_legacy",
        "composition_model": "samp-net",
        "tagging_model": "clip",
        "description": "CPU-optimized: CLIP-MLP aesthetic + SAMP-Net composition + CLIP tagging (8GB+ RAM)"
      },
      "8gb": {
        "aesthetic_model": "clip-mlp",
        "clip_config": "clip_legacy",
        "composition_model": "samp-net",
        "tagging_model": "clip",
        "description": "CLIP-MLP aesthetic + SAMP-Net composition + CLIP tagging (6-14GB VRAM)"
      },
      "16gb": {
        "aesthetic_model": "topiq",
        "clip_config": "clip",
        "composition_model": "samp-net",
        "tagging_model": "qwen3-vl-2b",
        "description": "TOPIQ aesthetic + SigLIP 2 embeddings + SAMP-Net composition (~14GB VRAM)"
      },
      "24gb": {
        "aesthetic_model": "topiq",
        "clip_config": "clip",
        "composition_model": "qwen2-vl-2b",
        "tagging_model": "qwen2.5-vl-7b",
        "description": "TOPIQ aesthetic + SigLIP 2 embeddings + Qwen2-VL composition (~18GB VRAM)"
      }
    },
    "qwen2_vl": {
      "model_path": "Qwen/Qwen2-VL-2B-Instruct",
      "torch_dtype": "bfloat16",
      "max_new_tokens": 256
    },
    "qwen2_5_vl_7b": {
      "model_path": "Qwen/Qwen2.5-VL-7B-Instruct",
      "torch_dtype": "bfloat16",
      "vlm_batch_size": 2
    },
    "clip": {
      "model_name": "ViT-SO400M-16-SigLIP2-384",
      "pretrained": "webli",
      "embedding_dim": 1152,
      "similarity_threshold_percent": 18,
      "backend": "transformers"
    },
    "clip_legacy": {
      "model_name": "ViT-L-14",
      "pretrained": "laion2b_s32b_b82k",
      "embedding_dim": 768,
      "similarity_threshold_percent": 22
    },
    "florence_2_large": {
      "model_path": "MiaoshouAI/Florence-2-large-PromptGen-v2.0",
      "torch_dtype": "float16",
      "vlm_batch_size": 4,
      "max_new_tokens": 256
    },
    "supplementary_pyiqa": ["topiq_iaa", "topiq_nr_face", "liqe"],
    "saliency": {
      "enabled": false,
      "description": "BiRefNet_dynamic subject saliency detection (~2 GB VRAM)"
    },
    "samp_net": {
      "model_path": "pretrained_models/samp_net.pth",
      "download_url": "https://github.com/bcmi/Image-Composition-Assessment-with-SAMP/releases/download/v1.0/samp_net.pth",
      "input_size": 384,
      "patterns": [
        "none", "center", "rule_of_thirds", "golden_ratio", "triangle",
        "horizontal", "vertical", "diagonal", "symmetric", "curved",
        "radial", "vanishing_point", "pattern", "fill_frame"
      ]
    }
  }
}
Setting Default Description
vram_profile "auto" Active profile (auto, legacy, 8gb, 16gb, 24gb)
keep_in_ram "auto" Keep models in RAM between multi-pass chunks ("auto", "always", "never"). auto checks available RAM before caching. Reduces model load time on subsequent chunks.
qwen2_vl.model_path "Qwen/Qwen2-VL-2B-Instruct" HuggingFace model path
qwen2_vl.torch_dtype "bfloat16" Precision
qwen2_vl.max_new_tokens 256 Max generation tokens
qwen2_5_vl_7b.model_path "Qwen/Qwen2.5-VL-7B-Instruct" HuggingFace model path for VLM tagging
qwen2_5_vl_7b.torch_dtype "bfloat16" Precision
qwen2_5_vl_7b.vlm_batch_size 2 Images per VLM inference batch
qwen3_vl_2b.model_path "Qwen/Qwen3-VL-2B-Instruct" HuggingFace model path for Qwen3-VL tagging
qwen3_vl_2b.torch_dtype "bfloat16" Precision
qwen3_vl_2b.max_new_tokens 100 Max generation tokens
qwen3_vl_2b.vlm_batch_size 4 Images per VLM inference batch
qwen3_5_2b.model_path "Qwen/Qwen3.5-2B" HuggingFace model path for Qwen3.5 tagging
qwen3_5_2b.vlm_batch_size 4 Images per VLM inference batch
qwen3_5_4b.model_path "Qwen/Qwen3.5-4B" HuggingFace model path for Qwen3.5 tagging
qwen3_5_4b.vlm_batch_size 2 Images per VLM inference batch
clip.model_name "ViT-SO400M-16-SigLIP2-384" Embedding model (SigLIP 2 NaFlex for 16gb/24gb)
clip.pretrained "webli" Pre-trained weights
clip.embedding_dim 1152 Embedding dimensions (1152 for SigLIP 2, 768 for ViT-L-14)
clip.backend "transformers" Backend library: "transformers" (SigLIP 2 NaFlex, native aspect ratio) or "open_clip" (legacy)
clip_legacy.model_name "ViT-L-14" Legacy CLIP model (for legacy/8gb profiles)
clip_legacy.pretrained "laion2b_s32b_b82k" Legacy pre-trained weights
florence_2_large.model_path "MiaoshouAI/Florence-2-large-PromptGen-v2.0" Florence-2 PromptGen model for tagging
florence_2_large.vlm_batch_size 4 Images per Florence-2 inference batch
supplementary_pyiqa ["topiq_iaa", "topiq_nr_face", "liqe"] Additional PyIQA models to run
saliency.enabled false Enable BiRefNet_dynamic subject saliency
samp_net.input_size 384 Image size for inference

VRAM Auto-Detection

When vram_profile is set to "auto" (default), the system:

  1. Detects available GPU VRAM at startup
  2. Selects the best profile that fits
  3. Logs the selected profile
Detected VRAM Selected Profile
≥ 20GB 24gb
≥ 14GB 16gb
≥ 6GB 8gb
No GPU legacy (uses system RAM)
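The selection logic reduces to a threshold ladder, sketched here (facet's actual detection code may differ):

```python
def select_vram_profile(vram_gb=None):
    """Pick the largest profile that fits the detected VRAM.
    vram_gb=None means no GPU was detected."""
    if vram_gb is None:
        return "legacy"   # CPU path, uses system RAM
    if vram_gb >= 20:
        return "24gb"
    if vram_gb >= 14:
        return "16gb"
    if vram_gb >= 6:
        return "8gb"
    return "legacy"
```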

Quality Assessment Models

Controls which model assesses image quality/aesthetics. Uses the pyiqa library for state-of-the-art models.

{
  "quality": {
    "model": "auto"
  }
}
Setting Default Description
model "auto" Quality model: auto, topiq, hyperiqa, dbcnn, musiq, clipiqa+, clip-mlp

Available Quality Models

Model SRCC VRAM Speed Best For
topiq 0.93 ~2GB Fast Best accuracy, recommended default
hyperiqa 0.90 ~2GB Fast Efficient alternative to TOPIQ
dbcnn 0.90 ~2GB Fast Dual-branch CNN, good accuracy
musiq 0.87 ~2GB Fast Multi-scale, handles any resolution
clipiqa+ 0.86 ~4GB Fast CLIP with learned quality prompts
clip-mlp 0.76 ~4GB Fast Legacy fallback

SRCC = Spearman Rank Correlation Coefficient on KonIQ-10k benchmark. Higher is better (1.0 = perfect).

Model Comparison

TOPIQ (Recommended)

  • Architecture: ResNet50 backbone with top-down attention
  • Accuracy: Best on KonIQ-10k benchmark (0.93 SRCC)
  • VRAM: ~2GB - runs on any modern GPU
  • Speed: ~10ms per image
  • Strengths: Excellent accuracy/efficiency ratio, focuses on semantically important distortions
  • Weaknesses: No text explanations

HyperIQA

  • Architecture: Hyper-network predicting quality weights
  • Accuracy: 0.90 SRCC on KonIQ-10k
  • VRAM: ~2GB
  • Speed: ~8ms per image
  • Strengths: Very efficient, content-adaptive
  • Weaknesses: Slightly lower accuracy than TOPIQ

DBCNN

  • Architecture: Dual-branch CNN (synthetic + authentic distortions)
  • Accuracy: 0.90 SRCC on KonIQ-10k
  • VRAM: ~2GB
  • Speed: ~10ms per image
  • Strengths: Good on both synthetic and real-world distortions
  • Weaknesses: Two-branch design slightly slower

MUSIQ

  • Architecture: Multi-scale Transformer (Google)
  • Accuracy: 0.87 SRCC on KonIQ-10k
  • VRAM: ~2GB
  • Speed: ~15ms per image
  • Strengths: Handles any resolution without resizing, multi-scale analysis
  • Weaknesses: Slightly lower accuracy, transformer overhead

CLIP-MLP (Legacy)

  • Architecture: CLIP ViT-L-14 + trained MLP head
  • Accuracy: ~0.76 SRCC
  • VRAM: ~4GB
  • Speed: ~5ms per image
  • Strengths: Fast, uses existing CLIP model
  • Weaknesses: Lower accuracy than specialized IQA models

Auto-Selection Logic

When model is set to "auto", the system always selects topiq (best accuracy, and it fits any GPU with >= 2GB VRAM).

Switching Quality Models

  1. Edit scoring_config.json:

    "quality": {
      "model": "topiq"
    }
  2. Re-score existing photos (optional):

    python facet.py /path --pass quality
    python facet.py --recompute-average

Processing

Unified processing settings for GPU batch processing and multi-pass mode.

{
  "processing": {
    "mode": "auto",
    "gpu_batch_size": 16,
    "ram_chunk_size": 32,
    "num_workers": 4,
    "auto_tuning": {
      "enabled": true,
      "monitor_interval_seconds": 5,
      "tuning_interval_images": 32,
      "min_processing_workers": 1,
      "max_processing_workers": 32,
      "min_gpu_batch_size": 2,
      "max_gpu_batch_size": 32,
      "min_ram_chunk_size": 10,
      "max_ram_chunk_size": 128,
      "memory_limit_percent": 85,
      "cpu_target_percent": 85,
      "metrics_print_interval_seconds": 30
    },
    "thumbnails": {
      "photo_size": 640,
      "photo_quality": 80,
      "face_padding_ratio": 0.3
    }
  }
}

Key Concepts

gpu_batch_size - How many images are processed together on the GPU in a single forward pass. Limited by VRAM. Auto-tuned: reduced when GPU memory exceeds limit.

ram_chunk_size - How many images are cached in RAM between model passes (multi-pass mode only). Reduces disk I/O by loading images once per chunk. Limited by system RAM. Auto-tuned: reduced when system memory exceeds limit.

Settings Reference

Setting Default Description
mode "auto" Processing mode: auto, multi-pass, single-pass
gpu_batch_size 16 Images per GPU batch (VRAM-limited)
ram_chunk_size 32 Images per RAM chunk (multi-pass)
num_workers 4 Image loader threads
auto_tuning
enabled true Enable auto-tuning
monitor_interval_seconds 5 Resource check interval
tuning_interval_images 32 Re-tune every N images
min_processing_workers 1 Minimum loader threads
max_processing_workers 32 Maximum loader threads
min_gpu_batch_size 2 Minimum GPU batch size
max_gpu_batch_size 32 Maximum GPU batch size
min_ram_chunk_size 10 Minimum RAM chunk size
max_ram_chunk_size 128 Maximum RAM chunk size
memory_limit_percent 85 System memory usage limit
cpu_target_percent 85 CPU usage target
metrics_print_interval_seconds 30 Stats print interval
thumbnails
photo_size 640 Stored thumbnail size (pixels)
photo_quality 80 Thumbnail JPEG quality
face_padding_ratio 0.3 Padding around face crops

Processing Modes

Mode Description
auto Automatically selects multi-pass or single-pass based on VRAM
multi-pass Sequential model loading (works with limited VRAM)
single-pass All models loaded at once (requires high VRAM)

How Multi-Pass Works

Instead of loading all models at once (~18GB VRAM), multi-pass:

  1. Loads images in RAM chunks (default: 32 images, the ram_chunk_size setting)
  2. For each chunk, runs models sequentially:
    • Load model → process chunk → unload model
  3. Combines results in final aggregation pass

Benefits:

  • Use high-quality models (Qwen2.5-VL) even with limited VRAM
  • Each image loaded only once per chunk
  • Automatic pass grouping optimizes for available VRAM
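The chunked load → process → unload loop can be sketched as follows (the model_factories mapping and result layout are illustrative, not facet's API):

```python
def multi_pass(image_paths, model_factories, ram_chunk_size=32):
    """Run each model over RAM-sized chunks, loading weights once per
    model per chunk and unloading before the next model."""
    results = {path: {} for path in image_paths}
    for start in range(0, len(image_paths), ram_chunk_size):
        chunk = image_paths[start:start + ram_chunk_size]
        for name, load_model in model_factories.items():
            model = load_model()                  # load weights into VRAM
            for path in chunk:
                results[path][name] = model(path)
            del model                             # free VRAM for next model
    return results
```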

Auto-Tuning Behavior

The system monitors resource usage and automatically adjusts:

Metric Action
GPU memory > limit Reduce gpu_batch_size by 25%
System RAM > limit Reduce ram_chunk_size by 25%
System RAM < (limit - 20%) Increase ram_chunk_size by 25%
CPU > target Suggest fewer workers
Queue timeouts > 5% Suggest more workers
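The shrink/grow rules for ram_chunk_size amount to the following sketch (the 25% step and clamping mirror the table; treating the grow band as the limit minus 20 points is an assumption):

```python
def tune_ram_chunk(current, mem_used_percent,
                   memory_limit_percent=85,
                   min_size=10, max_size=128):
    """Shrink the chunk 25% when over the limit, grow 25% well under it."""
    if mem_used_percent > memory_limit_percent:
        return max(min_size, int(current * 0.75))
    if mem_used_percent < memory_limit_percent - 20:
        return min(max_size, int(current * 1.25))
    return current  # within the comfort band: leave unchanged
```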

Dynamic Pass Grouping

When VRAM allows, multiple small models run together:

VRAM Pass 1 Pass 2 Pass 3
8GB CLIP + SAMP-Net + InsightFace TOPIQ -
12GB CLIP + SAMP-Net + InsightFace + TOPIQ - -
16GB CLIP + SAMP-Net + InsightFace + TOPIQ Qwen2.5-VL -
24GB+ All models together (single-pass) - -

CLI Options

# Default: auto multi-pass with optimal grouping
python facet.py /path/to/photos

# Force single-pass (all models loaded at once)
python facet.py /path --single-pass

# Run specific pass only
python facet.py /path --pass quality       # TOPIQ only
python facet.py /path --pass quality-iaa   # TOPIQ IAA (aesthetic merit)
python facet.py /path --pass quality-face  # TOPIQ NR-Face
python facet.py /path --pass quality-liqe  # LIQE (quality + distortion)
python facet.py /path --pass tags          # Configured tagger only
python facet.py /path --pass composition   # SAMP-Net only
python facet.py /path --pass faces         # InsightFace only
python facet.py /path --pass embeddings    # CLIP/SigLIP embeddings only
python facet.py /path --pass saliency      # BiRefNet subject saliency

# List available models
python facet.py --list-models

Burst Detection

Groups similar photos taken in quick succession.

{
  "burst_detection": {
    "similarity_threshold_percent": 70,
    "time_window_minutes": 0.8,
    "rapid_burst_seconds": 0.4
  }
}
Setting Default Description
similarity_threshold_percent 70 Image hash similarity threshold
time_window_minutes 0.8 Maximum time between photos
rapid_burst_seconds 0.4 Photos within this auto-grouped

Burst Scoring

Weights used by burst culling to compute a composite score for selecting the best shot within each burst group. Weights should sum to 1.0.

{
  "burst_scoring": {
    "weight_aggregate": 0.4,
    "weight_aesthetic": 0.25,
    "weight_sharpness": 0.2,
    "weight_blink": 0.15
  }
}
Setting Default Description
weight_aggregate 0.4 Weight of the overall aggregate score
weight_aesthetic 0.25 Weight of the aesthetic quality score
weight_sharpness 0.2 Weight of the technical sharpness score
weight_blink 0.15 Penalty weight for detected blinks (higher = stronger penalty)
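One plausible reading of these weights as a composite (how the blink weight enters the formula is an assumption; scores are on the 0-10 scale):

```python
WEIGHTS = {"aggregate": 0.4, "aesthetic": 0.25,
           "sharpness": 0.2, "blink": 0.15}

def burst_composite(aggregate, aesthetic, sharpness, blink_detected,
                    weights=WEIGHTS, score_max=10.0):
    """Composite used to pick the best shot in a burst (sketch)."""
    score = (weights["aggregate"] * aggregate
             + weights["aesthetic"] * aesthetic
             + weights["sharpness"] * sharpness)
    if blink_detected:
        score -= weights["blink"] * score_max  # full-scale blink penalty
    return score
```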

Duplicate Detection

Detect duplicate photos globally using perceptual hash (pHash) comparison.

{
  "duplicate_detection": {
    "similarity_threshold_percent": 90
  }
}
Setting Default Description
similarity_threshold_percent 90 pHash similarity threshold (90% = Hamming distance <= 6 of 64 bits)

Run python facet.py --detect-duplicates to detect and group duplicates.
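The threshold-to-Hamming-distance relationship for a 64-bit pHash, as a sketch:

```python
def phash_similarity_percent(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Similarity of two perceptual hashes as a percentage."""
    hamming = bin(hash_a ^ hash_b).count("1")  # number of differing bits
    return (1 - hamming / bits) * 100

def is_duplicate(hash_a, hash_b, similarity_threshold_percent=90):
    return phash_similarity_percent(hash_a, hash_b) >= similarity_threshold_percent
```

Six differing bits gives 90.625% (a match at the default threshold); seven gives about 89.1% (no match).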


Face Detection

InsightFace face detection settings.

{
  "face_detection": {
    "min_confidence_percent": 65,
    "min_face_size": 20,
    "blink_ear_threshold": 0.28,
    "min_faces_for_group": 4
  }
}
Setting Default Description
min_confidence_percent 65 Minimum detection confidence
min_face_size 20 Minimum face size in pixels
blink_ear_threshold 0.28 Eye Aspect Ratio for blink detection
min_faces_for_group 4 Minimum faces to classify as group portrait (recomputed on --recompute-average)
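The blink_ear_threshold compares against the standard Eye Aspect Ratio over six eye landmarks p1..p6; this sketch uses the conventional landmark ordering, which may differ from InsightFace's:

```python
import math

def eye_aspect_ratio(eye):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) over six (x, y) landmarks."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2 * horizontal)

def is_blink(eye, blink_ear_threshold=0.28):
    """Closed eyes collapse the vertical distances, dropping the EAR."""
    return eye_aspect_ratio(eye) < blink_ear_threshold
```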

Face Clustering

HDBSCAN clustering for face recognition.

{
  "face_clustering": {
    "enabled": true,
    "min_faces_per_person": 2,
    "min_samples": 2,
    "auto_merge_distance_percent": 15,
    "clustering_algorithm": "best",
    "leaf_size": 40,
    "use_gpu": "auto",
    "merge_threshold": 0.6,
    "chunk_size": 10000
  }
}
Setting Default Description
enabled true Enable face clustering
min_faces_per_person 2 Minimum photos per person
min_samples 2 HDBSCAN min_samples parameter
auto_merge_distance_percent 15 Auto-merge within this distance
clustering_algorithm "best" HDBSCAN algorithm
leaf_size 40 Tree leaf size (CPU only)
use_gpu "auto" GPU mode: auto, always, never
merge_threshold 0.6 Centroid similarity for matching
chunk_size 10000 Processing chunk size

Clustering algorithms:

Algorithm Complexity Best For
boruvka_balltree O(n log n) High-dimensional data (recommended)
boruvka_kdtree O(n log n) Low-dimensional data
prims_balltree O(n²) Memory-constrained, high-dim
prims_kdtree O(n²) Memory-constrained, low-dim
best Auto Let HDBSCAN decide

Face Processing

Controls face extraction and thumbnail generation.

{
  "face_processing": {
    "crop_padding": 0.3,
    "use_db_thumbnails": true,
    "face_thumbnail_size": 640,
    "face_thumbnail_quality": 90,
    "extract_workers": 2,
    "extract_batch_size": 16,
    "refill_workers": 4,
    "refill_batch_size": 100,
    "auto_tuning": {
      "enabled": true,
      "memory_limit_percent": 80,
      "min_batch_size": 8,
      "monitor_interval_seconds": 5
    }
  }
}
Setting Default Description
crop_padding 0.3 Padding ratio for face crops
use_db_thumbnails true Use stored thumbnails
face_thumbnail_size 640 Thumbnail size in pixels
face_thumbnail_quality 90 JPEG quality
extract_workers 2 Parallel extraction workers
extract_batch_size 16 Extraction batch size
refill_workers 4 Thumbnail refill workers
refill_batch_size 100 Refill batch size
auto_tuning
enabled true Enable memory-based tuning
memory_limit_percent 80 Memory usage limit
min_batch_size 8 Minimum batch size
monitor_interval_seconds 5 Check interval

Monochrome Detection

Black & white photo detection.

{
  "monochrome_detection": {
    "saturation_threshold_percent": 5
  }
}
Setting Default Description
saturation_threshold_percent 5 Mean saturation < 5% = monochrome
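The check reduces to mean HSV saturation, sketched here with the standard library (facet likely computes this on a downscaled image):

```python
import colorsys

def is_monochrome(pixels_rgb, saturation_threshold_percent=5):
    """pixels_rgb: iterable of (r, g, b) tuples in 0-255."""
    saturations = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1]
                   for r, g, b in pixels_rgb]
    mean_saturation = sum(saturations) / len(saturations)
    return mean_saturation * 100 < saturation_threshold_percent
```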

Tagging

General tagging settings. The tagging model is configured per-profile in models.profiles.*.tagging_model.

{
  "tagging": {
    "enabled": true,
    "max_tags": 5
  }
}
Setting Default Description
enabled true Enable tagging
max_tags 5 Maximum tags per photo

Note: CLIP-specific settings like similarity_threshold_percent are in the models.clip section.

Available Tagging Models

Configured via models.profiles.*.tagging_model:

Model VRAM Speed Tag Style Pros Cons
clip 0 (reuses embeddings) Instant (~5ms) Mood/atmosphere (dramatic, golden_hour, vintage) No extra model load; captures lighting and mood well Less literal object detection
qwen3-vl-2b ~4GB Moderate (~100ms) Structured scenes (landscape, architecture, reflection) Best semantic understanding for size; accurate scene classification Requires transformers + extra VRAM
qwen2.5-vl-7b ~16GB Slow (~200ms) Detailed scenes with nuance Most capable VLM; handles complex/ambiguous scenes High VRAM; slower inference
florence-2 ~2GB Fast (~50ms) Literal objects (sky, water, building) Fast inference Over-tags generic terms; caption-based matching is fragile; deprecated in favor of CLIP

Default Tagging Models per Profile

Profile Tagging Model Embedding Model
legacy clip CLIP ViT-L-14 (768-dim)
8gb clip CLIP ViT-L-14 (768-dim)
16gb qwen3-vl-2b SigLIP 2 NaFlex SO400M (1152-dim)
24gb qwen2.5-vl-7b SigLIP 2 NaFlex SO400M (1152-dim)

Re-tagging Photos

python facet.py --recompute-tags       # Re-tag using configured model per profile
python facet.py --recompute-tags-vlm   # Re-tag using VLM tagger

Standalone Tags

Tags with synonym lists that are not tied to any specific category. These are available for all photos regardless of category assignment. Each key is the tag name; the value is a list of synonyms for CLIP/VLM matching.

{
  "standalone_tags": {
    "bokeh": ["bokeh", "shallow depth of field", "background blur", "out of focus"],
    "surreal": ["surreal", "dreamlike", "fantasy", "composite", "double exposure"],
    "flat_lay": ["flat lay", "overhead shot", "top down", "bird's eye product"],
    "golden_hour": ["golden hour", "magic hour", "warm light", "sunset light"],
    "portrait_tag": ["portrait", "headshot", "face portrait", "close-up portrait"]
  }
}

Add new standalone tags by providing a key and a list of synonyms. Tags defined here are merged with category-specific tags to form the full tag vocabulary.
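Synonym matching against embeddings can be sketched as below; the function names and embedding shapes are illustrative, with the defaults drawn from models.clip.similarity_threshold_percent and tagging.max_tags:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_tags(image_emb, tag_synonym_embs,
               similarity_threshold_percent=18, max_tags=5):
    """Keep tags whose best synonym embedding clears the threshold."""
    scored = []
    for tag, synonym_embs in tag_synonym_embs.items():
        best = max(cosine(image_emb, e) for e in synonym_embs)
        if best * 100 >= similarity_threshold_percent:
            scored.append((best, tag))
    scored.sort(reverse=True)          # strongest matches first
    return [tag for _, tag in scored[:max_tags]]
```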


Analysis

Thresholds for --compute-recommendations.

{
  "analysis": {
    "aesthetic_max_threshold": 9.0,
    "aesthetic_target": 9.5,
    "quality_avg_threshold": 7.5,
    "quality_weight_threshold_percent": 10,
    "correlation_dominant_threshold": 0.5,
    "category_min_samples": 50,
    "category_imbalance_threshold": 0.5,
    "score_clustering_std_threshold": 1.0,
    "top_score_threshold": 8.5,
    "exposure_avg_threshold": 8.0
  }
}
Setting Default Description
aesthetic_max_threshold 9.0 Warn if max aesthetic below this
aesthetic_target 9.5 Target for aesthetic_scale
quality_avg_threshold 7.5 "High value" quality threshold
quality_weight_threshold_percent 10 Warn if quality weight ≤ this
correlation_dominant_threshold 0.5 "Dominant signal" warning
category_min_samples 50 Minimum photos per category
category_imbalance_threshold 0.5 Score gap warning
score_clustering_std_threshold 1.0 Warn if std dev < this
top_score_threshold 8.5 Warn if max aggregate < this
exposure_avg_threshold 8.0 Warn if avg exposure > this

Viewer

Web gallery display and behavior.

{
  "viewer": {
    "default_category": "",
    "edition_password": "",
    "comparison_mode": {
      "min_comparisons_for_optimization": 50,
      "pair_selection_strategy": "uncertainty",
      "show_current_scores": true
    },
    "sort_options": { ... },
    "pagination": {
      "default_per_page": 64
    },
    "dropdowns": {
      "max_cameras": 50,
      "max_lenses": 50,
      "max_persons": 50,
      "max_tags": 20,
      "min_photos_for_person": 10
    },
    "raw_processor": {
      "backend": "rawpy",
      "darktable": {
        "executable": "darktable-cli",
        "hq": true,
        "width": null,
        "height": null,
        "extra_args": []
      }
    },
    "display": {
      "tags_per_photo": 4,
      "card_width_px": 168,
      "image_width_px": 160,
      "image_jpeg_quality": 96,
      "thumbnail_slider": {
        "min_px": 120,
        "max_px": 400,
        "default_px": 168,
        "step_px": 8
      }
    },
    "face_thumbnails": {
      "output_size_px": 64,
      "jpeg_quality": 80,
      "crop_padding_ratio": 0.2,
      "min_crop_size_px": 20
    },
    "quality_thresholds": {
      "good": 6,
      "great": 7,
      "excellent": 8,
      "best": 9
    },
    "photo_types": {
      "top_picks_min_score": 7,
      "top_picks_min_face_ratio": 0.2,
      "top_picks_weights": {
        "aggregate_percent": 30,
        "aesthetic_percent": 28,
        "composition_percent": 18,
        "face_quality_percent": 24
      },
      "low_light_max_luminance": 0.2
    },
    "defaults": {
      "hide_blinks": true,
      "hide_bursts": true,
      "hide_duplicates": true,
      "hide_details": true,
      "hide_tooltip": false,
      "hide_rejected": true,
      "sort": "aggregate",
      "sort_direction": "DESC",
      "type": "",
      "gallery_mode": "mosaic"
    },
    "cache_ttl_seconds": 60,
    "notification_duration_ms": 2000,
    "path_mapping": {}
  }
}
Setting Default Description
default_category "" Default category filter
edition_password "" Password to unlock edition mode (empty = disabled)
comparison_mode
min_comparisons_for_optimization 50 Minimum for optimization
pair_selection_strategy "uncertainty" Default strategy
show_current_scores true Show scores during comparison
pagination
default_per_page 64 Photos per page
dropdowns
max_cameras 50 Max cameras in dropdown
max_lenses 50 Max lenses
max_persons 50 Max persons
max_tags 20 Max tags
min_photos_for_person 10 Hide persons with fewer photos from dropdown
raw_processor
darktable.executable "darktable-cli" darktable-cli binary name or absolute path
darktable.profiles [] Array of named darktable export profiles (see below)
darktable.profiles[].name (required) Profile display name (used in download menu and API profile param)
darktable.profiles[].hq true Pass --hq true for high-quality export
darktable.profiles[].width (omit) Max output width (omit for full resolution)
darktable.profiles[].height (omit) Max output height (omit for full resolution)
darktable.profiles[].extra_args [] Additional CLI arguments (e.g., ["--style", "monochrome"])
display
tags_per_photo 4 Tags shown on cards
card_width_px 168 Card width
image_width_px 160 Image width
image_jpeg_quality 96 JPEG quality for RAW/HEIF conversion in /api/download and /api/image (1–100)
thumbnail_slider.min_px 120 Minimum thumbnail size (px)
thumbnail_slider.max_px 400 Maximum thumbnail size (px)
thumbnail_slider.default_px 168 Default thumbnail size (px)
thumbnail_slider.step_px 8 Slider step increment (px)
face_thumbnails
output_size_px 64 Thumbnail size
jpeg_quality 80 JPEG quality
crop_padding_ratio 0.2 Face padding
min_crop_size_px 20 Minimum crop size
quality_thresholds
good 6 Good threshold
great 7 Great threshold
excellent 8 Excellent threshold
best 9 Best threshold
photo_types
top_picks_min_score 7 Top Picks minimum
top_picks_min_face_ratio 0.2 Face ratio for weights
low_light_max_luminance 0.2 Low light threshold
defaults
type "" Default photo type filter (e.g., "portraits", "landscapes", or "" for All)
sort "aggregate" Default sort column
sort_direction "DESC" Default sort direction ("ASC" or "DESC")
hide_blinks true Hide blink photos by default
hide_bursts true Show only best of burst by default
hide_duplicates true Hide non-lead duplicate photos by default
hide_details true Hide photo details on cards by default
hide_tooltip false Hide hover tooltip on cards by default
hide_rejected true Hide rejected photos by default
gallery_mode "mosaic" Default gallery layout ("grid" or "mosaic")
allowed_origins
allowed_origins ["http://localhost:4200", "http://localhost:5000"] CORS allowed origins for the FastAPI server. Add your domain or reverse proxy URL when hosting remotely.
Other
cache_ttl_seconds 60 Query cache TTL
notification_duration_ms 2000 Toast duration
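The quality_thresholds above imply a score-to-label mapping like this sketch (exactly how the viewer applies the labels is an assumption):

```python
QUALITY_THRESHOLDS = {"good": 6, "great": 7, "excellent": 8, "best": 9}

def quality_label(score, thresholds=QUALITY_THRESHOLDS):
    """Return the highest label whose threshold the score meets, or None."""
    label = None
    for name, minimum in sorted(thresholds.items(), key=lambda kv: kv[1]):
        if score >= minimum:
            label = name
    return label
```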

Features

Toggle optional features to reduce memory usage or simplify the UI:

{
  "viewer": {
    "features": {
      "show_similar_button": true,
      "show_merge_suggestions": true,
      "show_rating_controls": true,
      "show_rating_badge": true,
      "show_scan_button": false,
      "show_semantic_search": true,
      "show_albums": true,
      "show_critique": true,
      "show_vlm_critique": false,
      "show_memories": true,
      "show_captions": true,
      "show_timeline": true,
      "show_map": false
    }
  }
}
| Setting | Default | Description |
|---|---|---|
| `show_similar_button` | `true` | Show "Find Similar" button on photo cards (uses numpy for CLIP similarity) |
| `show_merge_suggestions` | `true` | Enable merge suggestions on the manage-persons page |
| `show_rating_controls` | `true` | Show star rating and favorite controls |
| `show_rating_badge` | `true` | Show rating badge on photo cards |
| `show_scan_button` | `false` | Show scan trigger button for superadmin users (requires a GPU on the viewer host) |
| `show_semantic_search` | `true` | Show semantic search bar (text-to-image search using CLIP/SigLIP embeddings) |
| `show_albums` | `true` | Show albums feature (create, manage, and browse photo albums) |
| `show_critique` | `true` | Show AI critique button on photo cards (rule-based score breakdown) |
| `show_vlm_critique` | `false` | Enable VLM-powered critique mode (requires the 16 GB/24 GB VRAM profile) |
| `show_memories` | `true` | Show "On This Day" memories dialog (photos taken on the same date in previous years) |
| `show_captions` | `true` | Show AI-generated captions on photo cards |
| `show_timeline` | `true` | Show timeline view for chronological browsing with date navigation |
| `show_map` | `false` | Show map view with GPS-based photo locations (requires Leaflet; off by default since photos may lack GPS data) |

Memory optimization: Setting `show_similar_button: false` prevents numpy from being loaded, reducing the viewer's memory footprint. The similar-photos feature computes cosine similarity over CLIP embeddings, which requires numpy.
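
The deferred-import pattern behind this optimization can be sketched as follows (a minimal illustration; the `FEATURES` dict and `find_similar` helper are assumed names, not the project's actual code):

```python
FEATURES = {"show_similar_button": False}  # assumed feature-flag store

def find_similar(query_embedding, all_embeddings):
    """Rank embeddings by cosine similarity to the query, most similar first."""
    if not FEATURES["show_similar_button"]:
        raise RuntimeError("similar-photos feature is disabled")
    import numpy as np  # deferred: never loaded while the feature is off
    q = np.asarray(query_embedding, dtype=float)
    m = np.asarray(all_embeddings, dtype=float)
    sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return sims.argsort()[::-1]
```

Because the `import numpy` sits inside the function body, a viewer process that never calls it never pays numpy's memory cost.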

Path Mapping

Map database paths to local filesystem paths. Useful when photos were scored on one machine (e.g., Windows with UNC paths) but the viewer runs on another (e.g., Linux NAS with mount points).

```json
{
  "viewer": {
    "path_mapping": {
      "\\\\NAS\\Photos": "/mnt/photos",
      "D:\\Pictures": "/volume1/pictures"
    }
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `path_mapping` | `{}` | Dict of source prefix to destination prefix. When serving full-size images or VLM critique, database paths starting with a source prefix are rewritten to use the destination prefix. |

How it works:

  • Only applies when reading files from disk (full-size image serving, file downloads, VLM critique). Database paths are never modified.
  • Backslash/forward-slash normalization is handled automatically: `\\NAS\Photos\img.jpg` and `//NAS/Photos/img.jpg` both match.
  • Mappings are evaluated in order; the first matching prefix wins.
  • Path-mapping targets are automatically included in the scan-directory allowlist for multi-user security checks.

Example: A database populated on Windows stores paths like `\\NAS\Photos\2024\IMG_001.jpg`. On Linux, the same share is mounted at `/mnt/nas/Photos`. Configure:

"path_mapping": {"\\\\NAS\\Photos": "/mnt/nas/Photos"}

Password Protection

Optional password protection for the viewer:

```json
{
  "viewer": {
    "password": "your-password-here"
  }
}
```

When set, users must authenticate before accessing the viewer.

Viewer Performance

Override global performance settings when running the viewer. Useful for low-memory NAS deployment where scoring needs high resources but the viewer doesn't.

```json
{
  "viewer": {
    "performance": {
      "mmap_size_mb": 0,
      "cache_size_mb": 4,
      "pool_size": 2,
      "thumbnail_cache_size": 200,
      "face_cache_size": 50
    }
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `mmap_size_mb` | (global) | SQLite mmap size override for viewer connections. `0` disables mmap. |
| `cache_size_mb` | (global) | SQLite cache size override for viewer connections |
| `pool_size` | 5 | Connection pool size (reduce for low-memory systems) |
| `thumbnail_cache_size` | 2000 | Max entries in the in-memory thumbnail resize cache |
| `face_cache_size` | 500 | Max entries in the in-memory face thumbnail cache |

When not set, the viewer uses the global performance values. See Deployment for recommended NAS settings.


Performance

Database performance settings.

```json
{
  "performance": {
    "mmap_size_mb": 12288,
    "cache_size_mb": 64
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `mmap_size_mb` | 12288 | SQLite memory-mapped I/O size (MB) |
| `cache_size_mb` | 64 | SQLite page cache size (MB) |
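
These settings translate into SQLite PRAGMAs roughly as follows (a sketch of the translation, not the project's actual connection code; note that SQLite interprets a negative `cache_size` as a size in KiB rather than a page count):

```python
import sqlite3

def open_db(path, mmap_size_mb=12288, cache_size_mb=64):
    """Open a SQLite connection with mmap and cache sizes applied."""
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA mmap_size = {mmap_size_mb * 1024 * 1024}")
    # Negative cache_size means "this many KiB" instead of pages.
    conn.execute(f"PRAGMA cache_size = {-cache_size_mb * 1024}")
    return conn
```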

Storage

Controls where thumbnails and embeddings are stored. By default, they are stored as BLOB columns in the SQLite database. Filesystem mode stores them as files on disk instead, which can reduce database size and simplify backups.

```json
{
  "storage": {
    "mode": "database",
    "filesystem_path": "./storage"
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `mode` | `"database"` | Storage backend: `"database"` (SQLite BLOBs) or `"filesystem"` (files on disk) |
| `filesystem_path` | `"./storage"` | Base directory for filesystem mode. Thumbnails are stored in `<path>/thumbnails/` and embeddings in `<path>/embeddings/`, organized into subdirectories by content hash. |

Filesystem mode details:

  • Files are organized by SHA-256 hash of the photo path, with two-character subdirectories to avoid too many files in one directory (e.g., `thumbnails/a3/a3f8..._640.jpg`).
  • Deleting a photo removes all associated thumbnail sizes and embedding files.
  • The directory is created automatically on first use.
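
Under the layout described above, computing a thumbnail's on-disk path can be sketched like this (the function name and `_640` size suffix are assumptions inferred from the example path):

```python
import hashlib
from pathlib import Path

def thumbnail_path(base: str, photo_path: str, size: int = 640) -> Path:
    """Hash the photo path and fan out into a two-character subdirectory."""
    digest = hashlib.sha256(photo_path.encode("utf-8")).hexdigest()
    return Path(base) / "thumbnails" / digest[:2] / f"{digest}_{size}.jpg"
```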

Plugins

Event-driven plugin system for reacting to scoring events. Plugins can be Python modules, webhooks, or built-in actions.

Configuration

```json
{
  "plugins": {
    "enabled": true,
    "high_score_threshold": 8.0,
    "webhooks": [
      {
        "url": "https://example.com/hook",
        "events": ["on_score_complete", "on_high_score"],
        "min_score": 8.0
      }
    ],
    "actions": {
      "copy_high_scores": {
        "event": "on_high_score",
        "action": "copy_to_folder",
        "folder": "/path/to/best-photos",
        "min_score": 9.0
      }
    }
  }
}
```
| Key | Default | Description |
|---|---|---|
| `enabled` | `false` | Master switch; when `false`, no events are emitted |
| `high_score_threshold` | 8.0 | Minimum aggregate score to trigger `on_high_score` events |
| `webhooks` | `[]` | List of webhook endpoints that receive JSON POST payloads |
| `actions` | `{}` | Named built-in actions triggered by events |

Supported Events

| Event | Trigger | Payload |
|---|---|---|
| `on_score_complete` | After each photo is scored | `path`, `filename`, `aggregate`, `aesthetic`, `comp_score`, `category`, `tags` |
| `on_new_photo` | When a photo enters the database | Same as `on_score_complete` |
| `on_high_score` | When aggregate ≥ `high_score_threshold` | Same as `on_score_complete` |
| `on_burst_detected` | When a burst group is identified | `burst_group_id`, `photo_count`, `best_path`, `paths` |

Writing a Plugin

Place a `.py` file in the `plugins/` directory. Define functions named after the events you want to handle:

```python
def on_score_complete(data: dict) -> None:
    print(f"Scored: {data['path']} → {data['aggregate']:.1f}")

def on_high_score(data: dict) -> None:
    print(f"High score! {data['path']} → {data['aggregate']:.1f}")
```

See `plugins/example_plugin.py.example` for the full interface.

Webhooks

Each webhook receives a JSON POST with SSRF protection (private/loopback addresses are blocked):

```json
{
  "event": "on_high_score",
  "data": {
    "path": "/photos/IMG_001.jpg",
    "aggregate": 9.2,
    "aesthetic": 9.5,
    "comp_score": 8.8,
    "category": "portrait",
    "tags": "person, outdoor"
  }
}
```

Webhook options: `url` (required), `events` (list of event names), `min_score` (minimum aggregate score to trigger).
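
The SSRF protection can be sketched as a pre-flight check on the webhook URL, using only the standard library (an illustration; the real guard may be stricter, e.g. handling IPv6, redirects, and URL schemes):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_webhook_url(url: str) -> bool:
    """Reject webhook URLs that resolve to private/loopback addresses."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False  # unresolvable hosts are rejected too
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```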

Built-in Actions

| Action | Description | Options |
|---|---|---|
| `copy_to_folder` | Copy the photo to a folder | `folder`, `min_score` |
| `send_notification` | Log a notification | `min_score` |

API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | `/api/plugins` | List loaded plugins, webhooks, and actions |
| POST | `/api/plugins/test-webhook` | Send a test payload to a webhook URL |

Capsules

Curated photo slideshows grouped by theme. Capsules are auto-generated from your photo library and cached with a configurable TTL.

```json
{
  "capsules": {
    "min_aggregate": 6.0,
    "max_photos_per_capsule": 40,
    "max_photo_overlap": 0.2,
    "mmr_lambda": 0.5,
    "freshness_hours": 24,
    "reverse_geocoding": true,
    "journey": {
      "min_distance_km": 50,
      "min_photos": 8,
      "time_gap_hours": 24
    },
    "faces_of": { "min_photos": 10 },
    "seasonal": { "min_photos": 10 },
    "golden": { "percentile": 99, "max_photos": 50 },
    "color_story": { "embedding_threshold": 0.75, "min_group_size": 8, "max_groups": 5 },
    "this_week_years_ago": { "min_photos_per_year": 3 },
    "seeded": {
      "num_seeds": 20,
      "min_photos": 8,
      "seed_lifetime_minutes": 1440,
      "time_window_days": 7,
      "embedding_threshold": 0.7,
      "location_radius_km": 30
    },
    "progress": { "min_improvement_pct": 5, "min_photos": 10, "period_months": 3 },
    "color_palette": { "min_photos": 8 },
    "rare_pair": { "max_shared_photos": 5, "min_score": 7.0, "min_photos": 3 },
    "favorites": { "min_photos": 5 }
  }
}
```

Global Settings

| Setting | Default | Description |
|---|---|---|
| `min_aggregate` | 6.0 | Minimum aggregate score for photos to be included in capsules |
| `max_photos_per_capsule` | 40 | Maximum photos per capsule (MMR diversity applied above 5) |
| `max_photo_overlap` | 0.2 | Maximum fraction of shared photos between two capsules before dedup removes one |
| `mmr_lambda` | 0.5 | MMR diversity weight: 0 = maximize diversity, 1 = maximize quality |
| `freshness_hours` | 24 | Cache TTL and rotation period for cover photos and seeded capsules |
| `reverse_geocoding` | `true` | Enable offline reverse geocoding for location/journey capsule titles (requires the `reverse_geocoder` package) |
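
The `mmr_lambda` trade-off follows the standard maximal-marginal-relevance scheme: each pick balances a photo's quality against its redundancy with photos already selected. A minimal sketch under assumed inputs (the `(photo_id, quality)` tuples and the `similarity` callback are illustrative, not the project's real data structures):

```python
def mmr_select(candidates, similarity, mmr_lambda=0.5, k=40):
    """Greedy MMR: each step picks the candidate maximizing
    mmr_lambda * quality - (1 - mmr_lambda) * max_similarity_to_selected,
    so mmr_lambda=1 ranks purely by quality, mmr_lambda=0 purely by diversity."""
    selected = []          # (photo_id, quality) tuples picked so far
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr_score(item):
            photo_id, quality = item
            redundancy = max(
                (similarity(photo_id, picked) for picked, _ in selected),
                default=0.0,
            )
            return mmr_lambda * quality - (1 - mmr_lambda) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return [photo_id for photo_id, _ in selected]
```

With the default `mmr_lambda: 0.5`, a near-duplicate of an already-selected photo is penalized even if it scores slightly higher than a fresh alternative.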

Capsule Types

| Type | Description |
|---|---|
| `journey` | Trips detected via GPS clustering + temporal gaps. Titles include the destination name when geocoding is enabled. |
| `faces_of` | Best photos of each recognized person |
| `seasonal` | Photos grouped by season + year |
| `golden` | Top 1% by aggregate score |
| `color_story` | Visually similar groups via CLIP embedding clustering |
| `this_week` | "This Week, Years Ago": extended On This Day across ±3 days |
| `location` | Geotagged photo clusters with reverse-geocoded place names |
| `person_pair` | Pairs of named persons appearing together |
| `seeded` | Seed-based discovery via time, similarity, person, tag, location, mood |
| `progress` | "Your Photography is Improving", from quarterly score trends |
| `color_palette` | "Color of the Month", from saturation/monochrome profiles |
| `rare_pair` | Infrequent person pairs in high-scoring photos |
| `favorites` | Favorited photos grouped by year and season |

Dimension-Based Capsules

Automatically generated from database columns:

| Dimension | Groups By |
|---|---|
| `year` | Year extracted from `date_taken` |
| `month` | Year-month extracted from `date_taken` |
| `week` | Year-week extracted from `date_taken` |
| `camera` | Camera model |
| `lens` | Lens model |
| `tag` | Photo tags (requires the `photo_tags` table) |
| `day_of_week` | Day of week (Sunday–Saturday) |
| `composition` | SAMP-Net composition pattern (`rule_of_thirds`, `horizontal`, etc.) |
| `focal_range` | Focal length bins: ultra wide (<24mm), wide (24–35mm), standard (36–70mm), portrait (71–135mm), telephoto (136–300mm), super telephoto (300mm+) |
| `category` | Photo content category (portrait, landscape, street, etc.) |
| `time_of_day` | Time bins: golden morning, morning, midday, afternoon, golden evening, night |
| `star_rating` | User star ratings (1–5 stars) |

Cross-dimensional combos are also generated (e.g., camera × year, focal_range × category, category × year).
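
For example, the `focal_range` bins can be expressed as a small classifier (bin edges taken from the table above; the function name is illustrative):

```python
def focal_range(focal_mm: float) -> str:
    """Map a focal length in millimeters to its named bin."""
    if focal_mm < 24:
        return "ultra wide"
    if focal_mm <= 35:
        return "wide"
    if focal_mm <= 70:
        return "standard"
    if focal_mm <= 135:
        return "portrait"
    if focal_mm <= 300:
        return "telephoto"
    return "super telephoto"
```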

Slideshow Transitions

Each capsule type maps to a themed slide transition:

| Transition | Used By | Effect |
|---|---|---|
| `crossfade` | Default | 300ms opacity swap |
| `slide` | `journey`, `location`, `this_week` | Slide in from the right (500ms) |
| `zoom` | `faces_of`, `color_story` | Scale 1.05→1.0 with fade (400ms) |
| `kenburns` | `golden`, `seasonal`, `star_rating`, `favorites` | Slow zoom 1.0→1.08 over the slide duration |

Reverse Geocoding

Location and journey capsules use offline reverse geocoding via the reverse_geocoder package (local GeoNames dataset, ~30MB, no API calls). Results are cached in the location_names database table at 0.1° grid resolution (~11km).

Install: `pip install reverse_geocoder`

Set "reverse_geocoding": false to disable and fall back to coordinate display.

Similarity Groups

Settings for the AI similar-photo culling feature, which groups visually similar photos using CLIP/SigLIP embeddings:

```json
{
  "similarity_groups": {
    "default_threshold": 0.85,
    "min_group_size": 2,
    "max_photos": 10000,
    "max_group_size": 50
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `default_threshold` | 0.85 | Minimum cosine similarity (0.0–1.0) for two photos to be considered visually similar. Lower values produce larger but less visually coherent groups. |
| `min_group_size` | 2 | Minimum number of photos required to form a similarity group |
| `max_photos` | 10000 | Maximum photos loaded for similarity computation (O(n²) cost). Increase for larger libraries at the expense of computation time. |
| `max_group_size` | 50 | Maximum photos per similarity group. Larger groups are split to keep the UI usable. |
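
The thresholding above amounts to a pairwise cosine-similarity pass over normalized embeddings (a sketch of the idea; the real feature additionally enforces `min_group_size` and `max_group_size`):

```python
import numpy as np

def similar_pairs(embeddings, threshold=0.85):
    """Return index pairs whose cosine similarity meets the threshold."""
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # unit-normalize rows
    sims = e @ e.T  # O(n^2) pairwise matrix: why max_photos caps the input
    n = len(e)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sims[i, j] >= threshold]
```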

Timeline

Settings for the chronological timeline view:

```json
{
  "timeline": {
    "photos_per_group": 30
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `photos_per_group` | 30 | Number of photos loaded per date group in the timeline view. Higher values show more photos per date but increase page weight. |

Map

Settings for the interactive map view:

```json
{
  "map": {
    "cluster_zoom_threshold": 10
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `cluster_zoom_threshold` | 10 | Zoom level at which individual markers replace clusters. Lower values show individual markers earlier (more detail at wider zoom). Range: 1 (world) to 18 (street). |

Translation

Settings for AI caption translation via MarianMT:

```json
{
  "translation": {
    "target_language": "fr"
  }
}
```
| Setting | Default | Description |
|---|---|---|
| `target_language` | `"fr"` | Target language code for `--translate-captions`. Supported: `fr` (French), `de` (German), `es` (Spanish), `it` (Italian). Uses Helsinki-NLP MarianMT models (CPU, no GPU required). |

Share Secret

Auto-generated 64-character hex string for session/sharing tokens:

```json
{
  "share_secret": "31a1c944ea5c82b871e61e50e5920daa2d1940b126c395f519088506595fd925"
}
```

Generated automatically on first run if not present.
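
A compatible value can be produced with the standard library (illustrative only; the application creates the secret automatically, so this is needed at most for manual rotation):

```python
import secrets

# 32 random bytes -> 64 lowercase hex characters, matching the format above
secret = secrets.token_hex(32)
```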