
Privacy-SAM3-Distill

A privacy-preserving masking and redaction pipeline built around SAM3. It detects and redacts sensitive visual content (faces, plates, IDs, screens) using a two-phase design: heavy distillation runs offline once, and lightweight inference runs at runtime.

Quick Start

# 1. Install
cd privacy-sam3-distill
bash scripts/install_all.sh
source .venv/bin/activate
bash scripts/download_models.sh

# 2. Preprocess raw data
psd preprocess \
  --input data/raw \
  --output-dir data/curated \
  --train-ratio 0.7 \
  --val-ratio 0.15 \
  --frame-step 10 \
  --max-frames-per-video 300 \
  --seed 42

# 3. Train (one-shot wrapper)
PROMPT=faces \
RAW_DATA=data/raw \
CURATED=data/curated \
SAM3_CKPT=models/sam3/sam3.pt \
LLM_PATH=models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
bash scripts/run_training.sh

# 4. Run inference
python scripts/sam3_distilled_adapter.py infer \
  --input data/sample_videos/test.mp4 \
  --runtime-model proposed_distilled \
  --adapter checkpoints/faces_adapter.json \
  --prompt "faces" \
  --output-mode inpaint \
  --output output/redacted.mp4 \
  --mask-preview output/mask.mp4 \
  --mode balanced \
  --device cuda \
  --sam3-checkpoint models/sam3/sam3.pt

1. Environment Setup

1.1 Prerequisites

  • Python 3.10+
  • git
  • Linux build tools (for some wheels / native deps)

1.2 Install

cd privacy-sam3-distill
bash scripts/install_all.sh
source .venv/bin/activate

What this script does:

  • creates .venv
  • installs project deps
  • installs local extern/sam3
  • installs einops (required by current SAM3 imports)

1.3 Download models

bash scripts/download_models.sh

Downloads:

  • SAM3 checkpoint(s) into models/sam3/
  • offline Mistral GGUF into models/llm/

1.4 Health check

pip check
python scripts/sam3_distilled_adapter.py --help

2. Architecture

The pipeline has two phases:

  • Offline distillation — build a teacher cache from SAM3 consensus masks and fit a lightweight adapter. Runs once per prompt/dataset.
  • Runtime inference — single SAM3 pass through the distilled student or plain SAM3 baseline.

                    OFFLINE (DISTILLATION)
                    ----------------------
 Raw images/videos
        |
        v
 +--------------------+
 | psd preprocess     |
 | -> curated splits  |
 +--------------------+
        |
        v
 +----------------------------------------------+
 | teacher-cache                                |
 | - SAM3 prompt variants                       |
 | - Optional offline LLM (llama_cpp) expansion |
 | - Consensus mask + uncertainty map           |
 +----------------------------------------------+
        |
        v
 +------------------------------+
 | fit-adapter                  |
 | - mask head                  |
 | - uncertainty head           |
 | - refinement policy params   |
 +------------------------------+
        |
        v
  checkpoints/<prompt>_adapter.json


                    RUNTIME (INFERENCE)
                    -------------------
 Input image/video
        |
        +------------------------------+
        |                              |
        v                              v
 +-------------------------+   +-----------------------+
 | proposed_distilled      |   | sam3_single_pass      |
 | - SAM3 base mask        |   | - single SAM3 pass    |
 | - adapter mask + unc    |   | - no adapter          |
 | - local refine (budget) |   | - no student refine   |
 +-------------------------+   +-----------------------+
        |
        v
 +------------------------------+
 | output-mode                  |
 | inpaint | redact | mask | det|
 +------------------------------+
        |
        v
 redacted/inpainted output
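
The output-mode stage at the bottom of the runtime path decides how the final mask becomes an output. As a rough illustration, here is a sketch of two of the four modes, assuming masks arrive as boolean HxW arrays; inpaint and det need more machinery and are omitted, and none of this is the repo's actual implementation:

import numpy as np

def apply_output_mode(frame: np.ndarray, mask: np.ndarray, mode: str) -> np.ndarray:
    """Illustrative output modes; frame is HxWx3 uint8, mask is HxW bool."""
    if mode == "redact":
        # Hard black-out of the sensitive pixels.
        out = frame.copy()
        out[mask] = 0
        return out
    if mode == "mask":
        # Export the binary mask itself as a 3-channel image.
        return (mask.astype(np.uint8) * 255)[..., None].repeat(3, axis=2)
    raise ValueError(f"mode {mode!r} is not covered by this sketch")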

3. Data Preprocessing

3.1 Input format

Place source media under any folder:

data/raw/
  image1.jpg
  image2.png
  clip1.mp4
  ...

3.2 Run preprocessing

psd preprocess \
  --input data/raw \
  --output-dir data/curated \
  --train-ratio 0.7 \
  --val-ratio 0.15 \
  --frame-step 10 \
  --max-frames-per-video 300 \
  --seed 42

3.3 Output format

data/curated/
  images/
    train/*.jpg
    val/*.jpg
    test/*.jpg
  manifest.csv

Constraints on ratios:

  • train_ratio must be in (0, 1)
  • val_ratio must be in [0, 1)
  • train_ratio + val_ratio must be < 1

For very small datasets, some splits may be empty — check manifest.csv.
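
The split arithmetic implied by these constraints can be sanity-checked in a few lines. This sketch mirrors the rules above; the actual rounding inside psd preprocess may differ:

def split_counts(n_items: int, train_ratio: float, val_ratio: float):
    """Illustrative split arithmetic following the documented constraints."""
    assert 0.0 < train_ratio < 1.0, "train_ratio must be in (0, 1)"
    assert 0.0 <= val_ratio < 1.0, "val_ratio must be in [0, 1)"
    assert train_ratio + val_ratio < 1.0, "train + val must leave room for test"
    n_train = int(n_items * train_ratio)
    n_val = int(n_items * val_ratio)
    n_test = n_items - n_train - n_val  # remainder goes to test
    return n_train, n_val, n_test

# With 20 frames and the Quick Start ratios:
print(split_counts(20, 0.7, 0.15))  # (14, 3, 3); tiny datasets can leave a split empty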

4. Training

There are two ways to train: the one-shot wrapper (simplest) or step-by-step.

4.1 One-shot wrapper

PROMPT=faces \
RAW_DATA=data/raw \
CURATED=data/curated \
SAM3_CKPT=models/sam3/sam3.pt \
LLM_PATH=models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
bash scripts/run_training.sh

Output: checkpoints/faces_adapter.json (named after PROMPT).

4.2 Step-by-step

Step 1 — Build teacher cache:

python scripts/sam3_distilled_adapter.py teacher-cache \
  --input data/curated/images/train \
  --prompt "faces" \
  --cache-dir artifacts/cache_faces \
  --mode balanced \
  --device cuda \
  --sam3-checkpoint models/sam3/sam3.pt \
  --consensus-variants 7 \
  --consensus-risk 0.6 \
  --teacher-use-llm \
  --llm-provider llama_cpp \
  --llm-model-path models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  --save-preview

To run without the offline LLM, drop --teacher-use-llm, --llm-provider, and --llm-model-path.
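
Conceptually, the consensus mask and uncertainty map are per-pixel statistics over the prompt-variant masks. The sketch below shows one common way to compute them; it is an assumption about the general technique (shapes and the exact role of --consensus-risk are illustrative), not the repo's code:

import numpy as np

def consensus_from_variants(variant_masks: list[np.ndarray], risk: float = 0.6):
    """Illustrative consensus over V prompt-variant masks (HxW floats in [0, 1])."""
    stack = np.stack(variant_masks, axis=0)   # (V, H, W)
    agreement = stack.mean(axis=0)            # fraction of variants voting "mask"
    consensus = (agreement >= risk).astype(np.float32)
    # Uncertainty peaks where the variants disagree (agreement near 0.5).
    uncertainty = 1.0 - 2.0 * np.abs(agreement - 0.5)
    return consensus, uncertainty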

Step 2 — Fit adapter:

python scripts/sam3_distilled_adapter.py fit-adapter \
  --cache-dir artifacts/cache_faces \
  --adapter-out checkpoints/faces_adapter.json \
  --prompt "faces" \
  --max-pixels-per-sample 20000 \
  --ridge 1e-3 \
  --loss-uncertainty-gain 1.5 \
  --loss-boundary-gain 1.0 \
  --refine-policy budgeted \
  --refine-budget-ratio 0.12 \
  --refine-q-min 0.55 \
  --refine-q-max 0.95 \
  --refine-q-steps 9 \
  --refine-cost-weight 0.35
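
The --ridge and --max-pixels-per-sample flags suggest the mask head is fit as a regularized least-squares problem over subsampled teacher pixels. Here is a minimal sketch under that assumption; feature extraction and the uncertainty/boundary loss gains are omitted:

import numpy as np

def fit_ridge_head(features: np.ndarray, targets: np.ndarray,
                   ridge: float = 1e-3, max_pixels: int = 20000,
                   seed: int = 0) -> np.ndarray:
    """Illustrative ridge fit: w = (X^T X + ridge * I)^-1 X^T y.

    features: (N, D) per-pixel features; targets: (N,) teacher mask values.
    """
    rng = np.random.default_rng(seed)
    if len(features) > max_pixels:  # cf. --max-pixels-per-sample
        idx = rng.choice(len(features), size=max_pixels, replace=False)
        features, targets = features[idx], targets[idx]
    d = features.shape[1]
    gram = features.T @ features + ridge * np.eye(d)  # cf. --ridge
    return np.linalg.solve(gram, features.T @ targets)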

5. Inference

infer supports two runtime models:

Model               Description
proposed_distilled  SAM3 + adapter + uncertainty-guided local refinement
sam3_single_pass    Single SAM3 pass, no adapter, no refinement (fast baseline)
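
The uncertainty-guided local refinement in proposed_distilled can be pictured as spending a fixed pixel budget on the most uncertain regions. This is a hedged sketch of that selection step, where budget_ratio stands in for --refine-budget-ratio; the real policy also tunes a quantile threshold and a cost weight:

import numpy as np

def select_refine_region(uncertainty: np.ndarray, budget_ratio: float = 0.12) -> np.ndarray:
    """Illustrative budgeted selection: mark the most uncertain pixels for refinement."""
    budget = int(uncertainty.size * budget_ratio)
    if budget == 0:
        return np.zeros_like(uncertainty, dtype=bool)
    # Value of the budget-th most uncertain pixel becomes the cutoff.
    threshold = np.partition(uncertainty.ravel(), -budget)[-budget]
    return uncertainty >= threshold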

5.1 proposed_distilled

Image:

python scripts/sam3_distilled_adapter.py infer \
  --input test.webp \
  --runtime-model proposed_distilled \
  --adapter checkpoints/faces_adapter.json \
  --prompt "faces" \
  --output-mode inpaint \
  --output output/test_proposed_inpaint.jpg \
  --mask-preview output/test_proposed_mask.jpg \
  --mode balanced \
  --device cuda \
  --sam3-checkpoint models/sam3/sam3.pt

Video:

python scripts/sam3_distilled_adapter.py infer \
  --input data/sample_videos/test.mp4 \
  --runtime-model proposed_distilled \
  --adapter checkpoints/faces_adapter.json \
  --prompt "faces" \
  --output-mode inpaint \
  --output output/test_redacted_faces_distilled.mp4 \
  --mask-preview output/test_mask_faces_distilled.mp4 \
  --mode balanced \
  --device cuda \
  --sam3-checkpoint models/sam3/sam3.pt \
  --video-log-every 10 \
  --metrics-csv artifacts/runtime_metrics.csv
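
--metrics-csv appends runtime metrics for the run. The column schema is not documented here, so a generic reader that assumes nothing about it is the safest way to inspect the file:

import csv

with open("artifacts/runtime_metrics.csv", newline="") as f:
    for row in csv.DictReader(f):  # each row is a dict keyed by whatever columns the tool writes
        print(row)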

5.2 sam3_single_pass

One SAM3 prompt pass with no adapter or refinement. Use for fast baselines or adapter-free deployment.

Image:

python scripts/sam3_distilled_adapter.py infer \
  --input test.webp \
  --runtime-model sam3_single_pass \
  --prompt "faces" \
  --output-mode inpaint \
  --output output/test_sam3_single_pass_inpaint.jpg \
  --mask-preview output/test_sam3_single_pass_mask.jpg \
  --mode balanced \
  --device cuda \
  --sam3-checkpoint models/sam3/sam3.pt

Video:

python scripts/sam3_distilled_adapter.py infer \
  --input data/sample_videos/test.mp4 \
  --runtime-model sam3_single_pass \
  --prompt "faces" \
  --output-mode inpaint \
  --output output/test_sam3_single_pass_inpaint.mp4 \
  --mask-preview output/test_sam3_single_pass_mask.mp4 \
  --mode balanced \
  --device cuda \
  --sam3-checkpoint models/sam3/sam3.pt \
  --video-log-every 10

6. Testing

Run unit tests:

pytest -q

Minimal smoke test on a single image (verifies the full pipeline end-to-end):

python scripts/sam3_distilled_adapter.py teacher-cache \
  --input test.webp \
  --prompt "faces" \
  --cache-dir /tmp/psd_cache \
  --mode fast \
  --device cuda \
  --sam3-checkpoint models/sam3/sam3.pt \
  --consensus-variants 3 \
  --consensus-risk 0.6 \
  --teacher-use-llm \
  --llm-provider llama_cpp \
  --llm-model-path models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf

python scripts/sam3_distilled_adapter.py fit-adapter \
  --cache-dir /tmp/psd_cache \
  --adapter-out /tmp/faces_adapter.json \
  --prompt "faces"

7. Troubleshooting

  • Output extension mismatch — if input is an image, output must be an image path (.jpg, .png, .webp), not .mp4.
  • Device errors — always use --device cuda. Using --device cpu will cause a device mismatch error at runtime.
  • Empty splits after preprocess — inspect data/curated/manifest.csv and adjust split ratios or increase dataset size.
