A privacy-preserving masking and redaction pipeline built around SAM3. Detects and redacts sensitive visual content (faces, plates, IDs, screens) with a two-phase design: heavy distillation runs offline once, lightweight inference runs at runtime.
```bash
# 1. Install
cd privacy-sam3-distill
bash scripts/install_all.sh
source .venv/bin/activate
bash scripts/download_models.sh
# 2. Preprocess raw data
psd preprocess \
--input data/raw \
--output-dir data/curated \
--train-ratio 0.7 \
--val-ratio 0.15 \
--frame-step 10 \
--max-frames-per-video 300 \
--seed 42
# 3. Train (one-shot wrapper)
PROMPT=faces \
RAW_DATA=data/raw \
CURATED=data/curated \
SAM3_CKPT=models/sam3/sam3.pt \
LLM_PATH=models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
bash scripts/run_training.sh
# 4. Run inference
python scripts/sam3_distilled_adapter.py infer \
--input data/sample_videos/test.mp4 \
--runtime-model proposed_distilled \
--adapter checkpoints/faces_adapter.json \
--prompt "faces" \
--output-mode inpaint \
--output output/redacted.mp4 \
--mask-preview output/mask.mp4 \
--mode balanced \
--device cuda \
--sam3-checkpoint models/sam3/sam3.pt
```

Requirements:

- Python 3.10+
- git
- Linux build tools (for some wheels / native deps)
Install:

```bash
cd privacy-sam3-distill
bash scripts/install_all.sh
source .venv/bin/activate
```

What this script does:

- creates `.venv`
- installs project deps
- installs local `extern/sam3`
- installs `einops` (required by current SAM3 imports)
Download models:

```bash
bash scripts/download_models.sh
```

Downloads:
- SAM3 checkpoint(s) into `models/sam3/`
- offline Mistral GGUF into `models/llm/`
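To confirm the downloads landed where later commands expect them, a quick path check helps. A minimal sketch; the exact filenames depend on what `download_models.sh` actually fetches:

```python
from pathlib import Path

# Expected model locations from the commands in this README; adjust
# if download_models.sh fetches different filenames or versions.
expected = [
    Path("models/sam3/sam3.pt"),
    Path("models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf"),
]
for path in expected:
    print("ok     " if path.is_file() else "MISSING", path)
```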
Verify the install:

```bash
pip check
python scripts/sam3_distilled_adapter.py --help
```

The pipeline has two phases:
- Offline distillation — build a teacher cache from SAM3 consensus masks and fit a lightweight adapter. Runs once per prompt/dataset.
- Runtime inference — single SAM3 pass through the distilled student or plain SAM3 baseline.
```
OFFLINE (DISTILLATION)
----------------------
Raw images/videos
|
v
+--------------------+
| psd preprocess |
| -> curated splits |
+--------------------+
|
v
+-----------------------------------------------+
| teacher-cache |
| - SAM3 prompt variants |
| - Optional offline LLM (llama_cpp) expansion |
| - Consensus mask + uncertainty map |
+-----------------------------------------------+
|
v
+------------------------------+
| fit-adapter |
| - mask head |
| - uncertainty head |
| - refinement policy params |
+------------------------------+
|
v
checkpoints/<prompt>_adapter.json
RUNTIME (INFERENCE)
-------------------
Input image/video
|
+------------------------------+
| |
v v
+-------------------------+ +-----------------------+
| proposed_distilled | | sam3_single_pass |
| - SAM3 base mask | | - single SAM3 pass |
| - adapter mask + unc | | - no adapter |
| - local refine (budget) | | - no student refine |
+-------------------------+ +-----------------------+
|
v
+------------------------------+
| output-mode |
| inpaint | redact | mask | det|
+------------------------------+
|
v
redacted/inpainted output
```
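To make the teacher-cache box concrete: the consensus mask and uncertainty map can be read as the per-pixel mean and disagreement across prompt-variant masks. A minimal sketch of that idea, not the project's actual computation:

```python
import numpy as np

# Sketch: fuse K prompt-variant masks into a consensus mask plus a
# per-pixel uncertainty map that is high where the variants disagree.
def consensus(variant_masks, risk=0.6):
    probs = np.stack(variant_masks).mean(axis=0)  # per-pixel agreement in [0, 1]
    consensus_mask = probs >= risk                # cf. --consensus-risk
    uncertainty = probs * (1.0 - probs)           # peaks where agreement is ~0.5
    return consensus_mask, uncertainty

# 7 hypothetical variant masks, cf. --consensus-variants 7
masks = [np.random.rand(64, 64) > 0.5 for _ in range(7)]
mask, unc = consensus(masks)
print(mask.mean(), unc.max())
```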
Place source media under any folder:
```
data/raw/
  image1.jpg
  image2.png
  clip1.mp4
  ...
```
```bash
psd preprocess \
--input data/raw \
--output-dir data/curated \
--train-ratio 0.7 \
--val-ratio 0.15 \
--frame-step 10 \
--max-frames-per-video 300 \
--seed 42
```

Output layout:

```
data/curated/
  images/
    train/*.jpg
    val/*.jpg
    test/*.jpg
  manifest.csv
```
Constraints on ratios:

- `train_ratio` must be in `(0, 1)`
- `val_ratio` must be in `[0, 1)`
- `train_ratio + val_ratio` must be `< 1`
For very small datasets, some splits may be empty — check `manifest.csv`.
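For intuition on why small datasets can produce empty splits, here is a hypothetical illustration of floor-based split arithmetic (the exact rounding inside `psd preprocess` may differ):

```python
# Hypothetical illustration of floor-based split arithmetic; the exact
# rounding rule used by `psd preprocess` may differ.
def split_counts(n, train_ratio=0.7, val_ratio=0.15):
    n_train = int(n * train_ratio)
    n_val = int(n * val_ratio)
    n_test = n - n_train - n_val
    return n_train, n_val, n_test

print(split_counts(1000))  # (700, 150, 150)
print(split_counts(5))     # (3, 0, 2) -- the val split comes out empty
```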
Two ways to train: one-shot wrapper (simplest), or step-by-step.
One-shot wrapper:

```bash
PROMPT=faces \
RAW_DATA=data/raw \
CURATED=data/curated \
SAM3_CKPT=models/sam3/sam3.pt \
LLM_PATH=models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
bash scripts/run_training.sh
```

Output: `checkpoints/faces_adapter.json` (named after `PROMPT`).
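The adapter is plain JSON, so it can be inspected directly. A minimal sketch that only lists top-level keys, assuming nothing about the schema beyond the top level being a JSON object:

```python
import json

# Peek at the trained adapter without assuming its schema: list the
# top-level keys and their JSON types.
with open("checkpoints/faces_adapter.json") as f:
    adapter = json.load(f)

for key, value in adapter.items():
    print(f"{key}: {type(value).__name__}")
```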
Step 1 — Build teacher cache:
```bash
python scripts/sam3_distilled_adapter.py teacher-cache \
--input data/curated/images/train \
--prompt "faces" \
--cache-dir artifacts/cache_faces \
--mode balanced \
--device cuda \
--sam3-checkpoint models/sam3/sam3.pt \
--consensus-variants 7 \
--consensus-risk 0.6 \
--teacher-use-llm \
--llm-provider llama_cpp \
--llm-model-path models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
--save-preview
```

To run without the offline LLM, drop `--teacher-use-llm`, `--llm-provider`, and `--llm-model-path`.
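For context on what the offline LLM contributes, prompt expansion with `llama_cpp` might look roughly like the sketch below. This is an illustration, not the project's implementation; the prompt template and sampling parameters are assumptions:

```python
from llama_cpp import Llama

# Sketch of offline prompt expansion: ask a local GGUF model for
# paraphrases of the target concept to use as SAM3 prompt variants.
llm = Llama(
    model_path="models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=2048,
    verbose=False,
)

out = llm(
    "[INST] List 5 short noun phrases that mean the same as "
    "'faces' for a visual detector, one per line. [/INST]",
    max_tokens=64,
    temperature=0.7,
)

variants = [line.strip("- ").strip()
            for line in out["choices"][0]["text"].splitlines()
            if line.strip()]
print(variants)
```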
Step 2 — Fit adapter:
```bash
python scripts/sam3_distilled_adapter.py fit-adapter \
--cache-dir artifacts/cache_faces \
--adapter-out checkpoints/faces_adapter.json \
--prompt "faces" \
--max-pixels-per-sample 20000 \
--ridge 1e-3 \
--loss-uncertainty-gain 1.5 \
--loss-boundary-gain 1.0 \
--refine-policy budgeted \
--refine-budget-ratio 0.12 \
--refine-q-min 0.55 \
--refine-q-max 0.95 \
--refine-q-steps 9 \
--refine-cost-weight 0.35
```
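To make the `--refine-*` knobs concrete, here is a sketch of how budgeted, uncertainty-guided pixel selection could work: sweep quantile thresholds from `--refine-q-max` down to `--refine-q-min` and keep the tightest one whose candidate set fits the `--refine-budget-ratio` budget. An illustration of the idea, not the project's code:

```python
import numpy as np

# Sketch: pick the most uncertain pixels, capped by a budget ratio,
# as candidates for local refinement (cf. --refine-budget-ratio and
# the --refine-q-* quantile sweep).
def select_refine_pixels(uncertainty, budget_ratio=0.12,
                         q_min=0.55, q_max=0.95, q_steps=9):
    budget = int(uncertainty.size * budget_ratio)
    for q in np.linspace(q_max, q_min, q_steps):
        candidates = uncertainty >= np.quantile(uncertainty, q)
        if candidates.sum() <= budget:
            return candidates  # tightest threshold that fits the budget
    # Fallback: take (at least) the top-budget pixels.
    thresh = np.partition(uncertainty.ravel(), -budget)[-budget]
    return uncertainty >= thresh

unc = np.random.rand(256, 256)
mask = select_refine_pixels(unc)
print(mask.mean())  # fraction of pixels selected for refinement
```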
`infer` supports two runtime models:

| Model | Description |
|---|---|
| `proposed_distilled` | SAM3 + adapter + uncertainty-guided local refinement |
| `sam3_single_pass` | Single SAM3 pass, no adapter, no refinement — fast baseline |
`proposed_distilled`:

Image:
```bash
python scripts/sam3_distilled_adapter.py infer \
--input test.webp \
--runtime-model proposed_distilled \
--adapter checkpoints/faces_adapter.json \
--prompt "faces" \
--output-mode inpaint \
--output output/test_proposed_inpaint.jpg \
--mask-preview output/test_proposed_mask.jpg \
--mode balanced \
--device cuda \
--sam3-checkpoint models/sam3/sam3.pt
```

Video:
```bash
python scripts/sam3_distilled_adapter.py infer \
--input data/sample_videos/test.mp4 \
--runtime-model proposed_distilled \
--adapter checkpoints/faces_adapter.json \
--prompt "faces" \
--output-mode inpaint \
--output output/test_redacted_faces_distilled.mp4 \
--mask-preview output/test_mask_faces_distilled.mp4 \
--mode balanced \
--device cuda \
--sam3-checkpoint models/sam3/sam3.pt \
--video-log-every 10 \
--metrics-csv artifacts/runtime_metrics.csv
```
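The per-frame numbers land in the file passed to `--metrics-csv`; a minimal sketch for a first look (no column names are assumed):

```python
import csv

# Inspect the runtime metrics CSV: print the header row and a few
# sample rows (column names are whatever the script emits).
with open("artifacts/runtime_metrics.csv", newline="") as f:
    reader = csv.reader(f)
    print(next(reader))  # header
    for i, row in enumerate(reader):
        print(row)
        if i >= 4:       # first five data rows
            break
```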
`sam3_single_pass` runs one SAM3 prompt pass with no adapter or refinement. Use it for fast baselines or adapter-free deployment.

Image:
```bash
python scripts/sam3_distilled_adapter.py infer \
--input test.webp \
--runtime-model sam3_single_pass \
--prompt "faces" \
--output-mode inpaint \
--output output/test_sam3_single_pass_inpaint.jpg \
--mask-preview output/test_sam3_single_pass_mask.jpg \
--mode balanced \
--device cuda \
--sam3-checkpoint models/sam3/sam3.pt
```

Video:
```bash
python scripts/sam3_distilled_adapter.py infer \
--input data/sample_videos/test.mp4 \
--runtime-model sam3_single_pass \
--prompt "faces" \
--output-mode inpaint \
--output output/test_sam3_single_pass_inpaint.mp4 \
--mask-preview output/test_sam3_single_pass_mask.mp4 \
--mode balanced \
--device cuda \
--sam3-checkpoint models/sam3/sam3.pt \
--video-log-every 10
```

Run unit tests:

```bash
pytest -q
```

Minimal smoke test on a single image (verify the full pipeline end-to-end):
```bash
python scripts/sam3_distilled_adapter.py teacher-cache \
--input test.webp \
--prompt "faces" \
--cache-dir /tmp/psd_cache \
--mode fast \
--device cuda \
--sam3-checkpoint models/sam3/sam3.pt \
--consensus-variants 3 \
--consensus-risk 0.6 \
--teacher-use-llm \
--llm-provider llama_cpp \
--llm-model-path models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf

python scripts/sam3_distilled_adapter.py fit-adapter \
--cache-dir /tmp/psd_cache \
--adapter-out /tmp/faces_adapter.json \
--prompt "faces"- Output extension mismatch — if input is an image, output must be an image path (
.jpg,.png,.webp), not.mp4. - Device errors — always use
--device cuda. Using--device cpuwill cause a device mismatch error at runtime. - Empty splits after preprocess — inspect
data/curated/manifest.csvand adjust split ratios or increase dataset size.
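Since the pipeline expects `--device cuda`, a quick preflight check can save a long run from failing late. A minimal sketch, assuming the PyTorch dependency that SAM3 already pulls in:

```python
import torch

# Fail fast if CUDA is unavailable; the pipeline expects --device cuda.
assert torch.cuda.is_available(), "CUDA not available; check driver/toolkit install"
print("CUDA device:", torch.cuda.get_device_name(0))
```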