Training Models

This guide explains how to train Video Seal models from scratch, including data preparation, image pre-training, and video fine-tuning.

Data preparation

You only need a folder of images to start training. Create a simple YAML configuration file in configs/datasets/ that points to your image or video directories.

Example dataset config:

# configs/datasets/myimages.yaml
train_dir: /path/to/images/train/
val_dir: /path/to/images/val/
train_annotation_file: null
val_annotation_file: null

The image data loader supports both simple image folders and COCO-format annotations (optional).
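As a convenience, the config file can also be generated from a script. The following is a minimal sketch, not part of the repository: the helper name write_dataset_config and its arguments are hypothetical, and it only reproduces the four keys shown in the example config above.

```python
from pathlib import Path


def write_dataset_config(name, train_dir, val_dir, config_root="configs/datasets"):
    """Write a minimal dataset config (hypothetical helper) and return its path.

    Only the four keys from the example config are emitted; both annotation
    files are left as null, which corresponds to the plain image-folder case.
    """
    config_path = Path(config_root) / f"{name}.yaml"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        f"train_dir: {train_dir}",
        f"val_dir: {val_dir}",
        "train_annotation_file: null",
        "val_annotation_file: null",
    ]
    config_path.write_text("\n".join(lines) + "\n")
    return config_path
```

Calling write_dataset_config("myimages", "/path/to/images/train/", "/path/to/images/val/") from the repository root would create configs/datasets/myimages.yaml, presumably the file that --image_dataset myimages refers to in the training commands.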

Training commands

Image pre-training

To train an image watermarking model (128 bits) from scratch:

OMP_NUM_THREADS=40 torchrun --nproc_per_node=2 train.py --local_rank 0 \
    --video_dataset none --image_dataset myimages --workers 8 \
    --extractor_model convnext_tiny --embedder_model unet_small2_yuv_quant --hidden_size_multiplier 1 --nbits 128 \
    --scaling_w_schedule Cosine,scaling_min=0.2,start_epoch=200,epochs=200 --scaling_w 1.0 --scaling_i 1.0 --attenuation jnd_1_1 \
    --epochs 601 --iter_per_epoch 1000 --scheduler CosineLRScheduler,lr_min=1e-6,t_initial=601,warmup_lr_init=1e-8,warmup_t=20 --optimizer AdamW,lr=5e-4 \
    --lambda_dec 1.0 --lambda_d 0.1 --lambda_i 0.1 --perceptual_loss yuv  --num_augs 2 --augmentation_config configs/all_augs_v3.yaml --disc_in_channels 1 --disc_start 50

For a 256-bit model, simply change --nbits 128 to --nbits 256.
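Several flags in the command above (--optimizer, --scheduler, --scaling_w_schedule) take a compact "Name,key=value,..." string. The sketch below shows one way such a string could be split into a name and a parameter dictionary. It is an illustration under that assumption, with a hypothetical parse_spec function, and is not the parser used by train.py.

```python
def parse_spec(spec):
    """Split 'Name,key=value,...' into (name, params).

    Hypothetical illustration: values are converted to int or float when
    possible and kept as strings otherwise.
    """
    name, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        try:
            params[key] = int(value)
        except ValueError:
            try:
                params[key] = float(value)
            except ValueError:
                params[key] = value
    return name, params
```

For example, parse_spec("AdamW,lr=5e-4") yields the name "AdamW" with a single parameter lr, and the scheduler string from the command splits into "CosineLRScheduler" plus its four keyword values.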

Video fine-tuning

After pre-training on images, you can fine-tune on video data:

OMP_NUM_THREADS=40 torchrun --nproc_per_node=2 train.py --local_rank 0 \
    --video_dataset myvideos --image_dataset none --workers 0 --frames_per_clip 16 \
    --resume_from /path/to/image/checkpoint.pth --resume_optimizer_state True --resume_disc True \
    --videoseal_step_size 4 --lowres_attenuation True --img_size_proc 256 --img_size_val 768 --img_size 768 \
    --extractor_model convnext_tiny --embedder_model unet_small2_yuv_quant --hidden_size_multiplier 1 --nbits 128 \
    --scaling_w_schedule None --scaling_w 0.2 --scaling_i 1.0 --attenuation jnd_1_1 \
    --epochs 601 --iter_per_epoch 100 --scheduler None --optimizer AdamW,lr=1e-5 \
    --lambda_dec 1.0 --lambda_d 0.5 --lambda_i 0.1 --perceptual_loss yuv  --num_augs 2 --augmentation_config configs/all_augs_v3.yaml --disc_in_channels 1 --disc_start 50

Important parameters

--nbits: Number of bits in the watermark (128, 256)
--scaling_w: Watermark strength (higher values = more visible but more robust)

Pre-trained models

Full models

Image models

The image models are trained with these parameters: https://dl.fbaipublicfiles.com/videoseal/train_img_y.json. Here are the final weights with discriminator and optimizer state at the end of training, and the saved logs:

Model	Description	Training Checkpoint	Logs
128-bit	Image-trained model with 128 bits	y_128b_img.pth	logs
256-bit	Image-trained model with 256 bits	y_256b_img.pth	logs

Note: Inference-only model files (linked in the main README) are smaller versions of these checkpoints with only the necessary weights for inference.

Training Tips

Make sure that training kicks off: bit accuracy should increase quickly. If it does not, try the following, in this order: remove the perceptual loss (set --lambda_i 0), increase --scaling_w, then remove the augmentations.
Adjust --scaling_w during training with --scaling_w_schedule for better robustness (start high, then decrease).
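To make the "start high, then decrease" tip concrete, here is a rough sketch of what a cosine schedule such as Cosine,scaling_min=0.2,start_epoch=200,epochs=200 with --scaling_w 1.0 might compute. The exact formula used by the training code is an assumption here: the sketch holds the initial value until start_epoch, follows a half cosine down to scaling_min over the next epochs epochs, and then stays there. The function name scaling_w_at is hypothetical.

```python
import math


def scaling_w_at(epoch, scaling_start=1.0, scaling_min=0.2, start_epoch=200, epochs=200):
    """Assumed cosine decay of the watermark strength (illustrative only)."""
    if epoch <= start_epoch:
        # Before the schedule starts, keep the initial strength.
        return scaling_start
    if epoch >= start_epoch + epochs:
        # After the decay window, hold the minimum strength.
        return scaling_min
    progress = (epoch - start_epoch) / epochs  # goes from 0 to 1 over the window
    return scaling_min + 0.5 * (scaling_start - scaling_min) * (1 + math.cos(math.pi * progress))
```

Under these assumptions the strength is 1.0 through epoch 200, about 0.6 at epoch 300, and 0.2 from epoch 400 onward, which matches the fixed --scaling_w 0.2 used in the video fine-tuning command.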
