GPU Pipeline — SynthID Bypass for Images & Video

Drop-in GPU port of the V3 spectral bypass. Uses PyTorch, so the same code runs on NVIDIA CUDA (Windows/Linux), Apple MPS (Mac M-series), and CPU — device is auto-selected.
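
The auto-selection described above can be approximated as follows. This is a minimal sketch, not the repository's code: `choose_device` and `probe_backends` are hypothetical names, and the preference order (CUDA, then MPS, then CPU) is an assumption. Only `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are actual PyTorch calls.

```python
def choose_device(requested="auto", cuda_ok=False, mps_ok=False):
    """Resolve a device name (hypothetical helper; assumed order cuda > mps > cpu)."""
    if requested != "auto":
        return requested  # an explicit choice always wins
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def probe_backends():
    """Ask PyTorch which accelerators are usable on this machine."""
    import torch  # imported lazily so choose_device can be used without torch

    cuda_ok = torch.cuda.is_available()
    mps_ok = torch.backends.mps.is_available()
    return cuda_ok, mps_ok
```

With PyTorch installed, `choose_device("auto", *probe_backends())` returns the name that would be passed to `torch.device`.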

1. Install (Windows + NVIDIA)

git clone <this-repo-url>
cd reverse-SynthID

python -m venv venv
venv\Scripts\activate

pip install -r requirements.txt

# Pick the CUDA build that matches your driver (check with: nvidia-smi)
# CUDA 12.1:
pip install torch --index-url https://download.pytorch.org/whl/cu121
# CUDA 11.8:
# pip install torch --index-url https://download.pytorch.org/whl/cu118

Verify GPU is visible:

python -c "import torch; print('cuda?', torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else '')"

2. Image bypass (GPU)

python -c "
import cv2, numpy as np
from src.extraction.gpu.bypass_torch import bypass_single_image
img = cv2.imread('validation_images/test_4x3_city.png')[:, :, ::-1]  # BGR->RGB
out = bypass_single_image(img, 'artifacts/spectral_codebook_v3.npz', strength='aggressive')
cv2.imwrite('out.png', out[:, :, ::-1])
print('saved out.png')
"

The codebook contains exact-match profiles for only two resolutions: 1024×1024 and 1536×2816. Any other resolution falls back to the CPU spatial path, which produces correct output but runs slower.
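
To predict which path an input will take before running it, the resolution check can be sketched as below. `select_path` and `EXACT_PROFILES` are hypothetical names, not part of the repository, and the tuple order is an assumption: this README does not say whether the listed sizes are width × height or height × width, so the pairs are kept exactly as written.

```python
# Resolutions listed in this README, in the order they are written there.
# Whether that order means (width, height) or (height, width) is not stated.
EXACT_PROFILES = {(1024, 1024), (1536, 2816)}


def select_path(dims, profiles=EXACT_PROFILES):
    """Return 'gpu' when dims has an exact-match profile, otherwise 'cpu'."""
    return "gpu" if tuple(dims) in profiles else "cpu"


print(select_path((1024, 1024)))  # gpu
print(select_path((1920, 1080)))  # cpu
```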

3. Video bypass (GPU, batched)

python -m src.extraction.gpu.bypass_video `
    --input  my_video.mp4 `
    --output my_video_clean.mp4 `
    --codebook artifacts/spectral_codebook_v3.npz `
    --strength aggressive `
    --batch 8

Flags:

  • --batch N: frames per GPU batch. Larger batches run faster but use more VRAM. Start with 8 for 1080p on 8 GB of VRAM, or 4 for 4K.
  • --half: float16 storage on CUDA (≈1.7× faster, ≈0.2 dB PSNR loss).
  • --device: auto (default), cuda, mps, or cpu.
  • --max-frames N: test run with only N frames.
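
As a rough guide to why larger batches need more VRAM, the size of a single batched tensor can be estimated from its shape. This is a back-of-the-envelope sketch: `batch_tensor_bytes` is a hypothetical helper, and it counts one tensor only. The pipeline also holds codebook tensors and intermediates, so actual usage will be some multiple of this figure.

```python
def batch_tensor_bytes(batch, height, width, channels=3, bytes_per_element=8):
    """Bytes for one dense tensor of shape (batch, channels, height, width).

    bytes_per_element: 8 for complex64 (the FFT of float32 data),
    4 for float32, 2 for float16.
    """
    return batch * channels * height * width * bytes_per_element


# One complex64 spectrum for a batch of eight 1080p RGB frames.
size = batch_tensor_bytes(8, 1080, 1920)
print(f"{size / 2**20:.1f} MiB")  # 379.7 MiB
```

By the same arithmetic, a batch of 4 frames at 4K (3840×2160) is twice the size of a batch of 8 frames at 1080p, which is why a smaller batch is the suggested starting point there.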

4. Expected performance

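| Hardware                    | 1080p frame | 1 min of 1080p @ 30 fps |
|-----------------------------|-------------|-------------------------|
| RTX 4090                    | ~1.5 ms     | ~3 s                    |
| RTX 3060 12GB               | ~4 ms       | ~8 s                    |
| RTX 2060                    | ~10 ms      | ~20 s                   |
| Apple M1 Pro (MPS)          | ~30 ms      | ~1 min                  |
| CPU (i7 desktop, this repo) | ~150 ms     | ~4.5 min                |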

With --half on Ampere+ GPUs, expect another ~1.5–1.8× speedup.
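
The per-clip figures follow from the per-frame latency: one minute at 30 fps is 1800 frames. The sketch below reproduces that arithmetic so it can be applied to other clip lengths. `clip_seconds` is a hypothetical helper, and it ignores video decode and encode time, so it gives a lower bound rather than a wall-clock prediction.

```python
def clip_seconds(ms_per_frame, duration_s=60, fps=30):
    """Estimated processing time in seconds, from per-frame latency in ms."""
    frames = duration_s * fps
    return frames * ms_per_frame / 1000.0


# Per-frame latencies taken from the performance figures in this section.
for name, ms in [("RTX 4090", 1.5), ("RTX 3060 12GB", 4), ("CPU", 150)]:
    print(f"{name}: {clip_seconds(ms):.1f} s")
```

The published figures are rounded, so expect small differences (for example, 4 ms per frame works out to 7.2 s, listed as ~8 s).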

5. How it differs from the CPU pipeline

  • Same mathematics as bypass_v3 in src/extraction/synthid_bypass.py.
  • Per-resolution codebook tensors (ref_mag, phase_factor, consistency, DC ramp, per-strength cons-weight) are pre-uploaded to the GPU once.
  • Per frame: batched FFT2 → subtract min(wm_mag, |fft|*cap) * e^{iφ} → IFFT2.
  • Anti-alias: 3×3 Gaussian on GPU via conv2d.
  • Non-exact resolutions: fall back to CPU bypass_v3 (infrequent, cheap).

6. Common issues

  • "CUDA out of memory": lower --batch, or add --half.
  • MPS complex FFT fallback: on older PyTorch builds for Apple Silicon, fft2 on complex tensors may fall back to the CPU, which eliminates the speedup. Upgrade to torch>=2.3 for best results.
  • Output video looks identical: the input may not be watermarked, or its resolution may have no exact-match profile. Run once at 1024×1024 or 1536×2816 to validate.
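
For the MPS item above, the installed PyTorch version can be compared against the 2.3 threshold before a long run. This is a minimal sketch: `meets_min_version` is a hypothetical helper that only compares the major and minor numbers and ignores local suffixes such as `+cu121`.

```python
import re


def meets_min_version(version, minimum=(2, 3)):
    """True if a version string such as '2.3.1+cu121' is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


print(meets_min_version("2.3.1+cu121"))  # True
print(meets_min_version("2.2.0"))        # False
```

With PyTorch installed, call it as `meets_min_version(torch.__version__)`.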