GPU Pipeline — SynthID Bypass for Images & Video

Drop-in GPU port of the V3 spectral bypass. Uses PyTorch, so the same code runs on NVIDIA CUDA (Windows/Linux), Apple MPS (Mac M-series), and CPU — device is auto-selected.

1. Install (Windows + NVIDIA)

git clone <this-repo-url>
cd reverse-SynthID

python -m venv venv
venv\Scripts\activate

pip install -r requirements.txt

# Pick a CUDA build no newer than the CUDA version your driver supports (shown by: nvidia-smi)
# CUDA 12.1:
pip install torch --index-url https://download.pytorch.org/whl/cu121
# CUDA 11.8:
# pip install torch --index-url https://download.pytorch.org/whl/cu118

Verify that PyTorch can see the GPU:

python -c "import torch; print('cuda?', torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else '')"

2. Image bypass (GPU)

python -c "
import cv2, numpy as np
from src.extraction.gpu.bypass_torch import bypass_single_image
img = cv2.imread('validation_images/test_4x3_city.png')[:, :, ::-1]  # BGR->RGB
out = bypass_single_image(img, 'artifacts/spectral_codebook_v3.npz', strength='aggressive')
cv2.imwrite('out.png', out[:, :, ::-1])
print('saved out.png')
"

The codebook has exact-match profiles only for 1024×1024 and 1536×2816. Any other resolution falls back to the CPU spatial path, which gives a correct result but runs slower.

3. Video bypass (GPU, batched)

python -m src.extraction.gpu.bypass_video `
    --input  my_video.mp4 `
    --output my_video_clean.mp4 `
    --codebook artifacts/spectral_codebook_v3.npz `
    --strength aggressive `
    --batch 8

Flags:

--batch N — frames per GPU batch. Larger = faster but more VRAM. Start with 8 for 1080p on 8 GB VRAM, 4 for 4K.
--half — float16 storage on CUDA (≈1.7× faster, ≈0.2 dB PSNR loss).
--device — auto (default), cuda, mps, cpu.
--max-frames N — test run with only N frames.
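
The source only says that the device is auto-selected, so the following is a minimal sketch of how an auto setting is commonly resolved in PyTorch projects, not the repository's actual logic. The function name `resolve_device`, the preference order (CUDA, then MPS, then CPU), and the `cuda_ok`/`mps_ok` override parameters are assumptions made for illustration.

```python
def resolve_device(requested="auto", cuda_ok=None, mps_ok=None):
    """Map a --device style value to 'cuda', 'mps', or 'cpu'.

    Hypothetical helper: cuda_ok / mps_ok let callers override the probe
    (useful for testing); when left as None, PyTorch is queried if installed.
    """
    if cuda_ok is None or mps_ok is None:
        try:
            import torch
            probed_cuda = torch.cuda.is_available()
            mps_backend = getattr(torch.backends, "mps", None)  # absent on old builds
            probed_mps = bool(mps_backend and mps_backend.is_available())
        except ImportError:
            probed_cuda = probed_mps = False
        cuda_ok = probed_cuda if cuda_ok is None else cuda_ok
        mps_ok = probed_mps if mps_ok is None else mps_ok

    available = {"cuda": cuda_ok, "mps": mps_ok, "cpu": True}
    if requested == "auto":
        # Assumed preference order: fastest backend first, CPU as the last resort.
        for name in ("cuda", "mps", "cpu"):
            if available[name]:
                return name
    if requested not in available:
        raise ValueError(f"unknown device: {requested!r}")
    if not available[requested]:
        raise RuntimeError(f"device {requested!r} was requested but is not available")
    return requested


# Explicit flags make the result deterministic; omit them to probe the machine.
print(resolve_device("auto", cuda_ok=False, mps_ok=True))  # prints: mps
```

Failing loudly when an explicitly requested device is missing, rather than silently dropping to CPU, keeps a slow run from being mistaken for a GPU run.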

4. Expected performance

| Hardware | 1080p frame | 1 min of 1080p @ 30 fps |
| --- | --- | --- |
| RTX 4090 | ~1.5 ms | ~3 s |
| RTX 3060 12GB | ~4 ms | ~8 s |
| RTX 2060 | ~10 ms | ~20 s |
| Apple M1 Pro (MPS) | ~30 ms | ~1 min |
| CPU (i7 desktop, this repo) | ~150 ms | ~4.5 min |

With --half on Ampere+ GPUs, expect another ~1.5–1.8× speedup.
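
The clip times in the table follow directly from the per-frame times: one minute at 30 fps is 1,800 frames. The sketch below reproduces that arithmetic so the estimate can be adapted to other clip lengths or frame rates. The helper name `clip_seconds` is hypothetical, and the calculation covers per-frame processing only; whether the table's figures include video decode and encode time is not stated.

```python
def clip_seconds(ms_per_frame, duration_s=60.0, fps=30.0):
    """Estimate processing time in seconds for a clip, from a per-frame cost."""
    frames = duration_s * fps
    return frames * ms_per_frame / 1000.0


# Per-frame figures copied from the table above.
per_frame_ms = {
    "RTX 4090": 1.5,
    "RTX 3060 12GB": 4,
    "RTX 2060": 10,
    "Apple M1 Pro (MPS)": 30,
    "CPU (i7 desktop)": 150,
}

for name, ms in per_frame_ms.items():
    print(f"{name}: {clip_seconds(ms):.1f} s per minute of 1080p @ 30 fps")
```

For a 10-minute clip at 60 fps, call `clip_seconds(ms, duration_s=600, fps=60)`.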

5. How it differs from the CPU pipeline

Same mathematics as bypass_v3 in src/extraction/synthid_bypass.py.
Per-resolution codebook tensors (ref_mag, phase_factor, consistency, DC ramp, per-strength cons-weight) are pre-uploaded to the GPU once.
Per frame: batched FFT2 → subtract min(wm_mag, |fft|*cap) * e^{iφ} → IFFT2.
Anti-alias: 3×3 Gaussian on GPU via conv2d.
Non-exact resolutions: fall back to CPU bypass_v3 (infrequent, cheap).

6. Common issues

"CUDA out of memory" — lower --batch, or add --half.
MPS complex FFT fallback — on older PyTorch for Apple Silicon, fft2 on complex tensors may fall back to CPU, eliminating the speedup. Upgrade to torch>=2.3 for best results.
Output video looks identical — you may have passed a non-watermarked input, or the resolution has no exact-match profile. Run once at 1024×1024 or 1536×2816 to validate.
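
When choosing a smaller --batch after an out-of-memory error, a back-of-the-envelope size estimate can help. The sketch below assumes frames are held as 3-channel float32 tensors and that the spectrum is a complex64 tensor of the same shape. Both are assumptions, since the pipeline's actual buffers are not documented here, and `batch_mib` is a hypothetical helper. Treat the result as a lower bound: intermediate tensors, FFT workspace, and the allocator's caching all add to it.

```python
def batch_mib(batch, height, width, channels=3, bytes_per_value=4):
    """Size in MiB of one dense tensor of shape (batch, channels, height, width)."""
    return batch * channels * height * width * bytes_per_value / 2**20


for batch in (8, 4, 2):
    frames = batch_mib(batch, 1080, 1920)                       # float32 pixels
    spectrum = batch_mib(batch, 1080, 1920, bytes_per_value=8)  # complex64 FFT
    print(f"batch={batch}: frames {frames:.0f} MiB + spectrum {spectrum:.0f} MiB")
```

Under these assumptions the estimate scales linearly with batch size, so halving --batch halves it.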
