Privatar leverages both local (VR headset) and untrusted cloud hardware to achieve privacy-preserving multi-user avatar reconstruction. It horizontally partitions a frequency-decomposed VAE decoder, keeping privacy-sensitive low-frequency components local while offloading high-frequency components with calibrated noise injection.
/work/
├── multiface/ # Baseline VAE (DeepAppearanceVAE)
├── multiface_direct_split/ # Direct architecture split into local + cloud
├── multiface_quantization/ # Low-precision decoder (8-16 bit)
├── multiface_sparse/ # Channel-pruned decoder (20-80% sparsity)
├── multiface_frequency_decompose/ # BDCT frequency decomposition (no offloading)
├── multiface_partition_frequency_decompose/ # BDCT + horizontal partitioning (Privatar)
├── experiment_scripts/
│ ├── dp_analysis/ # Differential Privacy noise generation
│ ├── pac_analysis/ # PAC Privacy noise generation
│ ├── empirical_attack/ # Attack configs and frame lists
│ ├── bdct_reconstruction/ # BDCT visualization notebook
│ ├── render_scripts/ # Expression rendering utilities
│ ├── figure_drawer/ # scripts to draw all figures
│ └── dataset_config/ # Dataset download scripts
├── dataset/ # Multiface dataset (created during setup)
├── pretrain_model/ # Pretrained model checkpoint
├── training_results/ # All training outputs
└── testing_results/ # All testing outputs
Each model variant directory contains:
- train.py / test.py -- core training and testing logic
- launch_train_job_serial.py / launch_test_job_serial.py -- launcher scripts with configurable parameters
- latency_profiling_script*.py -- inference latency measurement
- models.py -- model architecture definitions
Hardware:
- NVIDIA GPU with CUDA support (validated on RTX 5090; RTX 3090/4090 also supported)
- 16 GB+ GPU memory, 52 GB+ system RAM
- 1 TB disk space (dataset + models + results). The full dataset for one identity requires terabytes of storage, but functional correctness testing needs only a small subset (Privatar/experiment_scripts/dataset_config/minimal_config.json), which takes no more than 50 GB of disk space.
Software:
- NVIDIA Docker: nvcr.io/nvidia/pytorch:24.01-py3
- For RTX 5090: nightly PyTorch with CUDA 13.0 support
# Pull the Docker image
docker pull nvcr.io/nvidia/pytorch:24.01-py3
# Clone the repository
git clone https://github.com/georgia-tech-synergy-lab/Privatar.git
# Launch with GPU access (replace <path> with your local clone path)
docker run --gpus all -v <path>:/work \
-it --ipc=host --ulimit memlock=-1 \
--ulimit stack=67108864 --memory 51200m \
--rm nvcr.io/nvidia/pytorch:24.01-py3

All commands below assume /work is the mount point inside Docker.
# OS-level dependencies
apt-get update && apt-get install -y mesa-common-dev libegl1-mesa-dev libgles2-mesa-dev mesa-utils
# Python packages
pip3 install Pillow ninja imageio imageio_ffmpeg six tensorboard opencv-python wandb torchjpeg lpips
# For RTX 5090 only: install nightly PyTorch with CUDA 13.0
pip install -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu130
# Install nvdiffrast (--no-build-isolation ensures it links against installed PyTorch)
git clone https://github.com/NVlabs/nvdiffrast
cd nvdiffrast && pip install --no-build-isolation -e . && cd ..
# For RTX 5090 only: apply nvdiffrast patch
source /work/experiment_scripts/nvdiffrast_patch.sh

Note: wandb is optional (disabled by default). Set wandb_enable = True in any training/testing script to enable real-time monitoring.
mkdir -p /work/dataset
python3 /work/experiment_scripts/dataset_config/download_dataset.py \
--dest "/work/dataset" \
--download_config "/work/experiment_scripts/dataset_config/mini_download_config.json"

This downloads the Multiface dataset for subject 6795937 (~30 GB): facial images, tracked meshes, and unwrapped UV textures across 65+ expressions and 40 camera views.
mkdir -p /work/pretrain_model
wget -O /work/pretrain_model/6795937_best_model.pth \
https://fb-baas-f32eacb9-8abb-11eb-b2b8-4857dd089e15.s3.amazonaws.com/MugsyDataRelease/PretrainedModel/6795937--GHS-base_nosl/best_model.pth

The pretrained base model (97 MB) initializes all training variants. Other subject models are listed at Multiface pretrained models.
The pipeline has 8 steps. Each step depends on outputs from earlier steps, and some steps branch off rather than running strictly one after another:
Step 1: Training ──> Step 2: Testing ──> Step 3: Latency Profiling
│
├──> Step 4: Noise Calculation (DP + PAC)
│ │
│ └──> Step 5: Noisy Inference
│ │
│ ├──> Step 6: Empirical Attack
│ └──> Step 7: NN-based Attack
│
└──> Step 8: Frequency Covariance Analysis
| Parameter | Functional Test | Full Reproduction |
|---|---|---|
| val_num | 30 | 500 |
| max_iter | 1000 | 100000 |
| Time per variant | ~16 minutes | ~48 hours |
To run a functional test, edit val_num and max_iter in each launch_train_job_serial.py before running. The baseline (multiface/launch_train_job_serial.py) ships with small defaults (val_num=50, max_iter=100); all other variants default to full reproduction values (val_num=500, max_iter=100000).
Train all six model variants. Each produces a best_model.pth checkpoint in /work/training_results/.
| Variant | Directory | Command | Output Path |
|---|---|---|---|
| Baseline | multiface/ | python3 launch_train_job_serial.py | training_results/multiface/ |
| Direct Split | multiface_direct_split/ | python3 launch_train_job_serial.py | training_results/multiface_direct_split/ |
| Quantization | multiface_quantization/ | python3 launch_train_job_serial.py | training_results/quant_{8..16}/ |
| Sparsity | multiface_sparse/ | python3 launch_train_job_serial.py | training_results/sparse_0_{2..8}/ |
| Frequency Decompose | multiface_frequency_decompose/ | python3 launch_train_job_serial.py | training_results/partition_0/ |
| Partitioned (Privatar) | multiface_partition_frequency_decompose/ | python3 launch_train_job_serial.py | training_results/partition_{2..14}/ |
# Example: train a single variant
cd /work/multiface_partition_frequency_decompose
python3 launch_train_job_serial.py

Configurable parameters (edit in each launch_train_job_serial.py):
- Quantization: bitwidth_list (default: [8, 9, 10, 11, 12, 13, 14, 15, 16])
- Sparsity: sparsity_list (default: [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
- Partitioning: num_freq_comp_offloaded_list (default: [14]; set to [2, 4, 6, 8, 10, 12, 14] for all configs)
Expected output (functional test, max_iter=100000): representative final training screen loss values:
| Variant | Screen Loss |
|---|---|
| Baseline | ~0.072 |
| Direct Split | ~0.073 |
| Quantization (8-bit) | ~0.073 |
| Sparsity (20%) | ~0.092 |
| Frequency Decompose | ~0.077 |
| Partitioned (14 offloaded) | ~0.077 |
Evaluate trained models on the test set. Computes MSE (screen, texture, vertex) and LPIPS metrics. Saves latent codes needed for noise calculation (Step 4).
# Run for each variant (same directory structure as training)
cd /work/<variant_directory>
python3 launch_test_job_serial.py

All results are saved to /work/testing_results/. Latent codes are stored in testing_results/<project_name>/latent_code/:
- z_<id>.pth -- local-path latent codes
- z_offload_<id>.pth -- offloaded-path latent codes
Expected output (using fully-trained models):
| Variant | Screen MSE | LPIPS |
|---|---|---|
| Baseline | ~0.076 | ~0.610 |
| Partitioned (14 offloaded) | ~0.077 | ~0.612 |
Rendering specific expressions: To render a selected set of expressions (e.g., for figure generation), use launch_test_selected_expressions.py in each variant directory. Configure the expression list in /work/experiment_scripts/render_scripts/test_image_path. Run python3 /work/experiment_scripts/render_scripts/render_test_expression.py to generate ground-truth inputs.
Measure decoder inference latency. The baseline models execution on the VR headset (when the device is not a VR headset, it defaults to CPU); all other variants run on GPU, modeling cloud execution. The scripts use torch.jit.trace for kernel optimization where applicable.
| Variant | Command | Device |
|---|---|---|
| Baseline | cd /work/multiface && python3 latency_profiling_script.py | CPU |
| Quantization | cd /work/multiface_quantization && python3 latency_profiling_script.py | GPU |
| Sparsity | cd /work/multiface_sparse && python3 latency_profiling_script.py | GPU |
| Freq. Decompose | cd /work/multiface_frequency_decompose && python3 latency_profiling_script.py | GPU |
| Partitioned (local) | cd /work/multiface_partition_frequency_decompose && python3 latency_profiling_script_local_path.py | GPU |
| Partitioned (offload) | cd /work/multiface_partition_frequency_decompose && python3 latency_profiling_script_offload_path.py | GPU |
For FLOPs analysis across partition configurations:
cd /work/multiface_partition_frequency_decompose
python3 latency_flops_calculation.py

Expected output (RTX 5090):
| Configuration | Latency |
|---|---|
| Baseline (CPU) | ~15.5 ms |
| Quantized (8-bit, GPU, traced) | ~0.69 ms |
| Sparse (10% pruned, GPU) | ~0.87 ms |
| Sparse (90% pruned, GPU) | ~0.42 ms |
| Freq. Decompose (GPU, traced) | ~0.59 ms |
| Partitioned local path (GPU) | 0.24--0.32 ms |
| Partitioned offload path (GPU) | 0.19--0.28 ms |
| FLOPs range | 1.48 G (14 offloaded) to 6.07 G (2 offloaded) |
Note: multiface_direct_split is an analytical design choice; no latency profiling script is provided for it.
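The measurement pattern behind a latency number like those above can be sketched as a timing loop with warm-up runs. This is a minimal illustration, not the repository's latency_profiling_script*.py: the function measure_latency_ms, its parameters, and the stand-in workload are hypothetical. For GPU work, a synchronization callable such as torch.cuda.synchronize would be passed as sync so that asynchronously queued kernels are included in the timing.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, iters=100, sync=None):
    """Time fn() and return (median_ms, mean_ms).

    sync is an optional callable invoked before each clock read; it is a
    hypothetical hook for device synchronization (e.g. torch.cuda.synchronize).
    """
    # Warm-up runs absorb one-time costs such as caching or lazy initialization.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples), statistics.fmean(samples)


def stand_in_decoder():
    # Placeholder workload; a real run would call the decoder's forward pass.
    return sum(i * i for i in range(10_000))


median_ms, mean_ms = measure_latency_ms(stand_in_decoder)
print(f"median {median_ms:.3f} ms, mean {mean_ms:.3f} ms")
```

Reporting the median alongside the mean makes occasional scheduling spikes easier to spot.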
Requires: Completed training (Step 1) and testing (Step 2) to obtain latent codes.
Two noise mechanisms are supported:
- Differential Privacy (DP): Uniform noise based on L2 norm of latent codes (Balle et al., 2017)
- PAC Privacy: Non-uniform noise leveraging per-dimension covariance via SVD decomposition (Xiao et al., CRYPTO 2023)
cd /work/experiment_scripts/dp_analysis
# For baseline (complete offload)
python3 dp_noise_generation_for_multiface.py
# For partitioned configurations (local + offloaded branches)
python3 dp_noise_generation_for_partition_multiface.py

Output: /work/experiment_scripts/dp_analysis/generated_dp_noise/
- dp_noise_completed_offloaded_multiface_decoder_{mi}.npy (5 files)
- dp_noise_partition_offloaded_decoder_{freq}_{mi}.npy (40 files)
- dp_noise_partition_local_decoder_{freq}_{mi}.npy (35 files)
- Total: 80 files
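Since later steps depend on these files, a quick count check can catch an incomplete run. The sketch below is a hypothetical helper, not part of the repository: the glob patterns are derived from the filename templates listed above, and the exact way {mi} and {freq} are rendered in real filenames is an assumption. The demo runs against a throwaway directory, not the real output folder.

```python
import glob
import os
import tempfile

# Expected file counts per filename template, taken from the list above.
EXPECTED_COUNTS = {
    "dp_noise_completed_offloaded_multiface_decoder_*.npy": 5,
    "dp_noise_partition_offloaded_decoder_*.npy": 40,
    "dp_noise_partition_local_decoder_*.npy": 35,
}


def check_noise_files(noise_dir, expected=EXPECTED_COUNTS):
    """Return {pattern: (found, expected)} for every pattern whose count is off."""
    mismatches = {}
    for pattern, want in expected.items():
        found = len(glob.glob(os.path.join(noise_dir, pattern)))
        if found != want:
            mismatches[pattern] = (found, want)
    return mismatches


# Demo: a temporary directory holding only the five "completed offload" files.
# The MI labels in the names are illustrative; real names may format them differently.
with tempfile.TemporaryDirectory() as demo_dir:
    for mi in ["4", "3", "1", "0.1", "0.01"]:
        name = f"dp_noise_completed_offloaded_multiface_decoder_{mi}.npy"
        open(os.path.join(demo_dir, name), "w").close()
    mismatches = check_noise_files(demo_dir)

for pattern, (found, want) in mismatches.items():
    print(f"{pattern}: found {found}, expected {want}")
```

Against a real run, one would pass /work/experiment_scripts/dp_analysis/generated_dp_noise/ as noise_dir and expect an empty result.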
cd /work/experiment_scripts/pac_analysis
python3 pac_noise_generation_for_partition_multiface.py

Output: /work/experiment_scripts/pac_analysis/noise_covariance/
- pac_noise_partition_local_decoder_{freq}_{mi}.npy
- pac_noise_partition_offloaded_decoder_{freq}_{mi}.npy
- Total: 75 files
Both use mutual information bounds [4, 3, 1, 0.1, 0.01] corresponding to posterior success rates [98%, 82.7%, 40%, 9%, 3.5%].
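To illustrate why non-uniform noise can be cheaper than uniform noise at the same mutual information budget, here is a simplified sketch. It assumes Gaussian noise aligned with the eigen-directions of the latent covariance and strictly positive eigenvalues (such as those an SVD would yield). The function names and example eigenvalues are hypothetical, and the repository's noise generation scripts may differ in detail. With eigenvalue λ_j and noise variance e_j, the Gaussian-channel bound 0.5 · Σ log(1 + λ_j / e_j) stays within the budget β when e_j = sqrt(λ_j) · Σ_k sqrt(λ_k) / (2β).

```python
import math


def anisotropic_noise_variances(eigvals, mi_budget):
    """Per-direction noise variances e_j = sqrt(l_j) * sum_k sqrt(l_k) / (2 * budget).

    Assumes every eigenvalue is strictly positive.
    """
    roots = [math.sqrt(l) for l in eigvals]
    total = sum(roots)
    return [r * total / (2.0 * mi_budget) for r in roots]


def gaussian_mi_upper_bound(eigvals, noise_vars):
    """Upper bound on mutual information for additive Gaussian noise (in nats)."""
    return 0.5 * sum(math.log1p(l / e) for l, e in zip(eigvals, noise_vars))


# Illustrative eigenvalues only; not measured from any model.
eigvals = [100.0, 4.0, 1.0, 0.25]
results = []
for budget in [4, 3, 1, 0.1, 0.01]:
    noise_vars = anisotropic_noise_variances(eigvals, budget)
    bound = gaussian_mi_upper_bound(eigvals, noise_vars)
    # Hypothetical uniform baseline meeting the same linearized bound:
    # every direction gets variance sum(eigvals) / (2 * budget).
    uniform_power = len(eigvals) * sum(eigvals) / (2.0 * budget)
    results.append((budget, bound, sum(noise_vars), uniform_power))
    print(f"budget={budget}: bound={bound:.4f}, "
          f"anisotropic power={sum(noise_vars):.2f}, uniform power={uniform_power:.2f}")
```

In this toy setting the anisotropic allocation always needs no more total noise power than the uniform one, which follows from the Cauchy-Schwarz inequality.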
Requires: Trained models (Step 1) and noise files (Step 4).
Inject generated noise into offloaded latent codes and evaluate avatar quality degradation.
cd /work/multiface_partition_frequency_decompose
python3 launch_noisy_test_job_serial.py

Toggle using_pac_noise = True/False to switch between PAC and DP noise. Results are saved to /work/testing_results/test_noisy_partition_{freq}_{mi}/.
Expected output:
| Configuration | Screen MSE |
|---|---|
| Partition-2 with PAC noise (MI=1) | ~0.086 |
| Partition-14 with DP noise (MI=0.01) | ~0.086 |
Requires: Trained models (Step 1) and noise files (Step 4).
The empirical attacker guesses expressions by matching predicted high-frequency texture components to precomputed reference components (see Fig. 14 in paper).
cd /work/multiface_partition_frequency_decompose
python3 launch_empirical_attack.py

Toggle using_pac_noise = True/False for PAC vs. DP noise. Results are saved to /work/testing_results/empirical_attack_partition_{freq}_{mi}/.
Attack modes (configured via booleans in test_empirical_attack_run.py):
- accumulate_channel = True: accumulates high-frequency components as reference
- attack_from_high_frequency_channel = True: uses only high-frequency components
- Both False: merges frequency components by ambiguity (configurable threshold)
Expected output: posterior success rate (PSR) ~3.1% for partition-14 with MI=1 PAC noise (prior rate: 1/56 = 1.8%).
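The matching idea behind such an attack, and how a success rate compares with the 1/56 prior, can be sketched with a nearest-reference guesser. This is an illustration under stated assumptions, not the code in test_empirical_attack_run.py: the feature vectors are random stand-ins for high-frequency components, squared L2 distance is an assumed similarity measure, and all function names are hypothetical.

```python
import random


def guess_expression(observed, references):
    """Return the label whose reference vector is closest in squared L2 distance."""
    return min(
        references,
        key=lambda label: sum((o - r) ** 2 for o, r in zip(observed, references[label])),
    )


def posterior_success_rate(observations, references):
    """Fraction of (label, vector) observations that the guesser labels correctly."""
    hits = sum(guess_expression(vec, references) == label for label, vec in observations)
    return hits / len(observations)


rng = random.Random(0)
num_expressions = 56  # matches the 1/56 prior quoted above
dim = 16              # illustrative feature size

# One random reference vector per expression (stand-in for precomputed components).
references = {
    label: [rng.gauss(0.0, 1.0) for _ in range(dim)] for label in range(num_expressions)
}


def observe(noise_std):
    """One observation per expression: its reference plus Gaussian noise."""
    return [
        (label, [r + rng.gauss(0.0, noise_std) for r in ref])
        for label, ref in references.items()
    ]


prior = 1.0 / num_expressions
psr_clean = posterior_success_rate(observe(0.0), references)
psr_noisy = posterior_success_rate(observe(20.0), references)
print(f"prior={prior:.3f}, clean PSR={psr_clean:.3f}, noisy PSR={psr_noisy:.3f}")
```

Without noise the guesser identifies every expression; as the noise grows relative to the spread of the references, its success rate falls toward the prior.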
Requires: Trained models (Step 1) and noise files (Step 4).
Train a 3-layer fully-connected classifier (256 -> 128 -> 66) to identify expressions from noisy offloaded latent codes.
cd /work/multiface_partition_frequency_decompose
# Train the attacker (10 epochs per partition config)
python3 launch_train_nn_attacker.py
# Test the attacker under various noise levels
python3 launch_test_nn_attacker.py

Training data: one sample per expression from /work/experiment_scripts/empirical_attack/selected_expression_frame_list.txt.
Expected output: PSR ~1.5% on noisy latent codes (below the 1/56 = 1.8% prior rate), confirming robustness against learned attacks.
Analyze the covariance trace of each of the 16 BDCT frequency components to understand the variance distribution that motivates offloading high-frequency components.
cd /work/multiface_frequency_decompose
python3 launch_l2norm_freq_cov_analysis.py

Expected output:
trace of covariance = 11308.31 for freq component = 0
trace of covariance = 199.42 for freq component = 1
trace of covariance = 77.53 for freq component = 2
trace of covariance = 41.90 for freq component = 3
...
trace of covariance = 12.81 for freq component = 15
Low-frequency components (component 0) carry ~880x more variance than high-frequency components (component 15), confirming that high-frequency components are safe to offload.
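The following sketch shows, on synthetic data, how a 4x4 block DCT yields 16 frequency components and how a per-component covariance trace can be computed. It is a toy illustration, not launch_l2norm_freq_cov_analysis.py: the synthetic images, the row-major component ordering (u * 4 + v), and the choice to analyze raw DCT coefficients are all assumptions. The trace of a covariance matrix equals the sum of per-dimension variances, which is what covariance_trace computes.

```python
import math
import random
import statistics

N = 4  # block size; 4x4 blocks give 16 frequency components


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def block_dct(image):
    """Return 16 lists; list u*N+v holds coefficient (u, v) of every NxN block."""
    c = dct_matrix(N)
    comps = [[] for _ in range(N * N)]
    for by in range(0, len(image), N):
        for bx in range(0, len(image[0]), N):
            for u in range(N):
                for v in range(N):
                    coeff = sum(
                        c[u][y] * c[v][x] * image[by + y][bx + x]
                        for y in range(N)
                        for x in range(N)
                    )
                    comps[u * N + v].append(coeff)
    return comps


def covariance_trace(samples):
    """Trace of the sample covariance = sum of per-dimension sample variances."""
    return sum(statistics.variance(dim) for dim in zip(*samples))


rng = random.Random(0)


def synthetic_image(size=8):
    # Smooth image: random brightness, fixed gradient, small pixel noise.
    base = rng.gauss(0.0, 1.0)
    return [
        [base + 0.05 * (x + y) + rng.gauss(0.0, 0.01) for x in range(size)]
        for y in range(size)
    ]


per_image = [block_dct(synthetic_image()) for _ in range(50)]
traces = [covariance_trace([comps[k] for comps in per_image]) for k in range(N * N)]
for k in (0, 1, 15):
    print(f"trace of covariance = {traces[k]:.6f} for freq component = {k}")
```

For smooth inputs like these, almost all variance lands in component 0, mirroring the qualitative pattern in the expected output above.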
Interactive notebook for visualizing frequency decomposition of unwrapped textures:
jupyter notebook /work/experiment_scripts/bdct_reconstruction/bdct_4x4_reconstruction_dataloader.ipynb

Render avatar predictions for a specified set of input images across all model configurations:
cd /work/<variant_directory>
python3 launch_test_all_expressions_RTX3090.py

Configure input images in /work/experiment_scripts/render_scripts/test_image_path. Results are saved to /work/render_results/<configuration_name>/.
| Parameter | Location | Values |
|---|---|---|
| Training duration | launch_train_job_serial.py | val_num, max_iter |
| Partition configs | num_freq_comp_offloaded_list | 2, 4, 6, 8, 10, 12, 14 |
| Quantization bits | bitwidth_list | 8--16 |
| Sparsity ratio | sparsity_list | 0.2--0.8 |
| Noise type | using_pac_noise | True (PAC) / False (DP) |
| MI budget | mi_list / mutual_info_bound_list | 4, 3, 1, 0.1, 0.01 |
| Wandb logging | wandb_enable | True / False |
