This project explores latent space manipulation techniques for StyleGAN2-ADA, focusing on both supervised (InterfaceGAN) and unsupervised (GANSpace) methods for controlling facial attributes in generated images. The work was conducted as part of the IMA 206 course at Telecom Paris in 2024.
- Introduction
- Project Structure
- Methods Implemented
- Key Features
- Requirements
- Usage
- Results
- References
This project implements and compares different approaches to manipulate the latent space of StyleGAN2-ADA for controllable face generation. The main objectives are:
- Discover semantic directions in the latent space corresponding to facial attributes (age, gender, smile, etc.)
- Disentangle attributes to enable independent control of different facial features
- Compare supervised and unsupervised methods for latent space exploration
- Enable precise face editing through attribute manipulation
```
InterfaceGAN/
├── Article-impression/                # Reference articles
│   ├── Article_1.pdf
│   ├── article2.pdf
│   └── article3.pdf
├── Rendu/                             # Main deliverables
│   ├── Analyse non supervisée.ipynb   # Unsupervised methods (GANSpace)
│   ├── InterfaceGAN.ipynb             # Supervised methods (InterfaceGAN)
│   └── Rapport IMA 206.pdf            # Project report
└── README.md                          # This file
```
- Principal Component Analysis (PCA) on the W latent space
- Kernel PCA (KPCA) with RBF kernel for non-linear feature extraction
- Variational Autoencoder (VAE) for latent space restructuring
- Generate 10,000-100,000 samples from StyleGAN2's W space
- Perform PCA to identify principal directions of variation
- Move along principal components to discover semantic attributes
- Layer-wise control: apply transformations to specific layers (0-17)
- v0: Gender
- v1: Gender + Rotation
- v2: Gender + Rotation + Age + Background
- v4: Age + Hairstyle (layers 5-17)
- v10: Hair color (layers 7-8)
- v20: Wrinkles (layer 6)
- v23: Expression/Smile (layers 3-5)
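The PCA step above can be sketched with scikit-learn. Here `w_samples` is a synthetic stand-in for the vectors the project draws from the StyleGAN2 mapping network; the variable names are illustrative, not the notebook's actual API:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for W-space samples (the project draws these from the mapping network)
rng = np.random.default_rng(0)
w_samples = rng.normal(size=(10_000, 512))

# Standardize, then fit PCA to find the principal directions of variation
scaler = StandardScaler().fit(w_samples)
pca = PCA(n_components=50).fit(scaler.transform(w_samples))

# Moving along component 0 in W space: w_edit = w + alpha * direction
direction = pca.components_[0]           # unit-norm principal direction
w = w_samples[0]
alphas = np.linspace(-3, 3, 7)
edits = w + alphas[:, None] * direction  # one edited latent per step
```

Each row of `edits` would then be fed through the synthesis network to render one frame of the transition.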
Kernel PCA Exploration
- Implemented KPCA with RBF kernel for non-linear dimensionality reduction
- Compared results with linear PCA
- Found similar semantic directions with slight differences in disentanglement
Autoencoder-based Latent Space Restructuring
- Designed a custom VAE with dual Gaussian components
- Architecture: 512 → 256 → 128 → 64 → 50 (latent) → 64 → 128 → 256 → 512
- Added orthogonality constraint to encourage independence
- Achieved better variance concentration: 95% variance in 20 components vs. 50+ for raw W space
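One common way to realize such an orthogonality constraint is a Frobenius-norm penalty on a weight matrix's Gram matrix; the actual VAE is trained in PyTorch, so this numpy sketch of the penalty term is illustrative only:

```python
import numpy as np

def orthogonality_penalty(W):
    """Frobenius-norm penalty pushing the rows of a weight matrix W
    toward mutual orthonormality: || W W^T - I ||_F^2."""
    gram = W @ W.T
    eye = np.eye(W.shape[0])
    return np.sum((gram - eye) ** 2)

# A random matrix incurs a large penalty; a row-orthonormal one is ~0
rng = np.random.default_rng(0)
W_random = rng.normal(size=(50, 64))
Q, _ = np.linalg.qr(rng.normal(size=(64, 50)))  # orthonormal columns
W_ortho = Q.T                                   # orthonormal rows
```

Added to the reconstruction + KL loss with a small weight, this term nudges latent components toward independence.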
InterfaceGAN uses linear SVM classifiers to find semantic boundaries in the latent space:
- Dataset Generation: Generate 50,000-100,000 images with StyleGAN2
- Attribute Classification: Use pre-trained classifiers to label images
- SVM Training: Train linear SVMs to separate attribute classes
- Direction Extraction: Normal vector to the SVM hyperplane = semantic direction
- Bald
- Hair color (black, blond, brown)
- Eyeglasses
- Heavy makeup
- Gender (male/female)
- Mustache
- Beard
- Smiling
- Hat
- Age (young)
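The four-step pipeline above can be sketched end to end with scikit-learn. The data here is synthetic (a hidden direction plays the role of the attribute classifier's decision rule), so the numbers are illustrative, but the key step is real: the unit normal of the trained SVM's hyperplane is the semantic direction.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic stand-in: 512-D "latents" labeled along a hidden attribute direction
# (in the project, labels come from a pre-trained CelebA attribute classifier)
rng = np.random.default_rng(0)
true_dir = rng.normal(size=512)
true_dir /= np.linalg.norm(true_dir)
w = rng.normal(size=(4000, 512))
scores = w @ true_dir

# Keep only confidently labeled samples, mirroring the top/bottom-N selection
keep = np.abs(scores) > 0.5
svm = LinearSVC(C=1.0, max_iter=10_000).fit(w[keep], scores[keep] > 0)

# The hyperplane's unit normal is the recovered semantic direction
direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])
cos_sim = abs(direction @ true_dir)  # close to 1 when recovery succeeds
```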
Projection Method

To control one attribute without affecting others, the project implements orthogonal projection:

```python
# Make direction vect_normal_1 perpendicular to vect_normal_2
# (assumes vect_normal_2 is unit-norm)
vect_normal_1_new = vect_normal_1 - (vect_normal_1 @ vect_normal_2) * vect_normal_2
vect_normal_1_new = vect_normal_1_new / np.linalg.norm(vect_normal_1_new)
```

Examples:
- Age modification without adding/removing glasses
- Bald attribute on female faces
- Mustache without changing gender
- Gender change while preserving other attributes
Multi-attribute Orthogonalization
The move_along_attr_against_all_others function makes a direction orthogonal to all other attribute directions for maximum disentanglement.
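A minimal sketch of that projection step: rather than subtracting each other direction one by one (exact only when they are already mutually orthogonal), one can project onto the orthogonal complement of their span. The function name echoes the notebook's, but the implementation here is an illustrative reconstruction:

```python
import numpy as np

def orthogonalize_against_all(direction, others):
    """Return `direction` projected onto the orthogonal complement of the
    span of `others` (one direction per row), renormalized to unit length."""
    Q, _ = np.linalg.qr(others.T)          # orthonormal basis for span(others)
    d = direction - Q @ (Q.T @ direction)  # remove every component in that span
    return d / np.linalg.norm(d)

rng = np.random.default_rng(0)
others = rng.normal(size=(11, 512))        # the 11 other attribute directions
direction = rng.normal(size=512)
clean = orthogonalize_against_all(direction, others)
```

Moving along `clean` then changes the target attribute while leaving the projections on all other attribute directions untouched.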
StyleGAN2's architecture allows different layers to control different aspects:
- Layers 0-3: Coarse features (pose, glasses, general face shape)
- Layers 4-7: Medium features (facial features, age, gender)
- Layers 8-17: Fine features (hair texture, skin details, wrinkles)
Key Findings:
- Layers 0-3: Control glasses independently from age/gender
- Layers 4-5: Control age while preserving accessories
- Layers 6-7: Control facial hair (mustache/beard) with fine precision
- Enables better disentanglement than full-layer manipulation
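Mechanically, layer-wise editing means broadcasting a single w vector to a W+ code and shifting only the chosen rows; a minimal sketch (the real code feeds `w_plus` to `G.synthesis`, and the helper name here is illustrative):

```python
import numpy as np

NUM_LAYERS = 18  # StyleGAN2 synthesis layers at 1024x1024

def edit_layerwise(w, direction, layers, alpha):
    """Broadcast a single w (512,) to a W+ code (18, 512) and shift
    only the chosen layers along `direction`."""
    w_plus = np.tile(w, (NUM_LAYERS, 1))
    for layer in layers:
        w_plus[layer] += alpha * direction
    return w_plus

rng = np.random.default_rng(0)
w = rng.normal(size=512)
direction = rng.normal(size=512)
direction /= np.linalg.norm(direction)
w_plus = edit_layerwise(w, direction, layers=[6, 7], alpha=3.0)
```

Because untouched layers keep the original w, coarse structure is preserved while only the targeted semantic level changes.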
- Automated generation of 50,000+ labeled face images
- Pre-trained classifier for 12 facial attributes
- Efficient batch processing with GPU acceleration
- Distribution histograms showing attribute projections
- Video generation showing smooth transitions
- 12×12 grid visualization of attribute correlations
- Side-by-side before/after comparisons
- Single attribute modification
- Multi-attribute editing
- Conditional manipulation (change A without changing B)
- Layer-specific control
- Precise scalar control for each attribute
- Semantic boundary correlation analysis
- PCA variance visualization
- TSNE visualization of latent space structure
- Comparison between PCA and KPCA results
```
python >= 3.8
torch >= 1.9.0
numpy >= 1.19.0
matplotlib >= 3.3.0
scikit-learn >= 0.24.0
pandas >= 1.2.0
imageio >= 2.9.0
tqdm >= 4.60.0
Pillow >= 8.0.0
click
```

`dnnlib` and `legacy` are not pip packages; they ship with the StyleGAN2-ADA repository.
- GPU with CUDA support recommended (tested on NVIDIA GPUs and Apple Silicon with MPS)
- Minimum 16GB RAM
- 20GB+ disk space for models and generated images
```python
# Generate samples from W space
w_samples = generate_samples(G, 10000)

# Perform PCA
pca, scaler = get_pca(w_samples, ncomp=50)

# Move along a principal component
img_list, X_list = move_according_to_component(
    G, device, pca,
    outdir='output_directory_pca',
    directions=[0],             # Gender direction
    layers=list(range(0, 18)),  # All layers
    num_steps=100,
    save_video=True,
    title_vid='gender_all',
    random_seed=2
)
```

```python
# Load dataset
df = pd.read_csv('label_faces_100K.csv')

# Edit a specific attribute
img_list = pipeline_1(
    df,
    scalars_young,  # Movement range
    'young',        # Attribute to modify
    2000,           # Number of training samples
    network_pkl,
    seed_img=17,
    num_img_generated=10,
    outdir='out/along_1_vector',
    file_name='move_along_young',
    save_video=True,
    return_images=True
)
```

```python
# Edit one attribute while keeping another fixed
img_list = move_along_attr_against_attrb2(
    network_pkl,
    scalars_young,
    'young',        # Attribute to change
    'eyeglasses',   # Attribute to preserve
    2000,
    seed_img=17,
    num_img_generated=10,
    outdir='out',
    file_name='young_without_glasses',
    save_video=True,
    return_images=True
)
```

```python
# Edit beard only in specific layers
img_list = pipeline_4_layer(
    df,
    'beard',
    2000,
    network_pkl,
    seed_img=17,
    layers=[6, 7],  # Fine detail layers
    num_img_generated=10,
    outdir='out',
    file_name='beard_layerwise',
    save_video=True,
    return_images=True
)
```

```python
# Set specific values for each attribute
attributes_scalars = [
    -3,  # bald
    0,   # black_hair
    0,   # blond_hair
    2,   # brown_hair
    -2,  # eyeglasses
    0,   # heavy_makeup
    -2,  # male (more feminine)
    0,   # mustache
    0,   # beard
    2,   # smiling
    0,   # hat
    -1,  # young
]

edit_face(attributes_scalars, seed_img=17, plot_distribution=True)
```

PCA Analysis:
- Successfully identified gender, age, rotation, hair color, and expression directions
- 95% variance explained by 50 components in raw W space
- Layer-wise application enables finer control
Kernel PCA:
- RBF kernel produces similar directions to linear PCA
- Slightly different magnitude of changes
- Cosine similarity near 1.0 with PCA directions
VAE Restructuring:
- Reduced dimensionality: 95% variance in 20 components
- More concentrated feature representation
- Comparable editing quality to raw W space PCA
Attribute Discovery:
- Successfully identified 12 distinct facial attributes
- High classification accuracy (>90%) for most attributes
- Clear semantic boundaries in W space
Disentanglement:
- Partial success in decoupling attributes
- Some inherent correlations difficult to remove (e.g., male/beard)
- Orthogonal projection improves independence
Layer-wise Control:
- Major finding: Layer-wise editing achieves better disentanglement
- Successfully created "female with mustache" - impossible with global editing
- Different layers control different semantic levels
Observed entanglements:
- Age ↔ Eyeglasses: Older people more likely to wear glasses
- Gender ↔ Beard/Mustache: Strong correlation
- Bald ↔ Hat: Negative correlation
- Age ↔ Hair color: Gray hair appears with age
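Entanglements like these can be measured directly as pairwise correlations over the attribute labels (this is what the 12×12 correlation grid visualizes). A minimal sketch with synthetic labels; the project's real labels come from `label_faces_100K.csv`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the label table: one row per image,
# one column per attribute probability
rng = np.random.default_rng(0)
male = rng.random(1000)
labels = pd.DataFrame({
    'male': male,
    'mustache': 0.8 * male + 0.2 * rng.random(1000),  # entangled with gender
    'smiling': rng.random(1000),                      # independent
})

# Pairwise correlations reveal which attributes are entangled
corr = labels.corr()
```

A large `corr.loc['male', 'mustache']` flags exactly the kind of coupling that orthogonal projection or layer-wise editing must then break.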
Successfully disentangled:
- Age from glasses (using orthogonal projection)
- Gender from mustache (using layer-wise editing on layers 6-7)
- Smile from other attributes
StyleGAN2-ADA Generator:
- Mapping network: Z (512) → W (512)
- Synthesis network: 18 layers, each receiving W vector
- Pre-trained on FFHQ (Flickr-Faces-HQ) dataset
- Network: ffhq.pkl from NVIDIA
Classifier:
- Pre-trained on CelebA attributes
- 40 binary classifiers for facial attributes
- Used for automatic labeling of generated images
Z Space (Gaussian):
- 512-dimensional standard normal distribution
- Input to mapping network
W Space:
- 512-dimensional intermediate latent space
- More disentangled than Z
- Used for all manipulations in this project
W+ Space:
- 18 × 512 dimensions (different W for each layer)
- Allows layer-specific control
- Used for fine-grained editing
Hyperparameters:
- Kernel: Linear
- C: 1.0 (regularization)
- Training samples: 2,000-4,000 per attribute
- Test/train split: 70/30
Data Selection:
- Select top N samples with highest attribute probability
- Select bottom N samples with lowest attribute probability
- Ensures clear separation for SVM training
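The selection step above can be sketched in a few lines of numpy; `select_extremes` is an illustrative helper name, not the notebook's:

```python
import numpy as np

def select_extremes(probs, n):
    """Indices of the n samples with highest and n with lowest attribute
    probability, giving a well-separated training set for the SVM."""
    order = np.argsort(probs)
    return order[-n:], order[:n]

rng = np.random.default_rng(0)
probs = rng.random(10_000)  # classifier scores for one attribute
pos_idx, neg_idx = select_extremes(probs, 2000)

# Binary labels for the 4,000 selected samples
labels = np.concatenate([np.ones(2000), np.zeros(2000)])
```

Dropping the ambiguous middle of the score distribution is what makes the linear SVM's boundary (and hence its normal vector) stable.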
- Attribute Entanglement: Some attributes remain correlated despite disentanglement efforts
- Dataset Bias: Limited diversity in training data (mostly frontal faces)
- Computational Cost: Generating large datasets requires significant GPU time
- Binary Attributes: Many attributes are treated as binary (present/absent) when they're actually continuous
- Rare Combinations: Some attribute combinations (e.g., female + bald) are underrepresented
- Advanced Disentanglement:
  - Implement more sophisticated orthogonalization techniques
  - Explore non-linear separation methods
  - Use adversarial training for better independence
- Interactive Editing:
  - Real-time GUI for attribute manipulation
  - Slider-based interface for continuous control
  - Preset combinations for common edits
- Extended Attributes:
  - Fine-grained attributes (eye color, nose shape, etc.)
  - Style attributes (lighting, background, etc.)
  - Emotion attributes beyond smile
- Better Evaluation:
  - Quantitative metrics for disentanglement
  - User studies for quality assessment
  - Automated testing of attribute independence
- Generalization:
  - Apply methods to other domains (cars, animals, etc.)
  - Transfer learning to other GAN architectures
  - Cross-domain attribute manipulation
- InterfaceGAN: Shen, Y., Gu, J., Tang, X., & Zhou, B. (2020). Interpreting the Latent Space of GANs for Semantic Face Editing. CVPR 2020.
- GANSpace: Härkönen, E., Hertzmann, A., Lehtinen, J., & Paris, S. (2020). GANSpace: Discovering Interpretable GAN Controls. NeurIPS 2020.
- StyleGAN2-ADA: Karras, T., et al. (2020). Training Generative Adversarial Networks with Limited Data. NeurIPS 2020.
Project developed as part of IMA 206 course at Telecom Paris, 2024.
This project is for educational purposes. Pre-trained models and base code follow their respective licenses from NVIDIA and other sources.
- NVIDIA for StyleGAN2-ADA implementation
- InterfaceGAN and GANSpace authors for methodology
- Telecom Paris for course framework
- CelebA and FFHQ datasets for training data