This repository contains the implementation, experiments, and research notes for an AI Residency project focused on fine-tuning Stable Diffusion models for concurrent identity preservation and pose guidance.
- dreambooth/: Core DreamBooth implementation for subject identity learning.
- controlnet/: Advanced fine-tuning of ControlNet integrated with DreamBooth architectures.
- docs/: Project documentation, task descriptions, and technical concepts.
- Full Project Report
- paper/: Detailed research summaries and pseudocode for relevant SOTA methods (HyperHuman, MagicPose, etc.).
This project explores the fine-tuning of Stable Diffusion v1.5 to generate specific subjects (identity) in user-defined configurations (pose).
- DreamBooth Reproduction: Successfully learned specific subjects with minimal data; identified prompt fidelity vs. identity trade-offs.
- Human Pose Integration: Combined ControlNet with DreamBooth. Discovered that 200–600 training steps (avg. 400) and LoRA ranks ≥ 16 provide the optimal balance for identity preservation.
- Key Finding: Fine-tuning the Text Encoder is critical for learning complex human identities but risks "catastrophic forgetting" of structural concepts.
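To make the LoRA-rank finding concrete, here is a minimal, self-contained LoRA linear layer in PyTorch. The class, dimensions, and hyperparameters are illustrative stand-ins (not the project's actual training code); 320 is a typical SD v1.5 attention width, and rank 16 matches the threshold reported above.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter: y = W x + (alpha / r) * B(A(x)).
    The base weight is frozen; only the low-rank A/B pair trains."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pre-trained projection
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A: d -> r
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B: r -> d
        nn.init.zeros_(self.up.weight)        # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Wrap one hypothetical 320-dim projection at rank 16.
layer = LoRALinear(nn.Linear(320, 320), rank=16)
x = torch.randn(2, 77, 320)
out = layer(x)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
# rank 16 adds 2 * 16 * 320 trainable weights per wrapped layer
```

Higher ranks enlarge only the `down`/`up` matrices, which is why rank is a direct knob on how much identity-specific capacity each attention projection gains.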
Transitioned to a significantly harder domain: a non-humanoid robot subject with sparse training data (3–6 images).
- Constraint: OpenPose algorithms struggle with non-humanoid joint structures, limiting dataset quality.
- ControlNet Bias: Pre-trained ControlNets exhibit a strong "human bias," making it difficult to maintain robot morphology in extreme or unusual poses.
To address overfitting and structural bias, several research-backed techniques were implemented:
- Custom Diffusion Optimization:
- K/V Attention Tuning: Trained only the Key (K) and Value (V) projections in cross-attention layers.
- Embedding Training: Optimized the `[V]` rare-token embedding exclusively, which reduced structural forgetting but resulted in lower identity fidelity.
- Multi-Stage Training (MagicPose Style):
- Stage 1 (Appearance): Isolated identity training without ControlNet interference.
- Stage 2 (Pose): Structural guidance training with the identity-aware Text Encoder frozen.
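The K/V-only tuning and stage-wise freezing above can be sketched in PyTorch as follows. The attention class is a toy stand-in (its `to_k`/`to_v` naming mirrors the diffusers convention, but this is not the project's actual code), and the "text encoder" is a placeholder module used only to show the Stage 2 freeze.

```python
import torch.nn as nn

class ToyCrossAttention(nn.Module):
    """Toy stand-in mirroring diffusers' projection names (to_q/to_k/to_v)."""
    def __init__(self, dim: int = 320, ctx_dim: int = 768):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(ctx_dim, dim, bias=False)  # keyed on text embeddings
        self.to_v = nn.Linear(ctx_dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

def kv_only_params(model: nn.Module):
    """Freeze the whole model, then re-enable only the K/V projections
    (the Custom Diffusion-style subset used for identity tuning)."""
    for p in model.parameters():
        p.requires_grad_(False)
    selected = []
    for name, p in model.named_parameters():
        if ".to_k." in name or ".to_v." in name:
            p.requires_grad_(True)
            selected.append(p)
    return selected

# Stage 1 (appearance): train only K/V projections of the cross-attention blocks.
unet_blocks = nn.Sequential(ToyCrossAttention(), ToyCrossAttention())
params = kv_only_params(unet_blocks)

# Stage 2 (pose): freeze the identity-aware text encoder before pose training.
text_encoder = nn.Linear(768, 768)  # placeholder for the real CLIP text encoder
text_encoder.requires_grad_(False)
```

Selecting parameters by name keeps the recipe robust: the same substring filter works whether the attention blocks live in a toy `Sequential` or a full UNet module tree.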
- Pose-Identity Conflict: Precise subject generation in "hard" (non-humanoid) poses remains a major challenge due to the inherent human-centric bias in pre-trained spatial adapters.
- Overfitting vs. Generalization: Naive DreamBooth training often causes the model to "forget" structural flexibility. Strategic dropout and targeted parameter tuning (e.g., K/V attention) are essential for maintaining pose adherence.
- Future Work: Bridging the gap between the specific morphology of non-humanoid subjects and general spatial conditioning models.
> [!TIP]
> Refer to IDEA.md for a deep dive into the technical papers that inspired these implementations.