Create a new standalone Python script named DatasetAuditor.py.
Context: The script is used for Stable Diffusion LoRA training preparation. We need a pre-training dataset curation tool to find redundant images and filter synthetic datasets based on facial biometrics.
Core Engine: Use the insightface library (model: 'buffalo_l') with CUDAExecutionProvider to extract 512-dimensional facial embeddings.
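A minimal sketch of the embedding step, under some assumptions that are not in the spec above: the helper names (pick_largest_face, build_face_app, embed_image) are hypothetical, the largest detected face is taken as the subject, and CPUExecutionProvider is listed as a fallback in case CUDA is unavailable. The insightface and cv2 imports are deferred so that the pure-Python helper can be used without them.

```python
def pick_largest_face(faces):
    """Return the face with the largest bounding-box area, or None if empty.

    Assumption: when an image contains several faces, the largest one is the subject.
    """
    def area(face):
        x1, y1, x2, y2 = face.bbox[:4]
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    return max(faces, key=area, default=None)


def build_face_app(det_size=(640, 640)):
    """Load the 'buffalo_l' model pack, preferring the CUDA provider."""
    # Imported lazily so pick_largest_face works without insightface installed.
    from insightface.app import FaceAnalysis

    app = FaceAnalysis(
        name="buffalo_l",
        # The CPU entry is an assumed fallback; the spec only names CUDA.
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    app.prepare(ctx_id=0, det_size=det_size)
    return app


def embed_image(app, image_path):
    """Return the L2-normalized 512-d embedding of the main face, or None."""
    import cv2

    img = cv2.imread(str(image_path))  # BGR array, as insightface expects
    if img is None:
        return None  # unreadable or non-image file
    face = pick_largest_face(app.get(img))
    return None if face is None else face.normed_embedding
```

Typical use would be to call build_face_app() once and then embed_image(app, path) for each file in the folder, skipping any image for which None is returned.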
The script should scan the dataset folder, rank images by facial likeness, flag redundant images, and generate a visual HTML report showing what to prune before training.
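The ranking, redundancy, and report steps could be sketched as follows, in pure Python and independent of how the embeddings were produced. Several choices here are assumptions rather than part of the spec: likeness is scored as cosine similarity to the dataset centroid, the 0.9 redundancy threshold is a placeholder that would need tuning, and the function names are hypothetical.

```python
import html
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_likeness(embeddings):
    """Sort (name, score) pairs by similarity to the centroid, highest first."""
    ref = centroid(list(embeddings.values()))
    scores = {name: cosine(vec, ref) for name, vec in embeddings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def flag_redundant(embeddings, threshold=0.9):
    """Return (name_a, name_b, similarity) for pairs at or above the threshold."""
    pairs = []
    for (a, va), (b, vb) in combinations(embeddings.items(), 2):
        sim = cosine(va, vb)
        if sim >= threshold:  # threshold is an assumed placeholder value
            pairs.append((a, b, sim))
    return pairs


def render_report(ranked, redundant):
    """Build a small HTML page with a likeness table and a redundancy list."""
    rows = "".join(
        f"<tr><td><img src='{html.escape(name)}' width='128'></td>"
        f"<td>{html.escape(name)}</td><td>{score:.3f}</td></tr>"
        for name, score in ranked
    )
    dupes = "".join(
        f"<li>{html.escape(a)} / {html.escape(b)}: {sim:.3f}</li>"
        for a, b, sim in redundant
    )
    return (
        "<html><body><h1>Dataset audit</h1>"
        f"<table><tr><th>Image</th><th>File</th><th>Likeness</th></tr>{rows}</table>"
        f"<h2>Possible redundancies</h2><ul>{dupes}</ul></body></html>"
    )
```

Here embeddings is a dict mapping file names to embedding vectors. The pairwise comparison is quadratic in the number of images, which is usually acceptable for LoRA-sized datasets of a few hundred files. Because face embeddings encode identity rather than pixels, a high similarity between two images of the same subject is expected, so flagged pairs are candidates for review and not automatic deletions.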