vpulab/ReSAGE-PAR

🧪 ReSAGE-PAR: Representational Similarity Assessment for Generative Expansion in Pedestrian Attribute Recognition 🚶‍♂️✨

End-to-end pipeline to generate dataset-aware synthetic pedestrian images, score text–image alignment, and produce pseudo-labels for training Pedestrian Attribute Recognition (PAR) models.

🔁 Stages: 🖼️ Stage A (Generation) → 📏 Stage B (Scoring) → 🏷️ Stage C (Pseudo-labeling)
📊 Optional: Metric tools to compare Real vs Synthetic distributions (FID/CFID/CMMD/FD-DINO)


✨ Overview

This repo supports a generate–score–autolabel workflow:

  • ✅ Stage A generates synthetic images using Stable Diffusion + LoRA (optionally with attribute editing).
  • ✅ Stage B computes image–text similarity scores to assess alignment between the synthetic image and its prompt.
  • ✅ Stage C trains a lightweight SI-classifier on real scores and uses it to generate pseudo-label vectors for synthetic data.

⚡ Quick Start

✅ Option 1: Full pipeline (recommended) 🚀

Run everything with YAML configs:

./wholePipelineWithConfigs.sh

This executes:

  • 🖼️ Stage A: generation
  • 📏 Stage B: scoring (real + synthetic)
  • 🏷️ Stage C: pseudo-labeling

🧩 Option 2: Run stages individually

🖼️ Stage A: Image generation

python src/stage_a_generation/run_stage_a.py --config configs/stage_a.yaml

📏 Stage B: Scoring (real dataset)

python -m src.stage_b_scoring.run_stage_b --config configs/stage_b_scores.yaml

📏 Stage B: Scoring (synthetic dataset)

python -m src.stage_b_scoring.run_stage_b --config configs/stage_b_scores_syn.yaml

🏷️ Stage C: Train SI-classifier

python -m src.stage_c_pseudolabeling.run_stage_c --config configs/stage_c_train.yaml

🧪 Stage C: Test SI-classifier

python -m src.stage_c_pseudolabeling.run_stage_c --config configs/stage_c_test.yaml

🏷️ Stage C: Label synthetic data

python -m src.stage_c_pseudolabeling.run_stage_c --config configs/stage_c_labeling_syn.yaml

💡 Tip: Prefer python -m ... to avoid relative-import issues.
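The per-stage commands can also be chained from Python. The sketch below is only an illustration, assuming it is launched from the repo root; `stage_commands` and `run_all` are hypothetical helper names that do not exist in the repository. Only the script paths, module names, and config files listed above are used. Because each stage has its own conda environment, a real run may need to execute each command inside the matching environment.

```python
import subprocess
import sys

# Argument lists copied from the per-stage commands above, in pipeline order.
STAGE_ARGS = [
    ["src/stage_a_generation/run_stage_a.py", "--config", "configs/stage_a.yaml"],
    ["-m", "src.stage_b_scoring.run_stage_b", "--config", "configs/stage_b_scores.yaml"],
    ["-m", "src.stage_b_scoring.run_stage_b", "--config", "configs/stage_b_scores_syn.yaml"],
    ["-m", "src.stage_c_pseudolabeling.run_stage_c", "--config", "configs/stage_c_train.yaml"],
    ["-m", "src.stage_c_pseudolabeling.run_stage_c", "--config", "configs/stage_c_test.yaml"],
    ["-m", "src.stage_c_pseudolabeling.run_stage_c", "--config", "configs/stage_c_labeling_syn.yaml"],
]


def stage_commands(python=sys.executable):
    """Return one full command (list of strings) per pipeline step."""
    return [[python, *args] for args in STAGE_ARGS]


def run_all(dry_run=True):
    """Print each command; execute it unless dry_run is set.

    check=True makes the loop stop at the first failing step.
    """
    for cmd in stage_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)  # switch to dry_run=False to actually execute
```

Keeping the dry run as the default makes it easy to inspect the exact commands before committing GPU time.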


🧱 Pipeline Stages

🖼️ Stage A: Synthetic Image Generation

📁 src/stage_a_generation/
Generate synthetic images using a trained LoRA model and prompts derived from attribute vectors.

  • Config: configs/stage_a.yaml
  • Script: src/stage_a_generation/run_stage_a.py

📏 Stage B: Similarity Scoring

📁 src/stage_b_scoring/
Compute similarity scores (e.g. BLIP score) for:

  • real images + (positive / complemented) prompts (train/test)
  • synthetic images + generation prompt

  • Configs:

    • configs/stage_b_scores.yaml (real)
    • configs/stage_b_scores_syn.yaml (synthetic)

🏷️ Stage C: Pseudo-Labeling

📁 src/stage_c_pseudolabeling/
Train/test an SI-classifier on real scores and produce pseudo-labels for synthetic samples.

  • Configs:

    • configs/stage_c_train.yaml
    • configs/stage_c_test.yaml
    • configs/stage_c_labeling_syn.yaml
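To make the score-to-label step concrete, here is a deliberately simple stand-in. It assumes one similarity score per attribute and places a per-attribute threshold midway between the mean positive and mean negative scores seen on real data. This is an assumption for illustration only: the repository's SI-classifier may be a different model, and `fit_thresholds` and `pseudo_label` are hypothetical names.

```python
from statistics import mean


def fit_thresholds(scores, labels):
    """Learn one threshold per attribute from real data.

    scores: list of per-image score vectors (floats), one entry per attribute.
    labels: list of per-image ground-truth vectors (0/1), same shape.
    """
    n_attrs = len(scores[0])
    thresholds = []
    for a in range(n_attrs):
        pos = [s[a] for s, y in zip(scores, labels) if y[a] == 1]
        neg = [s[a] for s, y in zip(scores, labels) if y[a] == 0]
        # Midpoint between class means; fall back to 0.5 if a class is missing.
        thresholds.append((mean(pos) + mean(neg)) / 2 if pos and neg else 0.5)
    return thresholds


def pseudo_label(score_vector, thresholds):
    """Turn one synthetic image's scores into a 0/1 pseudo-label vector."""
    return [int(s >= t) for s, t in zip(score_vector, thresholds)]


# Toy data (made up): 4 real images, 2 attributes.
real_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3], [0.2, 0.8]]
real_labels = [[1, 0], [1, 1], [0, 0], [0, 1]]

thresholds = fit_thresholds(real_scores, real_labels)
print(pseudo_label([0.7, 0.4], thresholds))  # first attribute on, second off
```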

📊 Tools: Metric Analysis (optional)

📁 tools/metricAspect/
Compare distributions between Real and Synthetic using metrics like: FID, CFID, CMMD, FD-DINO.


🗂️ Config Files

All configs live in configs/:

  • lora.yaml: LoRA training (optional / may be commented out in full pipeline)
  • getMetadata.yaml: metadata.jsonl generation (optional)
  • stage_a.yaml: generation settings
  • stage_b_scores.yaml: scoring on real data
  • stage_b_scores_syn.yaml: scoring on synthetic data
  • stage_c_train.yaml: classifier training
  • stage_c_test.yaml: classifier testing
  • stage_c_labeling_syn.yaml: synthetic pseudo-labeling

🧬 Minimal config structure

name: "experiment_name"
testing: true

model:
  pretrained_model_name_or_path: "..."

generation:  # or: training, scoring, dataset, etc.
  dataset_name: "RAPzs"

output:
  output_dir: "results/"

🧰 CLI vs YAML

✅ Run with config:

python script.py --config configs/stage_a.yaml

✅ Override a YAML value:

python script.py --config configs/stage_a.yaml --height 512

🔎 Rule: CLI arguments override YAML values.
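One common way to implement this rule is to give every CLI flag a default of `None` and copy only the flags that were actually passed over the YAML values. The sketch below shows that pattern under stated assumptions: `merge_config` is a hypothetical helper, `--width` is an invented flag used only for contrast with `--height`, and the plain dict stands in for a loaded YAML file. The stage scripts may do this differently.

```python
import argparse


def merge_config(yaml_cfg, cli_args):
    """Return yaml_cfg updated with every CLI value that was actually given."""
    merged = dict(yaml_cfg)
    for key, value in vars(cli_args).items():
        if value is not None:  # None means "flag not passed on the CLI"
            merged[key] = value
    return merged


parser = argparse.ArgumentParser()
parser.add_argument("--height", type=int, default=None)
parser.add_argument("--width", type=int, default=None)  # hypothetical flag

# Stand-in for values loaded from configs/stage_a.yaml (numbers are made up).
yaml_cfg = {"height": 256, "width": 128}

args = parser.parse_args(["--height", "512"])
print(merge_config(yaml_cfg, args))  # height comes from the CLI, width from YAML
```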


🧪 Environments

Each stage has its own conda environment for reproducibility:

🧬 Create environments

conda env create -f environments/stage_a.yaml
conda env create -f environments/stage_b.yaml
conda env create -f environments/stage_c.yaml

✅ Activate

conda activate stage_a   # LoRA + generation
conda activate stage_b   # scoring
conda activate stage_c   # pseudo-labeling

📊 Metric tools env (optional)

conda env create -f environments/tool_metric_aspect.yaml
conda activate tool_metric_aspect

🗃️ Datasets Setup

Before running the pipeline, ensure the real datasets are available on disk. The stages expect three dataset-related paths (which you can override via YAML or CLI in Stages A, B, and C):

  • path_dataset: folder with dataset images used by the pipeline
  • path_gt: path to the ground-truth pickle (dataset_all.pkl)
  • path_gt_img: folder with ground-truth image files (used for GT lookups/metrics)

Defaults vary by dataset. Example defaults for PA100k (as used in configs/getMetadata_custom_paths.yaml):

  • path_dataset (images): /mnt/rhome/paa/pedestrian/datasetForFID/PA100k/train/
  • path_gt (pickle): /mnt/rhome/paa/pedestrian/Rethinking_of_PAR/data/PA100k/dataset_all.pkl
  • path_gt_img (GT images): /mnt/rhome/paa/pedestrian/Rethinking_of_PAR/data/PA100k/data/

Example folder layout (PA100k):

/path/
├── datasetdivided/
│   └── PA100k/
│       ├── train/             # path_dataset → images for training split
│       └── test/
└── data/
    └── PA100k/
        ├── dataset_all.pkl  # path_gt → ground-truth annotations
        └── data/            # path_gt_img → GT image directory

You can provide custom paths per dataset in any stage:

  • Via YAML: uncomment the dataset: block and set path_dataset, path_gt, path_gt_img in the stage config (e.g., configs/stage_a.yaml, configs/stage_b_scores.yaml, configs/stage_c_train.yaml)
  • Via CLI: pass --path_dataset, --path_gt, --path_gt_img when running the stage scripts

CLI takes precedence over YAML; if not provided, the code uses dataset-specific defaults.
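The three-level lookup (CLI, then YAML, then dataset defaults) maps naturally onto `collections.ChainMap`, which searches its layers in order. This is a sketch of the idea, not the repository's code: `resolve_paths` is a hypothetical helper and the `/data/...` defaults are placeholders.

```python
from collections import ChainMap

# Placeholder per-dataset defaults; the real ones live in the repository code.
DATASET_DEFAULTS = {
    "PA100k": {
        "path_dataset": "/data/PA100k/train/",
        "path_gt": "/data/PA100k/dataset_all.pkl",
        "path_gt_img": "/data/PA100k/data/",
    },
}


def _drop_unset(mapping):
    """Keep only keys whose value was actually provided."""
    return {k: v for k, v in (mapping or {}).items() if v is not None}


def resolve_paths(dataset_name, yaml_paths=None, cli_paths=None):
    """Look up each path in CLI values first, then YAML, then defaults."""
    layers = ChainMap(
        _drop_unset(cli_paths),
        _drop_unset(yaml_paths),
        DATASET_DEFAULTS[dataset_name],
    )
    return dict(layers)


paths = resolve_paths(
    "PA100k",
    yaml_paths={"path_gt": "/yaml/dataset_all.pkl"},
    cli_paths={"path_dataset": "/cli/train/", "path_gt": None},
)
print(paths)
```

Here `path_dataset` comes from the CLI, `path_gt` from YAML (the CLI value was not set), and `path_gt_img` from the defaults.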

Create train/test splits automatically (PA100k)

If you only have the PA100k ground-truth images and the dataset_all.pkl, you can automatically create the split folders using the helper script:

# Dry-run: preview the actions
python tools/dataset_splitters/create_dataset_split.py \
  --dataset PA100k \
  --pkl /mnt/rhome/paa/pedestrian/Rethinking_of_PAR/data/PA100k/dataset_all.pkl \
  --images_root /mnt/rhome/paa/pedestrian/Rethinking_of_PAR/data/PA100k/data \
  --out_root /mnt/rhome/paa/pedestrian/datasetForFID \
  --mode symlink \
  --dry_run

# Execute: create symlinks into train/ and test/
python tools/dataset_splitters/create_dataset_split.py \
  --dataset PA100k \
  --pkl /mnt/rhome/paa/pedestrian/Rethinking_of_PAR/data/PA100k/dataset_all.pkl \
  --images_root /mnt/rhome/paa/pedestrian/Rethinking_of_PAR/data/PA100k/data \
  --out_root /mnt/rhome/paa/pedestrian/datasetForFID \
  --mode symlink

This will create the following structure if it doesn't exist already:

/mnt/rhome/paa/pedestrian/datasetForFID/
└── PA100k/
    ├── train/    # or trainval/ for RAPv1
    └── test/

You can switch --mode to hardlink or copy if you prefer. Once created, use these paths as path_dataset in your stage configs:

  • Train: /mnt/rhome/paa/pedestrian/datasetForFID/PA100k/train/
  • Test: /mnt/rhome/paa/pedestrian/datasetForFID/PA100k/test/
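For readers unsure which mode to pick, the sketch below shows what the three modes generally mean at the filesystem level. It is not the implementation of create_dataset_split.py; `place_image` is a hypothetical name used only for illustration.

```python
import os
import shutil
from pathlib import Path


def place_image(src, dst, mode="symlink"):
    """Materialize src at dst using one of the three modes."""
    src, dst = Path(src), Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists() or dst.is_symlink():
        return  # already placed; keeps repeated runs harmless
    if mode == "symlink":
        os.symlink(src.resolve(), dst)  # pointer to the original; breaks if src moves
    elif mode == "hardlink":
        os.link(src, dst)  # second name for the same data; same filesystem only
    elif mode == "copy":
        shutil.copy2(src, dst)  # independent duplicate; uses extra disk space
    else:
        raise ValueError(f"unknown mode: {mode}")
```

Symlinks and hardlinks avoid duplicating image data, while copies stay valid even if the original image folder is later moved or deleted.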

🗃️ Datasets supported

Use --dataset / dataset_name with: PA100k, PETA, PETAzs, RAPv1, RAPv2, RAPzs


🧾 Metadata for LoRA training (important!)

Before running LoRA training or prompt-based generation, you typically need:

✅ metadata.jsonl (captions/prompts for each image). Optional: include attribute vectors with --save-vectors.

Example:

# Run from repo root (recommended)
PYTHONPATH=. python src/lora_training/getMetadataDataset.py \
  --module customDatasets.RAPzsAll \
  --class RAPzsDatasetAll \
  --pathDataset /path/to/RAPzs/ \
  --num-images 17062 \
  --save-vectors
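For orientation, a JSON Lines file holds one JSON object per line. The sketch below writes and reads such a file. The field names `file_name`, `text`, and `vector` are assumptions for illustration (the first two follow a common Hugging Face image-folder convention) and may differ from what getMetadataDataset.py emits. `write_metadata` and `read_metadata` are hypothetical helpers.

```python
import json
from pathlib import Path


def write_metadata(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def read_metadata(path):
    """Read a JSON Lines file back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up example records; "vector" mimics an optional attribute vector.
records = [
    {"file_name": "0001.png", "text": "a pedestrian wearing a backpack", "vector": [1, 0, 1]},
    {"file_name": "0002.png", "text": "a pedestrian in a long coat", "vector": [0, 1, 0]},
]
```

A quick round trip such as `write_metadata(records, "metadata.jsonl")` followed by `read_metadata("metadata.jsonl")` is a cheap way to confirm the file parses before starting LoRA training.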

📦 Inputs & Outputs

🖼️ Stage A outputs

Given --path_syn <OUTPUT_DIR> (or config equivalent), you should get:

  • <path_syn>/condImgs/ ✅ conditional (real) images
  • <path_syn>/generatedImgs/ ✅ synthetic images
  • <path_syn>/generated.csv ✅ prompt + filenames (+ optional vectors)
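A quick sanity check after Stage A is to count what landed in each location. The sketch below assumes generated.csv has a header row and simply reports whatever column names it finds, since the exact columns are not specified here. `summarize_stage_a` is a hypothetical helper.

```python
import csv
from pathlib import Path


def summarize_stage_a(path_syn):
    """Count files in the Stage A output folders and rows in generated.csv.

    Missing folders or a missing CSV are reported as None rather than raising.
    """
    root = Path(path_syn)
    summary = {}
    for sub in ("condImgs", "generatedImgs"):
        folder = root / sub
        if folder.is_dir():
            summary[sub] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            summary[sub] = None
    csv_path = root / "generated.csv"
    if csv_path.is_file():
        with csv_path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)  # assumes the first row is a header
            summary["csv_columns"] = reader.fieldnames
            summary["csv_rows"] = sum(1 for _ in reader)
    else:
        summary["csv_columns"] = None
        summary["csv_rows"] = None
    return summary
```

If the number of CSV rows and the number of files in generatedImgs/ disagree, generation was probably interrupted partway through.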

📏 Stage B outputs

Scores are written under folders like:

  • <dataset>_<prompting>_<score_name>_<strategy>_scores/

    • scores_train.xlsx (+ .csv)
    • scores_test.xlsx (+ .csv)

Synthetic scoring typically creates:

  • scores_syn.xlsx / scores_syn.csv (depending on config)

🏷️ Stage C outputs

Artifacts under folders like:

  • <dataset>_<prompting>_<score_name>_<strategy>_<clf_tag>_si/artifacts/

    • classifier.pkl
    • classifier_tag.txt
    • train_predictions.csv
    • test_predictions.csv
    • pseudolabels_syn.csv
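The folder patterns for Stage B and Stage C can be assembled with plain string formatting, which helps when locating results from a script. The helper names and the example values (`positive`, `blip`, `mean`, `logreg`) are hypothetical; only the patterns themselves come from the lists above.

```python
from pathlib import Path


def stage_b_dir(dataset, prompting, score_name, strategy):
    """Folder that holds scores_train / scores_test for one configuration."""
    return Path(f"{dataset}_{prompting}_{score_name}_{strategy}_scores")


def stage_c_artifacts_dir(dataset, prompting, score_name, strategy, clf_tag):
    """Folder that holds classifier.pkl, predictions, and pseudo-labels."""
    run_dir = f"{dataset}_{prompting}_{score_name}_{strategy}_{clf_tag}_si"
    return Path(run_dir) / "artifacts"


# Example values are placeholders, not options confirmed by the repository.
print(stage_b_dir("PA100k", "positive", "blip", "mean").as_posix())
print(stage_c_artifacts_dir("PA100k", "positive", "blip", "mean", "logreg").as_posix())
```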

🧰 Troubleshooting

❌ attempted relative import with no known parent package

You're likely running a file directly like:

python src/stage_c_pseudolabeling/run_stage_c.py ...

✅ Fix: run as a module from repo root:

PYTHONPATH=. python -m src.stage_c_pseudolabeling.run_stage_c --config configs/stage_c_train.yaml

❌ Pytest import errors (src not found)

Run tests from repo root with:

PYTHONPATH=. pytest -q

⚠️ “RuntimeWarning: found in sys.modules after import...”

This often happens when mixing python file.py and python -m module.
✅ Stick to python -m ... consistently.


🙌 Notes

  • Prefer running from repo root.
  • Use YAML configs for reproducibility.
  • Use python -m ... to avoid import issues.

Happy experimenting! 🚀✨
