y222gu/plants

Plant Root Segmentation Pipeline

Instance segmentation of plant root cross-sections from fluorescence microscopy. Supports 4 species (Millet, Rice, Sorghum, Tomato), 3 microscopes (C10, Olympus, Zeiss), and 5 target classes (Whole Root, Aerenchyma, Endodermis, Vascular, Exodermis). Cereals (monocots) have aerenchyma; tomato (dicot) has exodermis instead. Models can train with 4 classes (standard) or 5 classes (with exodermis), using masked loss for missing annotations per species.

Project Structure

plants/
├── predict.py               # Inference + save predictions
├── evaluate.py              # Evaluation with GT comparison
├── analyze_downstream.py    # Downstream biological analysis
├── polygon_editor.py        # Manual annotation review/correction
├── preview_annotations.py   # Generate annotation preview images
├── visualize_augmentations.py
├── train/                   # Training scripts
│   ├── train_yolo.py
│   ├── train_unet.py
│   ├── train_sam.py
│   ├── train_cellpose.py
│   ├── run_training.py      # Sequential training runner
│   └── run_grid_training.py # Benchmark grid runner (runs 1-5)
├── src/                     # Shared library
│   ├── config.py            # Paths, class defs, defaults
│   ├── dataset.py           # SampleRegistry
│   ├── preprocessing.py     # Image loading, normalization
│   ├── annotation_utils.py  # YOLO annotation parsing
│   ├── splits.py            # Train/val/test splitting
│   ├── augmentation.py      # Albumentations transforms
│   ├── evaluation.py        # PredictionResult, converters
│   ├── metrics.py           # SegmentationMetrics
│   ├── postprocessing.py    # Post-processing pipeline
│   ├── downstream.py        # Downstream metric computation
│   ├── visualization.py     # Shared visualization utilities
│   ├── formats/             # YOLO, COCO, Mask NPZ exporters
│   └── models/              # Model-specific datasets/utils
└── data/
    ├── image/               # {Species}/{Microscope}/{Exp}/{Sample}/
    ├── annotation/          # YOLO polygon .txt files
    ├── prediction/          # Generated prediction .txt files
    └── downstream/          # Downstream analysis results

Data Directory Layout

All scripts expect a --data-dir argument pointing to a folder that contains an image/ subfolder with the TIF images. Two layouts are supported:

Structured layout (used for training data with annotations):

my_data/
├── image/
│   └── {Species}/{Microscope}/{Exp}/{Sample}/
│       ├── {Sample}_TRITC.tif
│       ├── {Sample}_FITC.tif
│       └── {Sample}_DAPI.tif
├── annotation/          # YOLO polygon .txt files
├── prediction/          # Generated by predict.py
└── downstream/          # Generated by analyze_downstream.py

Generic layout (for new/unlabeled data):

my_data/
├── image/
│   ├── sample_001/
│   │   ├── sample_001_TRITC.tif
│   │   ├── sample_001_FITC.tif
│   │   └── sample_001_DAPI.tif
│   ├── sample_002/
│   │   └── ...
│   └── ...
├── prediction/          # Generated by predict.py
└── downstream/          # Generated by analyze_downstream.py

Each sample must be in its own subfolder under image/ containing three TIF files with _TRITC, _FITC, and _DAPI suffixes.
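
The layout rule above can be checked before running any script. The following is a minimal sketch, not the repository's own loader: `find_samples` is a hypothetical helper, and it assumes (as in both layouts shown) that the sample name equals the name of the folder holding the TIF files.

```python
from pathlib import Path

CHANNELS = ("TRITC", "FITC", "DAPI")  # channel suffixes named in the layouts above


def find_samples(data_dir):
    """Return (complete, incomplete) sample dictionaries for a data directory.

    complete:   sample name -> {channel: path to the TIF file}
    incomplete: sample name -> list of channels whose file is missing
    """
    image_root = Path(data_dir) / "image"
    complete, incomplete = {}, {}
    # Any folder that directly holds a .tif file is treated as a sample folder,
    # which covers both the structured and the generic layout.
    folders = sorted({p.parent for p in image_root.rglob("*.tif")})
    for folder in folders:
        sample = folder.name  # assumption: folder name == sample name
        files = {ch: folder / f"{sample}_{ch}.tif" for ch in CHANNELS}
        missing = [ch for ch, path in files.items() if not path.is_file()]
        if missing:
            incomplete[sample] = missing
        else:
            complete[sample] = files
    return complete, incomplete
```

Calling `find_samples("my_data")` and printing the `incomplete` dictionary gives a quick list of samples that lack one of the three channels.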

Training

Training scripts are in the train/ folder. Each can be run directly:

# YOLO (4-class only)
python train/train_yolo.py --strategy A --num-classes 4

# U-Net++ multilabel (4-class or 5-class with masked loss)
python train/train_unet.py --mode multilabel --strategy A --num-classes 4
python train/train_unet.py --mode multilabel --strategy A --num-classes 5 --mask-missing

# SAM (4 or 5 classes)
python train/train_sam.py --strategy A --num-classes 5

# Cellpose (4 or 5 classes, per-class models)
python train/train_cellpose.py --version 3 --all-classes --strategy A --num-classes 5

Strategy A Benchmark Grid (Runs 1-5)

Run all 5 benchmark models sequentially with run_grid_training.py:

python train/run_grid_training.py                  # Run all 5 benchmark runs
python train/run_grid_training.py --only 1 3       # Run only runs #1 and #3
python train/run_grid_training.py --skip 4 5       # Skip SAM and Cellpose
python train/run_grid_training.py --epochs 5       # Quick test with 5 epochs
python train/run_grid_training.py --eval-only      # Only evaluate existing checkpoints
python train/run_grid_training.py --train-only     # Only train, skip evaluation

| Run | Model | Classes | Epochs | Key Config |
|-----|-------|---------|--------|------------|
| 1 | YOLO11m-seg | 4 | 300 | SGD, patience 15 |
| 2 | U-Net++ multilabel | 4 | 300 | AdamW, cosine LR, patience 15 |
| 3 | U-Net++ multilabel | 5 | 300 | AdamW, cosine LR, masked loss, patience 15 |
| 4 | SAM vit_b | 5 | 300 | AdamW, cosine LR, frozen encoder, patience 15 |
| 5 | Cellpose v3 per-class | 5 | 150 | AdamW, no early stopping |

Each run trains the model and then evaluates on the test set via evaluate.py.
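
To make the run selection concrete, here is a rough sketch of how the five table rows could map to training commands. It is not the code of run_grid_training.py: `RUNS` and `planned_commands` are hypothetical names, the per-run flags are copied from the single-model commands in the Training section, and the way `--only`, `--skip`, and `--epochs` combine is an assumption.

```python
import shlex

# One entry per benchmark run: (script, flags, default epochs from the table).
# Assumption: the runner reuses the single-model commands shown under Training.
RUNS = {
    1: ("train/train_yolo.py", "--strategy A --num-classes 4", 300),
    2: ("train/train_unet.py", "--mode multilabel --strategy A --num-classes 4", 300),
    3: ("train/train_unet.py",
        "--mode multilabel --strategy A --num-classes 5 --mask-missing", 300),
    4: ("train/train_sam.py", "--strategy A --num-classes 5", 300),
    5: ("train/train_cellpose.py",
        "--version 3 --all-classes --strategy A --num-classes 5", 150),
}


def planned_commands(only=None, skip=None, epochs=None):
    """Return the argv lists for the selected runs, in run order."""
    selected = [
        run_id for run_id in sorted(RUNS)
        if (not only or run_id in only) and (not skip or run_id not in skip)
    ]
    commands = []
    for run_id in selected:
        script, flags, default_epochs = RUNS[run_id]
        n_epochs = default_epochs if epochs is None else epochs
        commands.append(
            ["python", script, *shlex.split(flags), "--epochs", str(n_epochs)]
        )
    return commands
```

For example, `planned_commands(skip=[4, 5])` yields the YOLO command and the two U-Net++ commands, each of which could be passed to `subprocess.run`.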

Annotation Class Index

Annotations are stored in YOLO polygon format (.txt files) using 6 raw classes. During training, these are converted into 4 or 5 semantic target classes by subtracting inner boundaries from outer boundaries to create ring masks.

Annotation classes (raw, in .txt files)

| Class ID | Name | Color | Description |
|----------|------|-------|-------------|
| 0 | Whole Root | Blue | Outer boundary of the entire root cross-section (1 per sample) |
| 1 | Aerenchyma | Yellow | Individual air spaces in the cortex (many per sample, cereals only) |
| 2 | Outer Endodermis | Green | Outer boundary of the endodermis ring (1 per sample) |
| 3 | Inner Endodermis | Red | Inner boundary of the endodermis ring (1 per sample) |
| 4 | Outer Exodermis | Orange | Outer boundary of the exodermis ring (1 per sample, tomato only) |
| 5 | Inner Exodermis | Purple | Inner boundary of the exodermis ring (1 per sample, tomato only) |
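
For readers unfamiliar with the file format, the sketch below parses one annotation file's text into pixel-space polygons. It assumes the common YOLO segmentation convention of one polygon per line, `class_id x1 y1 x2 y2 ...`, with coordinates normalized to [0, 1]; `parse_yolo_polygons` is a hypothetical helper and may differ from what annotation_utils.py does.

```python
def parse_yolo_polygons(text, width, height):
    """Parse YOLO polygon lines into (class_id, [(x_px, y_px), ...]) tuples."""
    polygons = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        coords = [float(value) for value in parts[1:]]
        if len(coords) < 6 or len(coords) % 2:
            raise ValueError(f"expected at least 3 x/y pairs, got: {line!r}")
        # Coordinates are normalized, so scale them back to pixel units.
        points = [
            (x * width, y * height)
            for x, y in zip(coords[0::2], coords[1::2])
        ]
        polygons.append((class_id, points))
    return polygons
```

The returned class IDs are the raw annotation classes 0-5 from the table above, not the derived target classes.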

Target classes (derived, used for model training)

| Target ID | Name | Derivation | Species |
|-----------|------|------------|---------|
| 0 | Whole Root | Direct from annotation class 0 | All |
| 1 | Aerenchyma | Direct from annotation class 1 | Cereals only |
| 2 | Endodermis | Ring mask: annotation cls 2 minus cls 3 | All |
| 3 | Vascular | Area inside annotation class 3 | All |
| 4 | Exodermis | Ring mask: annotation cls 4 minus cls 5 | Tomato only |

  • 4-class mode (--num-classes 4): targets 0-3 only, exodermis ignored
  • 5-class mode (--num-classes 5): targets 0-4; pass --mask-missing (U-Net++ only) so that classes a species has no annotations for are excluded from the loss
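
The idea behind a masked loss can be shown without any deep-learning framework. The sketch below is an illustration only, assuming a per-class binary cross-entropy; `masked_bce` and `class_present` are hypothetical names, and the actual loss used by train_unet.py may be different.

```python
import math


def masked_bce(probs, targets, class_present):
    """Mean binary cross-entropy over the classes marked as annotated.

    probs, targets: one list per class, holding per-pixel probabilities / 0-1 labels.
    class_present:  one boolean per class; False means the sample's species has
                    no annotation for that class.
    """
    eps = 1e-7
    total, count = 0.0, 0
    for p_cls, t_cls, present in zip(probs, targets, class_present):
        if not present:
            continue  # masked class: contributes nothing to the loss
        for p, t in zip(p_cls, t_cls):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0
```

For a cereal sample in 5-class mode, `class_present` would be `[True, True, True, True, False]`, so whatever the model predicts for exodermis is neither rewarded nor penalized.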

Important: Ring classes (endodermis, exodermis) are derived by subtracting the inner boundary polygon from the outer boundary polygon. This subtraction happens during data loading (in annotation_utils.py), not in post-processing. The post-processing fill_holes step is ring-aware — it preserves the structural central hole of ring masks while filling only small artifact holes in the ring band.
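
The ring derivation amounts to a set difference between two rasterized polygons. The following pure-Python sketch illustrates it on a small grid; `polygon_pixels` and `ring_mask` are hypothetical helpers, and a real loader would more likely rasterize with an image library than with this per-pixel ray-casting loop.

```python
def polygon_pixels(points, width, height):
    """Rasterize a polygon into a set of (row, col) pixels (even-odd rule)."""
    pixels = set()
    n = len(points)
    for row in range(height):
        y = row + 0.5  # test the pixel center
        for col in range(width):
            x = col + 0.5
            inside = False
            for i in range(n):
                (x1, y1), (x2, y2) = points[i], points[(i + 1) % n]
                # Does a horizontal ray going right from (x, y) cross this edge?
                if (y1 > y) != (y2 > y):
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x < x_cross:
                        inside = not inside
            if inside:
                pixels.add((row, col))
    return pixels


def ring_mask(outer, inner, width, height):
    """Pixels inside the outer boundary but not inside the inner boundary."""
    return polygon_pixels(outer, width, height) - polygon_pixels(inner, width, height)


# Toy example: endodermis ring from an outer and an inner square boundary.
outer_endo = [(2, 2), (8, 2), (8, 8), (2, 8)]   # stands in for annotation class 2
inner_endo = [(4, 4), (6, 4), (6, 6), (4, 6)]   # stands in for annotation class 3
endodermis = ring_mask(outer_endo, inner_endo, 10, 10)
vascular = polygon_pixels(inner_endo, 10, 10)   # area inside class 3
```

In this toy case the ring and the vascular region are disjoint, which is the property the subtraction is meant to guarantee.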

Annotation counts per species

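| Species | Samples | cls 0: Whole Root | cls 1: Aerenchyma | cls 2: Outer Endo | cls 3: Inner Endo | cls 4: Outer Exo | cls 5: Inner Exo | Total Polygons |
|---------|---------|-------------------|-------------------|-------------------|-------------------|------------------|------------------|----------------|
| Millet | 110 | 110 | 418 | 110 | 110 | 0 | 0 | 748 |
| Rice | 588 | 588 | 13,706 | 588 | 588 | 0 | 0 | 15,470 |
| Sorghum | 474 | 474 | 9,218 | 474 | 474 | 0 | 0 | 10,640 |
| Tomato | 545 | 545 | 0 | 545 | 545 | 545 | 545 | 2,725 |
| Total | 1,717 | 1,717 | 23,342 | 1,717 | 1,717 | 545 | 545 | 29,583 |
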
  • Classes 0, 2, 3 have exactly 1 polygon per sample across all species
  • Aerenchyma (cls 1): cereals only — avg ~3.8 (Millet), ~23.3 (Rice), ~19.4 (Sorghum) per sample
  • Exodermis (cls 4-5): tomato only — exactly 1 polygon each per sample
  • Tomato has no aerenchyma; cereals have no exodermis

Strategy A Data Split

All models share the same experiment-level split (seed=42). Samples from the same experiment always stay together. Rice/Zeiss (35 samples) is excluded — reserved for deployment evaluation.

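| Species | Microscope | Train | Val | Test | Total |
|---------|------------|-------|-----|------|-------|
| Millet | Olympus | 67 (1 exp) | 29 (1 exp) | 14 (1 exp) | 110 |
| Rice | C10 | 38 (6 exps) | 0 | 12 (4 exps) | 50 |
| Rice | Olympus | 383 (12 exps) | 91 (2 exps) | 29 (3 exps) | 503 |
| Sorghum | C10 | 0 | 25 (1 exp) | 19 (4 exps) | 44 |
| Sorghum | Olympus | 366 (77 exps) | 43 (9 exps) | 21 (13 exps) | 430 |
| Tomato | C10 | 54 (1 exp) | 0 | 11 (2 exps) | 65 |
| Tomato | Olympus | 432 (15 exps) | 23 (1 exp) | 25 (4 exps) | 480 |
| Total | | 1,340 | 211 | 131 | 1,682 |

The C10 rows list only two of the three splits in the source; their column placement here is the only one consistent with the Train/Val/Test totals, and the remaining cell is shown as 0.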
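
An experiment-level split can be sketched as a seeded shuffle over experiment IDs rather than over samples. This is an illustration of the grouping idea only: `split_by_experiment` and its `val_frac`/`test_frac` parameters are hypothetical, and the real src/splits.py evidently balances the split per species and microscope, which this sketch does not attempt.

```python
import random
from collections import defaultdict


def split_by_experiment(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Split (sample_id, experiment_id) pairs so experiments never straddle splits."""
    by_exp = defaultdict(list)
    for sample_id, exp_id in samples:
        by_exp[exp_id].append(sample_id)

    experiments = sorted(by_exp)  # fixed order first, so the shuffle is reproducible
    random.Random(seed).shuffle(experiments)

    n_val = round(len(experiments) * val_frac)
    n_test = round(len(experiments) * test_frac)
    assignment = {
        "val": experiments[:n_val],
        "test": experiments[n_val:n_val + n_test],
        "train": experiments[n_val + n_test:],
    }
    # Expand each split from experiment IDs back to sample IDs.
    return {
        name: [s for exp in exps for s in by_exp[exp]]
        for name, exps in assignment.items()
    }
```

Because whole experiments are assigned at once, split sizes in samples can be uneven, which matches the varying per-row proportions in the table above.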

Configurable Training Parameters

All training scripts support common flags:

| Flag | Description |
|------|-------------|
| --epochs | Max training epochs |
| --batch-size | Batch size |
| --lr | Learning rate |
| --weight-decay | Weight decay |
| --patience | Early stopping patience |
| --optimizer | Optimizer: adamw, adam, sgd |
| --scheduler | LR scheduler: cosine, step, plateau |
| --num-classes | Target classes: 4 (standard) or 5 (with exodermis) |
| --mask-missing | Enable masked loss for missing annotations (U-Net++ only) |
| --save-every | Save periodic checkpoint every N epochs |
| --img-size | Input image size (default 1024) |
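
As a reference for scripting around these flags, the table can be mirrored with the standard-library `argparse` module. This is a sketch, not the parser from the training scripts: `build_common_parser` is a hypothetical name, and only the `--img-size` default is taken from the table (all other defaults are left unset because the source does not state them).

```python
import argparse


def build_common_parser():
    """Argument parser covering the shared training flags listed above."""
    parser = argparse.ArgumentParser(description="Common training flags (sketch)")
    parser.add_argument("--epochs", type=int, help="Max training epochs")
    parser.add_argument("--batch-size", type=int, help="Batch size")
    parser.add_argument("--lr", type=float, help="Learning rate")
    parser.add_argument("--weight-decay", type=float, help="Weight decay")
    parser.add_argument("--patience", type=int, help="Early stopping patience")
    parser.add_argument("--optimizer", choices=["adamw", "adam", "sgd"])
    parser.add_argument("--scheduler", choices=["cosine", "step", "plateau"])
    parser.add_argument("--num-classes", type=int, choices=[4, 5],
                        help="4 (standard) or 5 (with exodermis)")
    parser.add_argument("--mask-missing", action="store_true",
                        help="Masked loss for missing annotations")
    parser.add_argument("--save-every", type=int,
                        help="Save a checkpoint every N epochs")
    parser.add_argument("--img-size", type=int, default=1024,
                        help="Input image size")
    return parser
```

For instance, `build_common_parser().parse_args(["--num-classes", "5", "--mask-missing"])` returns a namespace with `num_classes == 5` and `mask_missing == True`.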

Scripts

1. predict.py — Inference + Save Predictions

Run model inference on images, save predictions as YOLO .txt files, and optionally generate visualizations.

# Run on an arbitrary folder of TIF images
python predict.py --data-dir path/to/new_images/ --checkpoint path/to/best.pt

# Skip visualization generation
python predict.py --data-dir data/ --checkpoint path/to/best.pt --no-vis

# Custom batch size and confidence threshold
python predict.py --data-dir data/ --checkpoint path/to/best.pt \
    --batch-size 32 --conf-thresh 0.3

# Skip post-processing (raw YOLO output)
python predict.py --data-dir data/ --checkpoint path/to/best.pt --no-postprocess

Arguments:

| Argument | Default | Description |
|----------|---------|-------------|
| --data-dir | (required) | Directory containing an image/ subfolder with TIF images |
| --checkpoint | (required) | YOLO model checkpoint (.pt) |
| --img-size | 1024 | Inference image size |
| --conf-thresh | 0.25 | Confidence threshold |
| --batch-size | 16 | GPU batch size |
| --no-vis | false | Skip visualization output |
| --no-postprocess | false | Disable post-processing (ring-aware hole filling, aerenchyma clipping, etc.) |
| --max-dim | 800 | Max dimension for visualization images |

Output:

  • {data-dir}/prediction/*.txt — YOLO-format polygon predictions (one per sample)
  • {data-dir}/prediction/vis/*.png — 2-panel (Original | Prediction) overlay images (unless --no-vis)

2. evaluate.py — Model Evaluation

Evaluate any trained model against ground truth annotations. Supports all 4 model types: YOLO, U-Net++, SAM, and Cellpose. Computes IoU/Dice metrics, generates comparison plots, and saves visualizations.

# Evaluate YOLO model
python evaluate.py --model yolo --checkpoint path/to/best.pt \
    --strategy A --num-classes 4

# Evaluate U-Net++ (multilabel mode, 5-class)
python evaluate.py --model unet --unet-mode multilabel \
    --checkpoint path/to/best.ckpt --strategy A --num-classes 5

# Evaluate SAM
python evaluate.py --model sam --sam-type vit_b \
    --checkpoint path/to/best.pth --strategy A --num-classes 5

# Evaluate Cellpose (loads per-class models from directory)
python evaluate.py --model cellpose \
    --checkpoint path/to/cellpose_run_dir/ --strategy A --num-classes 5

# Use saved predictions instead of running inference
python evaluate.py --from-predictions data/prediction/ --strategy A

# Skip visualizations, only compute metrics
python evaluate.py --model yolo --checkpoint best.pt --no-vis

# Regenerate plots from existing metrics JSON
python evaluate.py --plot-only output/evaluation/yolo_metrics.json

Arguments:

| Argument | Default | Description |
|----------|---------|-------------|
| --data-dir | data/ | Data directory with image/ and annotation/ subfolders |
| --model | (required*) | Model type: yolo, unet, sam, cellpose |
| --checkpoint | (required*) | Path to model checkpoint (file or dir for Cellpose) |
| --num-classes | 4 | Number of target classes (4 or 5) |
| --from-predictions | | Load saved YOLO .txt files (skip inference) |
| --img-size | 1024 | Inference image size |
| --unet-mode | semantic | U-Net mode: semantic or multilabel |
| --sam-type | vit_b | SAM model type: vit_b, vit_l, vit_h |
| --strategy | | Split strategy: A, B, C |
| --split | test | Which split to evaluate: train, val, test |
| --seed | 42 | Random seed for split generation |
| --no-vis | false | Skip visualization overlay images |
| --vis-dir | auto | Custom visualization output directory |
| --no-metrics | false | Skip metric computation |
| --no-plots | false | Skip plot generation |
| --plot-only | | Regenerate plots from existing JSON |
| --enable-pp | | Force-enable post-processing steps |
| --disable-pp | | Force-disable post-processing steps |
| --no-postprocess | false | Disable all post-processing |

*Not required when using --plot-only or --from-predictions.

Output:

  • output/evaluation/{model}_metrics.json — Aggregated metrics
  • output/evaluation/{model}_per_sample.csv — Per-sample IoU/Dice
  • output/evaluation/{model}_*_comparison.{png,pdf} — Box plots by species/microscope
  • output/evaluation/vis_{model}/*.png — 3-panel (Original | GT | Prediction) overlays
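
For reference, the two reported metrics follow the usual definitions: IoU is intersection over union, and Dice is twice the intersection over the sum of both mask areas. The sketch below computes them for one pair of binary masks; `iou_and_dice` is a hypothetical helper, and treating two empty masks as a perfect match is an assumed convention that src/metrics.py may not share.

```python
def iou_and_dice(pred, gt):
    """IoU and Dice for two binary masks given as equally sized nested lists."""
    inter = union = pred_area = gt_area = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p, g = bool(p), bool(g)
            inter += p and g      # pixel set in both masks
            union += p or g       # pixel set in at least one mask
            pred_area += p
            gt_area += g
    # Assumed convention: two empty masks count as a perfect match.
    iou = inter / union if union else 1.0
    dice = 2 * inter / (pred_area + gt_area) if (pred_area + gt_area) else 1.0
    return iou, dice
```

The two values are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank predictions identically for a single mask pair but average differently across samples.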

3. analyze_downstream.py — Downstream Biological Analysis

Compute biologically meaningful metrics (aerenchyma ratio, channel intensities) from ground truth, predictions, or both. When both are available, the script also generates scatter plots of ground-truth versus predicted values with regression lines and R² values.

# Compare GT vs predictions (default when both exist)
python analyze_downstream.py --data-dir data/ --source both

# Analyze predictions only (e.g., on new unlabeled data)
python analyze_downstream.py --data-dir data/ --source prediction

# Analyze ground truth only
python analyze_downstream.py --data-dir data/ --source gt

# Generate predictions first if they don't exist
python analyze_downstream.py --data-dir data/ --source both \
    --checkpoint path/to/best.pt

# Regenerate plots from existing CSV
python analyze_downstream.py --plot-only output/downstream/comparison.csv

Arguments:

| Argument | Default | Description |
|----------|---------|-------------|
| --data-dir | data/ | Data folder |
| --source | auto-detect | gt, prediction, or both |
| --checkpoint | | YOLO checkpoint (generates predictions if missing) |
| --output | auto | Custom CSV output path |
| --no-plots | false | Skip generating plots |
| --plot-only | | Regenerate plots from existing CSV |

Output:

  • {data-dir}/downstream/{source}.csv — Per-sample downstream metrics
  • {data-dir}/downstream/downstream_*.{png,pdf} — Scatter plots with R² (when --source both)
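
The regression line and R² shown on such scatter plots can be reproduced from the CSV values with a few lines of arithmetic. This sketch assumes an ordinary least-squares fit of predicted on ground-truth values, for which R² equals the squared Pearson correlation; `linear_fit_r2` is a hypothetical helper, and the script may define R² differently (for example against the identity line).

```python
def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept, plus its R²."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R² is the squared Pearson correlation.
    r2 = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r2
```

Here `x` would hold a ground-truth metric (such as aerenchyma ratio) per sample and `y` the corresponding value computed from predictions.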

Computed metrics:

  • Aerenchyma ratio (aerenchyma area / whole root area)
  • Aerenchyma instance count
  • Endodermis mean intensity per channel (TRITC, FITC, DAPI)
  • Vascular mean intensity per channel (TRITC, FITC, DAPI)
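
Two of these metrics are simple enough to sketch directly on binary masks. The helpers below (`aerenchyma_ratio`, `mean_intensity`) are hypothetical and work on nested lists for illustration; src/downstream.py presumably operates on image arrays and may handle edge cases differently.

```python
def aerenchyma_ratio(aerenchyma_mask, root_mask):
    """Aerenchyma area divided by whole-root area, both counted in pixels."""
    root_area = sum(map(sum, root_mask))
    aerenchyma_area = sum(map(sum, aerenchyma_mask))
    return aerenchyma_area / root_area if root_area else 0.0


def mean_intensity(channel, mask):
    """Mean pixel value of one channel inside a binary mask.

    Returns None when the mask is empty, so callers can tell
    "no region" apart from "region with zero intensity".
    """
    values = [
        value
        for channel_row, mask_row in zip(channel, mask)
        for value, inside in zip(channel_row, mask_row)
        if inside
    ]
    return sum(values) / len(values) if values else None
```

Calling `mean_intensity` once per channel (TRITC, FITC, DAPI) with the endodermis or vascular mask gives the per-channel values listed above.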

4. polygon_editor.py — Interactive Annotation Editor

GUI tool for visualizing, correcting, and creating YOLO polygon annotations.

# Launch with a data directory
python polygon_editor.py --data-dir data/

# Launch with generic (non-structured) data
python polygon_editor.py --data-dir path/to/new_data/

# Launch without arguments (use Browse button to select folder)
python polygon_editor.py

Modes (select from the Mode dropdown):

| Mode | Panels | Required folders | Description |
|------|--------|------------------|-------------|
| Correct GT | 3 (Original, GT, Prediction) | image/, annotation/, prediction/ | Edit ground truth with predictions as reference |
| Correct Predictions | 2 (Original, Editable) | image/, prediction/ | Edit predictions, save to annotation/ |
| Create GT | 2 (Original, Editable) | image/ | Draw annotations from scratch, save to annotation/ |

Controls:

| Key | Action |
|-----|--------|
| A / Left | Previous sample |
| D / Right | Next sample |
| N | Start drawing new polygon with nodes (click to add points) |
| B | Enter brush mode (erase default, Shift=add, Ctrl+scroll=size) |
| E | Edit selected polygon with brush (same as B on selection) |
| Enter | Confirm drawing or edits |
| Escape | Cancel drawing or edits (reverts all changes) |
| Delete / Backspace | Delete selected vertex (edit mode) or polygon |
| S | Save annotations to file |
| Ctrl+C | Copy selected reference polygon to editable panel |
| C | Copy ALL reference polygons to editable panel |
| 1-4 | Set class for new polygon |
| Ctrl+Z / Ctrl+Shift+Z | Undo / Redo |
| Mouse wheel | Zoom in/out |
| Middle/Right drag | Pan the image |
| H | Reset zoom and center all panels |

Vertex editing: Drag vertices to move them. Hover over an edge to see a green "+" marker; click to add a vertex. Select a vertex and press Delete to remove it.

Saving: Press S to save. Annotations are saved in YOLO polygon format to {data-dir}/annotation/ (created automatically if it does not exist).
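
Writing the format back out is the reverse of parsing it. The sketch below serializes pixel-space polygons to YOLO polygon lines, assuming the usual convention of normalized coordinates; `format_yolo_polygons`, the 6-decimal precision, and the clamping of out-of-image vertices are illustrative choices, not necessarily what the editor does on save.

```python
def format_yolo_polygons(polygons, width, height, precision=6):
    """Serialize (class_id, [(x_px, y_px), ...]) polygons as YOLO polygon lines."""
    lines = []
    for class_id, points in polygons:
        if len(points) < 3:
            continue  # a polygon needs at least three vertices
        coords = []
        for x, y in points:
            # Normalize to [0, 1]; clamp vertices that lie outside the image.
            coords.append(min(max(x / width, 0.0), 1.0))
            coords.append(min(max(y / height, 0.0), 1.0))
        fields = [str(class_id)] + [f"{c:.{precision}f}" for c in coords]
        lines.append(" ".join(fields))
    return "\n".join(lines) + ("\n" if lines else "")
```

The returned string can be written to `{data-dir}/annotation/{sample}.txt` with `Path.write_text`, after creating the folder with `mkdir(parents=True, exist_ok=True)`.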


Typical Workflow

# 1. Run all Strategy A benchmark models (trains + evaluates all 5 runs)
python train/run_grid_training.py

# 2. Or train a single model
python train/train_unet.py --mode multilabel --strategy A --num-classes 5 --mask-missing

# 3. Evaluate on test split
python evaluate.py --model unet --unet-mode multilabel --checkpoint path/to/best.ckpt \
    --strategy A --num-classes 5

# 4. Generate predictions on new data
python predict.py --data-dir path/to/new_data/ --checkpoint path/to/best.pt

# 5. Run downstream analysis
python analyze_downstream.py --data-dir data/ --source both

# 6. Review and correct predictions
python polygon_editor.py --data-dir path/to/new_data/
