
#259: CV Training Pipeline — YOLO26s-P2 Small Rocket Detection (V1→V2→V3)#277

Open
xiaotianlou wants to merge 25 commits into main from CV_yolo26

Conversation


@xiaotianlou xiaotianlou commented Jan 29, 2026

Summary

Complete computer vision training pipeline for small rocket detection, evolving through three major iterations to achieve mAP₅₀₋₉₅ = 0.7651 (all-time best).

Pipeline Evolution

| Version | Key Change | Best mAP₅₀₋₉₅ |
|---|---|---|
| V1 | YOLO26s → YOLO26s-P2 + COCO hard negatives + 3-stage fine-tuning | 0.750 |
| V2 | Optimizer fix (SGD), extended Stage 1b (300 ep, 4-GPU DDP) | 0.762 |
| V3 | Regime-consistent training (nbs=128), native Albumentations, aggressive small-target augmentation | 0.7651 |

Final Model Metrics

| Metric | Value |
|---|---|
| mAP₅₀₋₉₅ | 0.7651 |
| mAP₅₀ | 0.9623 |
| Precision | 0.9602 |
| Recall | 0.9151 |
| Model | YOLO26s-P2 (9.66M params, 26.4 GFLOPs) |

Key Technical Contributions

  1. Training regime consistency — Single GPU + gradient accumulation (nbs=128) eliminates the DDP→single-GPU regime shock that caused V1/V2 fine-tuning regression
  2. Native Albumentations integration — Discovered model.train(augmentations=...) parameter; monkey-patch approach silently failed under DDP
  3. Small-target augmentation: scale 0.08→0.25 (3×), erasing 0→0.30, 9 custom Albumentations transforms (blur, noise, compression, lighting)
  4. COCO hard negative mining — 4,000 background images (25.3% of training set) for false-positive suppression
  5. Progressive augmentation reduction — Strong augmentation in Phase 1 (250ep) → gentle in Phase 2 (80ep)
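The regime-consistency point above hinges on gradient accumulation: with a nominal batch size (`nbs`) of 128, a single GPU accumulates gradients over several mini-batches before each optimizer step, so the effective batch matches the 4-GPU DDP runs. A minimal sketch of the idea, mirroring Ultralytics' `accumulate = max(round(nbs / batch), 1)` rule (function name here is illustrative):

```python
# Sketch: keep the effective batch size constant via gradient accumulation.
# Optimizer steps are taken once every `accumulation_steps` mini-batches.

def accumulation_steps(nbs: int, batch: int) -> int:
    """Number of mini-batches to accumulate before each optimizer step."""
    return max(round(nbs / batch), 1)

# A single H100 with batch=32 and nbs=128 accumulates 4 mini-batches per
# step, matching the effective batch of a 4-GPU DDP run at batch=32/GPU.
print(accumulation_steps(128, 32))   # -> 4
print(accumulation_steps(128, 128))  # -> 1
```

Because weight decay and LR schedules are tied to the optimizer step, keeping this effective batch identical across phases avoids the regime shock described above.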

Dataset

  • 31,666 training images (15,832 labels + 4,000 COCO negatives)
  • 12,976 bounding boxes, single class (rocket)
  • 9.9% COCO-small targets (< 32px at imgsz=960)
  • Target sizes range from < 10px to 200+ px
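The "COCO-small" label in the dataset stats follows the COCO convention of bucketing boxes by pixel area (small < 32², medium < 96²). A minimal sketch of that bucketing, as a size-stratified evaluation like `evaluate.py` might use (the script's exact thresholds are not shown in this PR):

```python
# Illustrative size bucketing per the COCO detection convention:
# small < 32*32 px area, medium < 96*96 px area, large otherwise.

def size_bucket(w_px: float, h_px: float) -> str:
    """Classify a box by pixel area at the evaluation image size."""
    area = w_px * h_px
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

# A 10x10 px rocket at imgsz=960 lands in the COCO-small bucket.
print(size_bucket(10, 10))    # -> small
print(size_bucket(150, 150))  # -> large
```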

Files Changed

| File | Description |
|---|---|
| src/training/train_v3_phase1.py | V3 Phase 1: 250 ep, SGD lr=0.0003, nbs=128, 9 Albumentations |
| src/training/train_v3_phase2.py | V3 Phase 2: 80 ep, SGD lr=0.0001, reduced augmentation |
| src/training/run_pipeline.bash | Pipeline orchestrator (tmux/nohup, auto-retry, GPU selection) |
| src/training/augment_small_targets.py | Offline small-target copy-paste augmentation |
| src/training/train_stage1.py | V1 Stage 1 training script |
| src/training/train_stage1b.py | V2 Stage 1b (extended DDP, SGD) |
| src/training/train_stage2.py | V1/V2 Stage 2 fine-tuning |
| src/training/train_stage3.py | V1/V2 Stage 3 low-LR polish |
| src/training/evaluate.py | Size-stratified evaluation |
| src/training/visualize_augment.py | Augmentation visualization |
| src/benchmark/convert_tensorrt.py | ONNX export (544 divisibility fix) |
| docs/V3_Training_Pipeline.md | V3 pipeline specification (English) |

Hardware

  • GPU: NVIDIA H100 PCIe 80GB (1 card for V3, 4 cards for V1/V2 DDP)
  • Server: McMaster Grace HPC
  • Framework: Ultralytics 8.4.6

How to Reproduce

```bash
cd src/training
bash run_pipeline.bash start  # tmux background, auto Phase 1 → Phase 2
```

See docs/V3_Training_Pipeline.md for the complete pipeline specification with all hyperparameters and training curves.

@xiaotianlou xiaotianlou changed the title #259: update cv code #259: cv code save Jan 29, 2026
xiaotianlou and others added 22 commits January 29, 2026 13:33
This script sets up a YOLO model for training with COCO dataset, including downloading and preparing negative samples.
┌────────────────────────┐
│     padding (gray)     │  ← ~208 rows of gray padding
├────────────────────────┤
│                        │
│  actual image 544×960  │  ← small-target pixel count unchanged
│                        │
├────────────────────────┤
│     padding (gray)     │  ← ~208 rows of gray padding
└────────────────────────┘
         960×960
Refactor albumentations integration to use monkey patching for custom augmentation pipeline. Update function to not require train_args and adjust model training call accordingly.
540 is not divisible by strides 8/16/32, causing feature map
dimension errors in P2 head. 544/32=17 cleanly divides all strides.

Made-with: Cursor
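The divisibility fix in this commit can be sketched as rounding a dimension up to the nearest multiple of the largest stride so every detection head sees integer feature-map sizes (helper name is illustrative):

```python
# Sketch of the export-size fix: round an image dimension up to the
# nearest multiple of the largest stride (32), so 540 -> 544 and
# 544 / 32 = 17 divides cleanly through all heads.

def round_to_stride(size: int, stride: int = 32) -> int:
    """Smallest multiple of `stride` that is >= size."""
    return ((size + stride - 1) // stride) * stride

print(round_to_stride(540))  # -> 544
print(round_to_stride(960))  # -> 960
```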
Key fixes over coco.py:
- torchrun compatible (monkey patch propagates to all GPU processes)
- nbs=batch prevents weight_decay 3x amplification
- multi_scale=False (was resizing down to 480px, destroying small targets)
- patience=0 guarantees close_mosaic triggers at ep250
- COCO negatives reduced from 4000 to 2000
- Albumentations: blur_limit 7->5, added Downscale, removed BboxParams

Made-with: Cursor
Stage 2: SGD lr0=0.002, rect=True (~30% less padding), mosaic=0
Stage 3: SGD lr0=0.0002, Albumentations probabilities reduced 30%
Both use model=path (not resume=True) for clean optimizer reset.

Made-with: Cursor
evaluate.py: reports mAP broken down by small/medium/large targets
visualize_augment.py: Level 0 validation - renders mosaic=0.8 vs 0.4
  to visually confirm small targets survive augmentation pipeline

Made-with: Cursor
- Uses torchrun for DDP (fixes monkey patch + enables rect=True)
- Dynamic GPU detection with 40GB threshold
- Auto-retry up to 3x per stage with resume from last.pt
- tmux session survives terminal/SSH disconnect
- Each stage writes .stageN_result for automatic chaining

Made-with: Cursor
- torchrun caused all ranks to land on GPU 0 (OOM). Reverted to
  Ultralytics internal DDP (device="0,1,2,3") for Stage 1.
- Stage 2/3 use single-GPU for rect=True + Albumentations support.
- Added MKL_THREADING_LAYER=GNU and PYTORCH_CUDA_ALLOC_CONF.
- Reduced default batch from 192 to 128 for GPU contention safety.
- Pipeline script now uses plain python instead of torchrun.

Made-with: Cursor
Root cause: V1 Stage 2/3 used SGD while Stage 1 used MuSGD (via auto),
destroying learned conv weight distributions. Stage 2 lr0=0.002 was
6.7x too aggressive vs proven lr (train81: 0.0003).

V2 changes:
- New train_stage1b.py: extend from Stage 1 best.pt with MuSGD lr0=0.005
  mosaic=0.2 close_mosaic=30, 200ep 4-GPU DDP
- train_stage2.py: SGD->MuSGD, lr0 0.002->0.001, +warmup_epochs=5
- train_stage3.py: SGD->MuSGD, +warmup_epochs=3, patience 25->15
- run_pipeline.bash: V2 flow with stage1b/stage2_v2/stage3_v2 naming
- Robust save_dir detection in all stages (fixes "?" path bug)

Made-with: Cursor
MuSGD's Muon component proved incompatible with fine-tuning from
pre-trained checkpoints - fresh Muon state disrupted learned weights.
Tested lr0=0.005 and lr0=0.002, both showed post-warmup regression.

Switched to proven SGD approach (cf. train81: SGD lr=0.0003 -> 0.7500):
- Stage 1b: SGD lr0=0.0005, cos_lr=True (ep16: mAP50-95=0.7380, stable)
- Stage 2: SGD lr0=0.0003 (matching train81's proven fine-tune lr)
- Stage 3: SGD lr0=0.0001 (ultra-low polish)

Made-with: Cursor
Key changes from V2:
- Single GPU + gradient accumulation (nbs=128) for ALL phases, eliminating
  the DDP->single-GPU regime change that caused V2 Stage 2/3 regression
- Native `augmentations` parameter for custom Albumentations (no monkey-patch),
  confirmed working via v8_transforms getattr(hyp, "augmentations", None)
- Fix missed augmentation: scale 0.08->0.25, erasing 0.0->0.30
- Phase 1: 250ep SGD lr=0.0003, 9 custom Albumentations transforms
- Phase 2: 80ep SGD lr=0.0001, reduced augmentation, rect=False (consistent)
- Offline small target copy-paste script for bbox-only datasets

Made-with: Cursor
xiaotianlou and others added 2 commits March 26, 2026 16:23
Comprehensive documentation of the V3 training pipeline including:
- Model architecture (YOLO26s-P2, 9.66M params, 26.4 GFLOPs)
- Dataset statistics (31,666 train images, 12,976 boxes, 9.9% COCO-small)
- V3 design rationale (regime consistency, native augmentations, small-target focus)
- Phase 1: 250ep joint training (SGD lr=0.0003, nbs=128, 9 Albumentations)
- Phase 2: 80ep fine-tuning (SGD lr=0.0001, reduced augmentation)
- Final result: mAP50-95 = 0.7651 (all-time best)
- Full parameter comparison tables and reproduction guide

Made-with: Cursor
@xiaotianlou xiaotianlou changed the title #259: cv code save #259: CV Training Pipeline — YOLO26s-P2 Small Rocket Detection (V1→V2→V3) Mar 26, 2026
@xiaotianlou xiaotianlou marked this pull request as ready for review March 26, 2026 20:32