
#259: CV Training Pipeline — YOLO26s-P2 Small Rocket Detection (V1→V2→V3)#277

Open
xiaotianlou wants to merge 25 commits into main from CV_yolo26

Conversation


@xiaotianlou xiaotianlou commented Jan 29, 2026

Summary

Complete computer vision training pipeline for small rocket detection, evolving through three major iterations to achieve mAP₅₀₋₉₅ = 0.7651 (all-time best).

Pipeline Evolution

| Version | Key Change | Best mAP₅₀₋₉₅ |
|---|---|---|
| V1 | YOLO26s → YOLO26s-P2 + COCO hard negatives + 3-stage fine-tuning | 0.750 |
| V2 | Optimizer fix (SGD), extended Stage 1b (300 ep, 4-GPU DDP) | 0.762 |
| V3 | Regime-consistent training (nbs=128), native Albumentations, aggressive small-target augmentation | 0.7651 |

Final Model Metrics

| Metric | Value |
|---|---|
| mAP₅₀₋₉₅ | 0.7651 |
| mAP₅₀ | 0.9623 |
| Precision | 0.9602 |
| Recall | 0.9151 |
| Model | YOLO26s-P2 (9.66M params, 26.4 GFLOPs) |

Key Technical Contributions

  1. Training regime consistency — Single GPU + gradient accumulation (nbs=128) eliminates the DDP→single-GPU regime shock that caused V1/V2 fine-tuning regression
  2. Native Albumentations integration — Discovered model.train(augmentations=...) parameter; monkey-patch approach silently failed under DDP
  3. Small-target augmentation: scale 0.08→0.25 (3×), erasing 0→0.30, 9 custom Albumentations transforms (blur, noise, compression, lighting)
  4. COCO hard negative mining — 4,000 background images (25.3% of training set) for false-positive suppression
  5. Progressive augmentation reduction — Strong augmentation in Phase 1 (250ep) → gentle in Phase 2 (80ep)
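The regime-consistency point above hinges on gradient accumulation: with a nominal batch size (`nbs`) of 128, a single GPU accumulates gradients over several mini-batches before each optimizer step, so the effective batch matches the 4-GPU DDP runs. A minimal sketch of the idea, mirroring Ultralytics' `accumulate = max(round(nbs / batch), 1)` rule (function name here is illustrative):

```python
# Sketch: keep the effective batch size constant via gradient accumulation.
# Optimizer steps are taken once every `accumulation_steps` mini-batches.

def accumulation_steps(nbs: int, batch: int) -> int:
    """Number of mini-batches to accumulate before each optimizer step."""
    return max(round(nbs / batch), 1)

# A single H100 with batch=32 and nbs=128 accumulates 4 mini-batches per
# step, matching the effective batch of a 4-GPU DDP run at batch=32/GPU.
print(accumulation_steps(128, 32))   # -> 4
print(accumulation_steps(128, 128))  # -> 1
```

Because weight decay and LR schedules are tied to the optimizer step, keeping this effective batch identical across phases avoids the regime shock described above.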

Dataset

  • 31,666 training images (15,832 labels + 4,000 COCO negatives)
  • 12,976 bounding boxes, single class (rocket)
  • 9.9% COCO-small targets (< 32px at imgsz=960)
  • Target sizes range from < 10px to 200+ px
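The "COCO-small" label in the dataset stats follows the COCO convention of bucketing boxes by pixel area (small < 32², medium < 96²). A minimal sketch of that bucketing, as a size-stratified evaluation like `evaluate.py` might use (the script's exact thresholds are not shown in this PR):

```python
# Illustrative size bucketing per the COCO detection convention:
# small < 32*32 px area, medium < 96*96 px area, large otherwise.

def size_bucket(w_px: float, h_px: float) -> str:
    """Classify a box by pixel area at the evaluation image size."""
    area = w_px * h_px
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

# A 10x10 px rocket at imgsz=960 lands in the COCO-small bucket.
print(size_bucket(10, 10))    # -> small
print(size_bucket(150, 150))  # -> large
```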

Files Changed

| File | Description |
|---|---|
| src/training/train_v3_phase1.py | V3 Phase 1: 250 ep, SGD lr=0.0003, nbs=128, 9 Albumentations |
| src/training/train_v3_phase2.py | V3 Phase 2: 80 ep, SGD lr=0.0001, reduced augmentation |
| src/training/run_pipeline.bash | Pipeline orchestrator (tmux/nohup, auto-retry, GPU selection) |
| src/training/augment_small_targets.py | Offline small-target copy-paste augmentation |
| src/training/train_stage1.py | V1 Stage 1 training script |
| src/training/train_stage1b.py | V2 Stage 1b (extended DDP, SGD) |
| src/training/train_stage2.py | V1/V2 Stage 2 fine-tuning |
| src/training/train_stage3.py | V1/V2 Stage 3 low-LR polish |
| src/training/evaluate.py | Size-stratified evaluation |
| src/training/visualize_augment.py | Augmentation visualization |
| src/benchmark/convert_tensorrt.py | ONNX export (544 divisibility fix) |
| docs/V3_Training_Pipeline.md | V3 pipeline specification (English) |

Hardware

  • GPU: NVIDIA H100 PCIe 80GB (1 card for V3, 4 cards for V1/V2 DDP)
  • Server: McMaster Grace HPC
  • Framework: Ultralytics 8.4.6

How to Reproduce

```bash
cd src/training
bash run_pipeline.bash start  # tmux background, auto Phase 1 → Phase 2
```

See docs/V3_Training_Pipeline.md for the complete pipeline specification with all hyperparameters and training curves.

@xiaotianlou xiaotianlou changed the title #259: update cv code #259: cv code save Jan 29, 2026
xiaotianlou and others added 22 commits January 29, 2026 13:33
This script sets up a YOLO model for training with COCO dataset, including downloading and preparing negative samples.
┌────────────────────────┐
│     padding (gray)     │  ← ~208 rows of gray padding
├────────────────────────┤
│                        │
│  actual image 544×960  │  ← small-target pixel count unchanged
│                        │
├────────────────────────┤
│     padding (gray)     │  ← ~208 rows of gray padding
└────────────────────────┘
         960×960
Refactor albumentations integration to use monkey patching for custom augmentation pipeline. Update function to not require train_args and adjust model training call accordingly.
540 is not divisible by strides 8/16/32, causing feature map
dimension errors in P2 head. 544/32=17 cleanly divides all strides.

Made-with: Cursor
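The divisibility fix in this commit can be sketched as rounding a dimension up to the nearest multiple of the largest stride so every detection head sees integer feature-map sizes (helper name is illustrative):

```python
# Sketch of the export-size fix: round an image dimension up to the
# nearest multiple of the largest stride (32), so 540 -> 544 and
# 544 / 32 = 17 divides cleanly through all heads.

def round_to_stride(size: int, stride: int = 32) -> int:
    """Smallest multiple of `stride` that is >= size."""
    return ((size + stride - 1) // stride) * stride

print(round_to_stride(540))  # -> 544
print(round_to_stride(960))  # -> 960
```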
Key fixes over coco.py:
- torchrun compatible (monkey patch propagates to all GPU processes)
- nbs=batch prevents weight_decay 3x amplification
- multi_scale=False (was resizing down to 480px, destroying small targets)
- patience=0 guarantees close_mosaic triggers at ep250
- COCO negatives reduced from 4000 to 2000
- Albumentations: blur_limit 7->5, added Downscale, removed BboxParams

Made-with: Cursor
Stage 2: SGD lr0=0.002, rect=True (~30% less padding), mosaic=0
Stage 3: SGD lr0=0.0002, Albumentations probabilities reduced 30%
Both use model=path (not resume=True) for clean optimizer reset.

Made-with: Cursor
evaluate.py: reports mAP broken down by small/medium/large targets
visualize_augment.py: Level 0 validation - renders mosaic=0.8 vs 0.4
  to visually confirm small targets survive augmentation pipeline

Made-with: Cursor
- Uses torchrun for DDP (fixes monkey patch + enables rect=True)
- Dynamic GPU detection with 40GB threshold
- Auto-retry up to 3x per stage with resume from last.pt
- tmux session survives terminal/SSH disconnect
- Each stage writes .stageN_result for automatic chaining

Made-with: Cursor
- torchrun caused all ranks to land on GPU 0 (OOM). Reverted to
  Ultralytics internal DDP (device="0,1,2,3") for Stage 1.
- Stage 2/3 use single-GPU for rect=True + Albumentations support.
- Added MKL_THREADING_LAYER=GNU and PYTORCH_CUDA_ALLOC_CONF.
- Reduced default batch from 192 to 128 for GPU contention safety.
- Pipeline script now uses plain python instead of torchrun.

Made-with: Cursor
Root cause: V1 Stage 2/3 used SGD while Stage 1 used MuSGD (via auto),
destroying learned conv weight distributions. Stage 2 lr0=0.002 was
6.7x too aggressive vs proven lr (train81: 0.0003).

V2 changes:
- New train_stage1b.py: extend from Stage 1 best.pt with MuSGD lr0=0.005
  mosaic=0.2 close_mosaic=30, 200ep 4-GPU DDP
- train_stage2.py: SGD->MuSGD, lr0 0.002->0.001, +warmup_epochs=5
- train_stage3.py: SGD->MuSGD, +warmup_epochs=3, patience 25->15
- run_pipeline.bash: V2 flow with stage1b/stage2_v2/stage3_v2 naming
- Robust save_dir detection in all stages (fixes "?" path bug)

Made-with: Cursor
MuSGD's Muon component proved incompatible with fine-tuning from
pre-trained checkpoints - fresh Muon state disrupted learned weights.
Tested lr0=0.005 and lr0=0.002, both showed post-warmup regression.

Switched to proven SGD approach (cf. train81: SGD lr=0.0003 -> 0.7500):
- Stage 1b: SGD lr0=0.0005, cos_lr=True (ep16: mAP50-95=0.7380, stable)
- Stage 2: SGD lr0=0.0003 (matching train81's proven fine-tune lr)
- Stage 3: SGD lr0=0.0001 (ultra-low polish)

Made-with: Cursor
Key changes from V2:
- Single GPU + gradient accumulation (nbs=128) for ALL phases, eliminating
  the DDP->single-GPU regime change that caused V2 Stage 2/3 regression
- Native `augmentations` parameter for custom Albumentations (no monkey-patch),
  confirmed working via v8_transforms getattr(hyp, "augmentations", None)
- Fix missed augmentation: scale 0.08->0.25, erasing 0.0->0.30
- Phase 1: 250ep SGD lr=0.0003, 9 custom Albumentations transforms
- Phase 2: 80ep SGD lr=0.0001, reduced augmentation, rect=False (consistent)
- Offline small target copy-paste script for bbox-only datasets

Made-with: Cursor
xiaotianlou and others added 2 commits March 26, 2026 16:23
Comprehensive documentation of the V3 training pipeline including:
- Model architecture (YOLO26s-P2, 9.66M params, 26.4 GFLOPs)
- Dataset statistics (31,666 train images, 12,976 boxes, 9.9% COCO-small)
- V3 design rationale (regime consistency, native augmentations, small-target focus)
- Phase 1: 250ep joint training (SGD lr=0.0003, nbs=128, 9 Albumentations)
- Phase 2: 80ep fine-tuning (SGD lr=0.0001, reduced augmentation)
- Final result: mAP50-95 = 0.7651 (all-time best)
- Full parameter comparison tables and reproduction guide

Made-with: Cursor
@xiaotianlou xiaotianlou changed the title #259: cv code save #259: CV Training Pipeline — YOLO26s-P2 Small Rocket Detection (V1→V2→V3) Mar 26, 2026
@xiaotianlou xiaotianlou marked this pull request as ready for review March 26, 2026 20:32