#259: CV Training Pipeline — YOLO26s-P2 Small Rocket Detection (V1→V2→V3) #277
Open
xiaotianlou wants to merge 25 commits into main from
Conversation
This script sets up a YOLO model for training with the COCO dataset, including downloading and preparing negative samples.
```
┌────────────────────────┐
│    padding (gray)      │ ← ~208 rows of gray fill
├────────────────────────┤
│                        │
│  actual image 544×960  │ ← small-target pixel count unchanged
│                        │
├────────────────────────┤
│    padding (gray)      │ ← ~208 rows of gray fill
└────────────────────────┘
         960×960
```
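The diagram above can be checked with a few lines of letterbox arithmetic. This is a minimal sketch of the assumed YOLO-style letterboxing behavior (the helper name is illustrative, not Ultralytics API): since the long side already equals the target size, no rescaling happens and only gray rows are added, so small-target pixels are untouched.

```python
# Sketch of letterboxing math: fit an (h, w) image into an imgsz x imgsz
# square by scaling the long side, then padding the rest with gray.
def letterbox_padding(h, w, imgsz=960):
    """Return (top, left) rows/cols of gray padding per side."""
    scale = min(imgsz / h, imgsz / w)      # 1.0 for 544x960 -> 960x960
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = imgsz - new_h, imgsz - new_w
    return pad_h // 2, pad_w // 2

top, left = letterbox_padding(544, 960)    # (208, 0): ~208 gray rows top/bottom
```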
Refactor albumentations integration to use monkey patching for custom augmentation pipeline. Update function to not require train_args and adjust model training call accordingly.
540 is not divisible by the strides 8/16/32, causing feature map dimension errors in the P2 head. 544 (= 17 × 32) divides all strides cleanly.

Made-with: Cursor
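The stride constraint above is easy to verify directly. A minimal check (helper name is illustrative):

```python
# An input size fed to the model must be divisible by every network stride
# (8/16/32 here), otherwise feature map shapes mismatch in the detection head.
def check_imgsz(imgsz, strides=(8, 16, 32)):
    bad = [s for s in strides if imgsz % s]
    return not bad

check_imgsz(540)  # False: 540 % 8 == 4, so P2-head feature maps misalign
check_imgsz(544)  # True: 544 = 17 * 32 divides all strides
```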
Key fixes over coco.py:
- torchrun compatible (monkey patch propagates to all GPU processes)
- nbs=batch prevents 3x weight_decay amplification
- multi_scale=False (was resizing down to 480px, destroying small targets)
- patience=0 guarantees close_mosaic triggers at ep250
- COCO negatives reduced from 4000 to 2000
- Albumentations: blur_limit 7->5, added Downscale, removed BboxParams

Made-with: Cursor
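The `nbs=batch` fix is worth spelling out. A hedged sketch of the (assumed) Ultralytics-style weight-decay scaling that motivated it: decay is scaled by `batch * accumulate / nbs`, so training at batch=192 against the default nbs=64 would triple the effective decay, while setting nbs equal to the real batch leaves it unchanged.

```python
# Sketch of the assumed nominal-batch-size (nbs) weight-decay scaling.
def effective_weight_decay(weight_decay, batch, nbs):
    accumulate = max(round(nbs / batch), 1)  # gradient accumulation steps
    return weight_decay * batch * accumulate / nbs

effective_weight_decay(0.0005, 192, 64)    # 0.0015: 3x amplification
effective_weight_decay(0.0005, 192, 192)   # 0.0005: nbs=batch, no scaling
```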
Stage 2: SGD lr0=0.002, rect=True (~30% less padding), mosaic=0
Stage 3: SGD lr0=0.0002, Albumentations probabilities reduced 30%
Both use model=path (not resume=True) for a clean optimizer reset.

Made-with: Cursor
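A hedged sketch of the stage-chaining convention described above: each stage receives the previous stage's best.pt via `model=` so the optimizer starts fresh, whereas `resume=True` would restore the old optimizer/scheduler state. The helper, paths, and data file are illustrative, not the repo's actual code.

```python
# Build per-stage train kwargs; model= points at the previous best.pt so each
# stage gets a clean optimizer reset instead of resuming old optimizer state.
def stage_args(prev_best, lr0, **overrides):
    args = dict(model=prev_best, data="rocket.yaml",
                optimizer="SGD", lr0=lr0, resume=False)
    args.update(overrides)
    return args

stage2 = stage_args("runs/stage1/weights/best.pt", 0.002, rect=True, mosaic=0.0)
stage3 = stage_args("runs/stage2/weights/best.pt", 0.0002)
# launch per stage: YOLO(args.pop("model")).train(**args)
```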
evaluate.py: reports mAP broken down by small/medium/large targets
visualize_augment.py: Level 0 validation - renders mosaic=0.8 vs 0.4 to visually confirm small targets survive the augmentation pipeline

Made-with: Cursor
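The small/medium/large breakdown follows the standard COCO area thresholds (32² and 96² px²). A minimal sketch of the bucketing logic (function name illustrative; detection matching is out of scope here):

```python
# COCO-style size buckets by bounding-box area in pixels^2.
def size_bucket(w, h):
    area = w * h
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

size_bucket(20, 20)   # "small":  400 px^2
size_bucket(50, 50)   # "medium": 2500 px^2
size_bucket(120, 90)  # "large":  10800 px^2
```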
- Uses torchrun for DDP (fixes monkey patch + enables rect=True)
- Dynamic GPU detection with 40GB threshold
- Auto-retry up to 3x per stage with resume from last.pt
- tmux session survives terminal/SSH disconnect
- Each stage writes .stageN_result for automatic chaining

Made-with: Cursor
- torchrun caused all ranks to land on GPU 0 (OOM). Reverted to Ultralytics internal DDP (device="0,1,2,3") for Stage 1.
- Stage 2/3 use single-GPU for rect=True + Albumentations support.
- Added MKL_THREADING_LAYER=GNU and PYTORCH_CUDA_ALLOC_CONF.
- Reduced default batch from 192 to 128 for GPU contention safety.
- Pipeline script now uses plain python instead of torchrun.

Made-with: Cursor
Root cause: V1 Stage 2/3 used SGD while Stage 1 used MuSGD (via auto), destroying learned conv weight distributions. Stage 2 lr0=0.002 was 6.7x too aggressive vs the proven lr (train81: 0.0003).

V2 changes:
- New train_stage1b.py: extend from Stage 1 best.pt with MuSGD lr0=0.005, mosaic=0.2, close_mosaic=30, 200ep 4-GPU DDP
- train_stage2.py: SGD->MuSGD, lr0 0.002->0.001, +warmup_epochs=5
- train_stage3.py: SGD->MuSGD, +warmup_epochs=3, patience 25->15
- run_pipeline.bash: V2 flow with stage1b/stage2_v2/stage3_v2 naming
- Robust save_dir detection in all stages (fixes "?" path bug)

Made-with: Cursor
MuSGD's Muon component proved incompatible with fine-tuning from pre-trained checkpoints: fresh Muon state disrupted learned weights. Tested lr0=0.005 and lr0=0.002; both showed post-warmup regression. Switched to the proven SGD approach (cf. train81: SGD lr=0.0003 -> 0.7500):
- Stage 1b: SGD lr0=0.0005, cos_lr=True (ep16: mAP50-95=0.7380, stable)
- Stage 2: SGD lr0=0.0003 (matching train81's proven fine-tune lr)
- Stage 3: SGD lr0=0.0001 (ultra-low polish)

Made-with: Cursor
Key changes from V2:
- Single GPU + gradient accumulation (nbs=128) for ALL phases, eliminating the DDP->single-GPU regime change that caused V2 Stage 2/3 regression
- Native `augmentations` parameter for custom Albumentations (no monkey-patch), confirmed working via v8_transforms getattr(hyp, "augmentations", None)
- Fix missed augmentation: scale 0.08->0.25, erasing 0.0->0.30
- Phase 1: 250ep SGD lr=0.0003, 9 custom Albumentations transforms
- Phase 2: 80ep SGD lr=0.0001, reduced augmentation, rect=False (consistent)
- Offline small-target copy-paste script for bbox-only datasets

Made-with: Cursor
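The native hook is the key V3 change, replacing the V1/V2 monkey patch. Below is a hedged config sketch of what a Phase 1 launch might look like, assuming the `augmentations` list is picked up inside `v8_transforms` as the PR describes; the specific transforms, probabilities, checkpoint path, and data file are all illustrative, not the repo's actual `train_v3_phase1.py`.

```python
# Hedged sketch (assumptions noted above): a list of Albumentations transforms
# passed directly to model.train(augmentations=...), which per the PR is read
# via getattr(hyp, "augmentations", None) in v8_transforms.
import albumentations as A
from ultralytics import YOLO

custom = [  # mirrors the blur/noise/compression/lighting mix named in the PR
    A.Blur(blur_limit=5, p=0.05),
    A.GaussNoise(p=0.05),
    A.ImageCompression(p=0.05),
    A.RandomBrightnessContrast(p=0.05),
    A.Downscale(p=0.05),
]

model = YOLO("runs/stage1b/weights/best.pt")  # hypothetical checkpoint path
model.train(data="rocket.yaml", epochs=250, optimizer="SGD", lr0=0.0003,
            nbs=128, scale=0.25, erasing=0.30, rect=False,
            augmentations=custom)
```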
Comprehensive documentation of the V3 training pipeline, including:
- Model architecture (YOLO26s-P2, 9.66M params, 26.4 GFLOPs)
- Dataset statistics (31,666 train images, 12,976 boxes, 9.9% COCO-small)
- V3 design rationale (regime consistency, native augmentations, small-target focus)
- Phase 1: 250ep joint training (SGD lr=0.0003, nbs=128, 9 Albumentations)
- Phase 2: 80ep fine-tuning (SGD lr=0.0001, reduced augmentation)
- Final result: mAP50-95 = 0.7651 (all-time best)
- Full parameter comparison tables and reproduction guide

Made-with: Cursor
Summary
Complete computer vision training pipeline for small rocket detection, evolving through three major iterations to achieve mAP₅₀₋₉₅ = 0.7651 (all-time best).
Pipeline Evolution
Final Model Metrics
Key Technical Contributions
- Single GPU + gradient accumulation (nbs=128) eliminates the DDP→single-GPU regime shock that caused V1/V2 fine-tuning regression
- Native model.train(augmentations=...) parameter; the monkey-patch approach silently failed under DDP
- scale 0.08→0.25 (3×), erasing 0→0.30, 9 custom Albumentations transforms (blur, noise, compression, lighting)

Dataset
- Single class (rocket)

Files Changed
- src/training/train_v3_phase1.py
- src/training/train_v3_phase2.py
- src/training/run_pipeline.bash
- src/training/augment_small_targets.py
- src/training/train_stage1.py
- src/training/train_stage1b.py
- src/training/train_stage2.py
- src/training/train_stage3.py
- src/training/evaluate.py
- src/training/visualize_augment.py
- src/benchmark/convert_tensorrt.py
- docs/V3_Training_Pipeline.md

Hardware
How to Reproduce
See docs/V3_Training_Pipeline.md for the complete pipeline specification with all hyperparameters and training curves.