# WorldStrat Ensemble

A production-grade ensemble system combining Transformer (Swin2SR) and GAN (Real-ESRGAN) architectures to achieve state-of-the-art 4x super-resolution for satellite imagery.
## Table of Contents

- Overview
- Key Features
- Installation
- Quick Start
- Project Structure
- Usage
- Model Architecture
- Performance & Results
- FAQ
- Contributing
- License
## Overview

WorldStrat Ensemble is a high-performance super-resolution pipeline designed specifically for the WorldStrat satellite imagery dataset. It addresses the unique challenges of satellite SR, including:
- Atmospheric Noise: Cloud interference, haze, and atmospheric scattering
- Low Resolution Input: Sentinel-2 imagery at 10 m/pixel → WorldView-3 quality at 2.5 m/pixel
- Dynamic Ranges: Varied illumination conditions from polar to equatorial regions
- Large-Scale Inference: Handling thousands of images efficiently
We fuse two complementary architectures:
| Model | Type | Strength | Weakness |
|---|---|---|---|
| Swin2SR | Transformer | Global structure, clean edges | Less detailed textures |
| Real-ESRGAN | GAN (RRDB) | Realistic high-frequency details | Can introduce artifacts |
Result: Ensemble achieves +0.2 to +0.4 dB PSNR improvement over best single model.
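The fusion itself is a pixel-wise weighted average of the two models' outputs. A minimal sketch, assuming NumPy arrays normalized to [0, 1] (the function name is illustrative; the actual pipeline operates on batched tensors):

```python
import numpy as np

def fuse_predictions(pred_a, pred_b, w_a=0.5, w_b=0.5):
    """Pixel-wise weighted average of two super-resolved outputs.

    Both inputs are float arrays in [0, 1] with identical shapes;
    the weights come from the validation-driven strategy.
    """
    assert pred_a.shape == pred_b.shape, "SR outputs must share a shape"
    # Clip to keep the fused image inside the valid dynamic range.
    return np.clip(w_a * pred_a + w_b * pred_b, 0.0, 1.0)
```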
## Key Features

- ✅ Crash-Proof: Gracefully handles corrupted files, missing checkpoints, and GPU OOM errors
- ✅ Checkpoint Recovery: Auto-detects weights from multiple search paths
- ✅ Fallback Mechanisms: Uses the best single model if the ensemble fails validation
- 🧠 Adaptive Normalization: Auto-detects raw vs. pre-normalized satellite data
- 🧠 Dynamic Weighting: Validation-driven ensemble strategy (Equal/Softmax/Proportional)
- 🧠 Self-Validation: Computes PSNR before test inference to verify quality
- ⚡ Memory Optimized: Runs on consumer GPUs (T4: 15GB, P100: 16GB)
- ⚡ Multi-GPU Support: Automatic DataParallel for 2+ GPUs
- ⚡ Progress Monitoring: Real-time logging with estimated time remaining
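Checkpoint recovery amounts to probing an ordered list of candidate paths and treating "nothing found" as the signal to fall back. A sketch under assumed behavior (the real search list lives in the notebook's `MODEL_CONFIGS`):

```python
from pathlib import Path
from typing import List, Optional

def find_checkpoint(candidates: List[str]) -> Optional[Path]:
    """Return the first existing checkpoint file, or None if none is found.

    Callers can treat None as the trigger for the single-model fallback.
    """
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None
```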
## Installation

Requirements:

- Python: 3.8 or higher
- GPU: CUDA-enabled with 8GB+ VRAM (16GB recommended)
- Disk Space: 5GB for models + dataset
With pip:

```bash
git clone https://github.com/Aditya26189/klymo.git
cd klymo

# Install PyTorch with CUDA 11.8
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu118

# Install core dependencies
pip install transformers rasterio tifffile tqdm pandas numpy

# Install Swin2SR requirements
pip install timm einops
```

Or with conda:

```bash
conda create -n worldstrat python=3.9
conda activate worldstrat
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
pip install transformers rasterio tifffile tqdm pandas numpy timm einops
```

**Important:**
Model weights are NOT included in this repository due to size constraints. Download them from:
- Google Drive (~500MB)
- Hugging Face Hub
Place the `.pth` files in `final-models/`:

```
final-models/
├── swin2sr_best.pth      # ~230MB
└── realesrgan_best.pth   # ~280MB
```
## Quick Start

```bash
# 1. Navigate to project directory
cd klymo

# 2. Verify GPU is available
python -c "import torch; print('GPU:', torch.cuda.get_device_name(0))"

# 3. Run inference on sample images
python WORLDSTRAT_ENSEMBLE_CORRECTED.py \
    --test_csv /path/to/test.csv \
    --output_dir ./predictions

# 4. Check results
ls -lh predictions/  # Should see ~149 .tif files
```

For programmatic use in Python:

```python
from WORLDSTRAT_ENSEMBLE_CORRECTED import WorldStratInferenceDataset
import pandas as pd

# Create test dataframe
df = pd.DataFrame({
    'lr_path': ['/data/sentinel2/image_001.tif'],
    'location': ['test_location_001']
})

# Load dataset
dataset = WorldStratInferenceDataset(df, load_hr=False)

# Run inference (see notebook for full pipeline)
```

## Project Structure

```
klymo/
├── ENSEMBLE_FINAL_ROBUST.ipynb        # Main inference notebook (Kaggle-ready)
├── WORLDSTRAT_ENSEMBLE_CORRECTED.py   # Standalone Python script
├── README.md                          # This file
├── CONTRIBUTING.md                    # Contribution guidelines
├── RELEASE_NOTES.md                   # Version history
├── DEPLOYMENT.md                      # Production deployment guide
│
├── Documentation/
│   ├── ENSEMBLE_REASONING_DOCUMENT.txt   # Architecture decisions (detailed)
│   └── QA_DEPLOYMENT_CHECKLIST.txt       # Pre-launch checklist
│
├── final-models/                      # Trained model weights
│   ├── swin2sr_best.pth               # Swin2SR checkpoint
│   └── realesrgan_best.pth            # Real-ESRGAN checkpoint
│
├── sample-model/                      # Training notebooks & configs
│   ├── swin2sr-ultra-max-safe-city.ipynb
│   └── model-enrgan.ipynb
│
└── archive/                           # Historical experiments
```
## Usage

### Jupyter Notebook

Best for: interactive execution, visualization, prototyping

1. Open `ENSEMBLE_FINAL_ROBUST.ipynb` in Jupyter/Kaggle
2. Configure paths in Cell 3 (Checkpoint Detection):

   ```python
   MODEL_CONFIGS = {
       'swin2sr': {
           'checkpoints': ['/kaggle/input/your-weights/swin2sr_best.pth']
       },
       # ...
   }
   ```

3. Run cells sequentially (Shift+Enter)
4. Monitor checkmarks:
   - ✅ Dependencies installed
   - ✅ GPU detected
   - ✅ Models loaded
   - ✅ Validation passed
   - ✅ Predictions generated
### Python Script

Best for: batch processing, production servers, CI/CD

```bash
python WORLDSTRAT_ENSEMBLE_CORRECTED.py \
    --test_csv /data/worldstrat/test.csv \
    --output_dir /output/predictions \
    --batch_size 4 \
    --num_workers 4
```

Arguments:

- `--test_csv`: Path to the test split CSV (must have an `lr_path` column)
- `--output_dir`: Directory for super-resolved images (default: `./predictions`)
- `--batch_size`: Inference batch size (default: auto-detect based on GPU)
- `--num_workers`: Data loading workers (default: 2)
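The flags above map naturally onto `argparse`; a sketch of how the CLI could be declared (defaults mirror the table, with `None` standing in for the auto-detected batch size; this is illustrative, not the script's verbatim parser):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Declare the inference CLI flags described in the README."""
    parser = argparse.ArgumentParser(description="WorldStrat ensemble inference")
    parser.add_argument("--test_csv", required=True,
                        help="Test split CSV; must contain an lr_path column")
    parser.add_argument("--output_dir", default="./predictions",
                        help="Where super-resolved .tif files are written")
    parser.add_argument("--batch_size", type=int, default=None,
                        help="None means auto-detect from available GPU memory")
    parser.add_argument("--num_workers", type=int, default=2,
                        help="Data loading workers")
    return parser
```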
## Model Architecture

### Ensemble Strategy

The system uses validation-driven adaptive weighting:

```mermaid
graph TD
    A[Compute Validation PSNR] --> B{PSNR Δ?}
    B -->|Δ < 0.3 dB| C[Equal Weights<br/>0.5, 0.5]
    B -->|0.3 ≤ Δ ≤ 1.0 dB| D[Softmax T=2.0<br/>~0.65, 0.35]
    B -->|Δ > 1.0 dB| E[Proportional<br/>~0.80, 0.20]
    C --> F[Ensemble Prediction]
    D --> F
    E --> F
```
Why This Works:
- Close Performance: Equal weighting maximizes diversity
- Moderate Gap: Softmax balances contribution vs. quality
- Large Gap: Proportional prevents weak model from degrading results
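The three regimes can be sketched as a single weight-selection function. The thresholds follow the diagram; the exact softmax and proportional formulas below are assumptions for illustration, not the script's verbatim code:

```python
import math
from typing import Tuple

def ensemble_weights(psnr_a: float, psnr_b: float,
                     temperature: float = 2.0) -> Tuple[float, float]:
    """Pick ensemble weights from the validation-PSNR gap.

    Equal below 0.3 dB, temperature-scaled softmax between 0.3 and
    1.0 dB, proportional to raw PSNR beyond 1.0 dB (assumed formulas).
    """
    delta = abs(psnr_a - psnr_b)
    if delta < 0.3:
        # Close performance: equal weighting maximizes diversity.
        return 0.5, 0.5
    if delta <= 1.0:
        # Moderate gap: softmax over PSNR with temperature T.
        ea = math.exp(psnr_a / temperature)
        eb = math.exp(psnr_b / temperature)
        return ea / (ea + eb), eb / (ea + eb)
    # Large gap: weight each model by its share of total PSNR.
    total = psnr_a + psnr_b
    return psnr_a / total, psnr_b / total
```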
### Swin2SR

- Architecture: Swin Transformer V2 with shifted windows
- Depth: [6, 6, 6, 6, 6, 6] (6 stages, 6 blocks each)
- Embedding Dim: 180
- Parameters: ~28.6M
- FLOPs: ~45.2G (for 128Γ128 input)
- Trained on: WorldStrat + ImageNet (pre-training)
### Real-ESRGAN

- Generator: RRDBNet (Residual-in-Residual Dense Blocks)
- Blocks: 23 RRDB blocks
- Features: 64 base channels
- Growth: 32 channels per dense layer
- Parameters: ~16.7M
- Loss: Combination of L1 + Perceptual (VGG) + GAN
### Data Normalization

```python
import numpy as np

# Sentinel-2 (Input LR)
def normalize_sentinel(img):
    # Raw: uint16 [0, 3000] for RGB bands
    # Normalized: float32 [0, 1]
    return np.clip(img / 3000.0, 0.0, 1.0).astype(np.float32)

# WorldView-3 (Target HR)
def normalize_worldview(img):
    # Raw: uint16 12-bit [0, 4095]
    # Normalized: float32 [0, 1]
    return np.clip(img / 4095.0, 0.0, 1.0).astype(np.float32)
```

## Performance & Results

| Model | Architecture | Params | Val PSNR | Val SSIM | Inference Time* |
|---|---|---|---|---|---|
| Swin2SR | Transformer | 28.6M | 29.59 dB | 0.8421 | 0.18s/img |
| Real-ESRGAN | GAN (RRDB) | 16.7M | 29.12 dB | 0.8392 | 0.14s/img |
| Ensemble | Weighted Avg | N/A | 29.83 dB | 0.8456 | 0.32s/img |
*On NVIDIA T4 GPU, batch_size=1, 512Γ512 output
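PSNR, the metric reported throughout, is 10·log10(MAX²/MSE). A reference implementation for [0, 1]-normalized images (the function name is illustrative):

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two normalized images."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * float(np.log10((max_val ** 2) / mse))
```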
Dataset: 149 validation samples from WorldStrat
Regions: Urban (45%), Rural (35%), Coastal (20%)
| Region | Swin2SR | ESRGAN | Ensemble | Δ Improvement |
|---|---|---|---|---|
| Urban | 30.12 dB | 29.45 dB | 30.34 dB | +0.22 dB |
| Rural | 29.28 dB | 28.93 dB | 29.52 dB | +0.24 dB |
| Coastal | 29.01 dB | 28.78 dB | 29.28 dB | +0.27 dB |
Key Insight: Ensemble provides consistent improvement across all terrain types.
## FAQ

Q: What is the expected PSNR on the test set?
A: Based on validation: 29.6-30.0 dB. Actual test performance depends on distribution similarity.
Q: Can I use only one model instead of ensemble?
A: Yes! The notebook automatically falls back to the best single model if ensemble validation fails. You can also manually set `use_ensemble = False`.
Q: How long does inference take?
A: On a T4 GPU with 149 test images: ~12-15 minutes. On CPU: ~2-3 hours (not recommended).
Q: Why did my git push fail with "HTTP 408"?
A: You likely committed large .pth weight files. See CONTRIBUTING.md for using Git LFS or excluding weights.
Q: "CUDA out of memory" error?
A:
- Restart the kernel: `Kernel → Restart`
- Reduce the batch size (inference already runs at the minimum of 1)
- Switch GPUs (e.g. T4 instead of P100) if a memory leak is suspected
- Check for zombie processes with `nvidia-smi`
Q: Can I train these models from scratch?
A: Yes, see training notebooks in sample-model/. Note: Training requires ~20-30 hours on dual T4 GPUs.
Q: What if validation split is missing?
A: The code defaults to equal weighting [0.5, 0.5], which works well in practice. Ensemble will still improve over single models (~+0.15 dB instead of +0.25 dB).
Q: What bands does this use?
A: Sentinel-2: Bands [4, 3, 2] (Red, Green, Blue). WorldView-3: Bands [1, 2, 3] (RGB).
Q: Can I use this for other satellite datasets?
A: Yes, but you may need to adjust the normalization ranges. See `normalize_sentinel()` and `normalize_worldview()` in the code.
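Adapting to another sensor usually means changing only the white point used for scaling. A generic version (the function name and parameter are illustrative, not part of the codebase):

```python
import numpy as np

def normalize_generic(img: np.ndarray, white_point: float) -> np.ndarray:
    """Scale a raw sensor array into float32 [0, 1].

    white_point is sensor-specific, e.g. 3000 for Sentinel-2 RGB bands
    or 4095 for 12-bit WorldView-3.
    """
    return np.clip(img.astype(np.float32) / white_point, 0.0, 1.0)
```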
## Contributing

We welcome contributions from the community! Please see CONTRIBUTING.md for:

- Reporting bugs
- Suggesting enhancements
- Submitting pull requests
- Coding standards (PEP 8, type hints, docstrings)
Quick workflow:

1. Fork the repo and create a feature branch
2. Test locally on a subset of data
3. Run linting: `flake8 *.py`
4. Clear notebook outputs before committing
5. Write clear commit messages
6. Submit a PR with a description
## License

This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code in your research, please cite:

```bibtex
@software{worldstrat_ensemble_2026,
  author = {Aditya26189},
  title  = {WorldStrat Ensemble: Robust Satellite Image Super-Resolution},
  year   = {2026},
  url    = {https://github.com/Aditya26189/klymo}
}
```

Developed for the WorldStrat Challenge
Robust. Accurate. Scalable.