
🌍 WorldStrat Ensemble: Robust Satellite Image Super-Resolution


A production-grade ensemble system combining Transformer (Swin2SR) and GAN (Real-ESRGAN) architectures to achieve state-of-the-art 4x super-resolution for satellite imagery.


📖 Overview

WorldStrat Ensemble is a high-performance super-resolution pipeline designed specifically for the WorldStrat satellite imagery dataset. It addresses the unique challenges of satellite SR including:

  • Atmospheric Noise: Cloud interference, haze, and atmospheric scattering
  • Low Resolution Input: Sentinel-2 imagery at 10m/pixel → WorldView-3 quality at 2.5m/pixel
  • Dynamic Ranges: Varied illumination conditions from polar to equatorial regions
  • Large-Scale Inference: Handling thousands of images efficiently

Why Ensemble?

We fuse two complementary architectures:

| Model | Type | Strength | Weakness |
|---|---|---|---|
| Swin2SR | Transformer | Global structure, clean edges | Less detailed textures |
| Real-ESRGAN | GAN (RRDB) | Realistic high-frequency details | Can introduce artifacts |

Result: Ensemble achieves +0.2 to +0.4 dB PSNR improvement over best single model.
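At inference time, fusing two models' outputs amounts to a pixel-wise weighted average. A minimal sketch of the idea (the function name and the NumPy array convention are illustrative, not the repository's actual API):

```python
import numpy as np

def fuse_predictions(pred_a: np.ndarray, pred_b: np.ndarray,
                     weights=(0.5, 0.5)) -> np.ndarray:
    """Pixel-wise weighted average of two super-resolved outputs in [0, 1]."""
    w_a, w_b = weights
    assert abs(w_a + w_b - 1.0) < 1e-6, "weights must sum to 1"
    fused = w_a * pred_a + w_b * pred_b
    # Clip to the valid intensity range in case of numerical drift
    return np.clip(fused, 0.0, 1.0)
```

How the `weights` tuple is chosen is the interesting part; see the adaptive weighting strategy under Model Architecture below.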


🚀 Key Features

Robustness

  • ✅ Crash-Proof: Gracefully handles corrupted files, missing checkpoints, GPU OOM errors
  • ✅ Checkpoint Recovery: Auto-detects weights from multiple search paths
  • ✅ Fallback Mechanisms: Uses best single model if ensemble fails validation

Intelligence

  • 🧠 Adaptive Normalization: Auto-detects raw vs. pre-normalized satellite data
  • 🧠 Dynamic Weighting: Validation-driven ensemble strategy (Equal/Softmax/Proportional)
  • 🧠 Self-Validation: Computes PSNR before test inference to verify quality

Efficiency

  • ⚡ Memory Optimized: Runs on consumer GPUs (T4: 15GB, P100: 16GB)
  • ⚡ Multi-GPU Support: Automatic DataParallel for 2+ GPUs
  • ⚡ Progress Monitoring: Real-time logging with estimated time remaining

πŸ› οΈ Installation

Prerequisites

  • Python: 3.8 or higher
  • GPU: CUDA-enabled with 8GB+ VRAM (16GB recommended)
  • Disk Space: 5GB for models + dataset

Step 1: Clone Repository

```bash
git clone https://github.com/Aditya26189/klymo.git
cd klymo
```

Step 2: Install Dependencies

Option A: Using pip (Recommended)

```bash
# Install PyTorch with CUDA 11.8
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu118

# Install core dependencies
pip install transformers rasterio tifffile tqdm pandas numpy

# Install Swin2SR requirements
pip install timm einops
```

Option B: Using conda

```bash
conda create -n worldstrat python=3.9
conda activate worldstrat
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
pip install transformers rasterio tifffile tqdm pandas numpy timm einops
```

Step 3: Download Model Weights

Important

Model weights are NOT included in this repository due to size constraints. Download them from:

Place .pth files in final-models/:

```text
final-models/
├── swin2sr_best.pth      # ~230MB
└── realesrgan_best.pth   # ~280MB
```

⚡ Quick Start

5-Minute Tutorial

```bash
# 1. Navigate to project directory
cd klymo

# 2. Verify GPU is available
python -c "import torch; print('GPU:', torch.cuda.get_device_name(0))"

# 3. Run inference on sample images
python WORLDSTRAT_ENSEMBLE_CORRECTED.py \
  --test_csv /path/to/test.csv \
  --output_dir ./predictions

# 4. Check results
ls -lh predictions/  # Should see ~149 .tif files
```

Example: Processing Custom Images

```python
from WORLDSTRAT_ENSEMBLE_CORRECTED import WorldStratInferenceDataset
import pandas as pd

# Create test dataframe
df = pd.DataFrame({
    'lr_path': ['/data/sentinel2/image_001.tif'],
    'location': ['test_location_001']
})

# Load dataset
dataset = WorldStratInferenceDataset(df, load_hr=False)

# Run inference (see notebook for full pipeline)
```

📂 Project Structure

```text
klymo/
├── 📓 ENSEMBLE_FINAL_ROBUST.ipynb       # Main inference notebook (Kaggle-ready)
├── 🐍 WORLDSTRAT_ENSEMBLE_CORRECTED.py  # Standalone Python script
├── 📖 README.md                         # This file
├── 🤝 CONTRIBUTING.md                   # Contribution guidelines
├── 📋 RELEASE_NOTES.md                  # Version history
├── 🚀 DEPLOYMENT.md                     # Production deployment guide
│
├── 📄 Documentation/
│   ├── ENSEMBLE_REASONING_DOCUMENT.txt  # Architecture decisions (detailed)
│   └── QA_DEPLOYMENT_CHECKLIST.txt      # Pre-launch checklist
│
├── 🎯 final-models/                     # Trained model weights
│   ├── swin2sr_best.pth                 # Swin2SR checkpoint
│   └── realesrgan_best.pth              # Real-ESRGAN checkpoint
│
├── 📂 sample-model/                     # Training notebooks & configs
│   ├── swin2sr-ultra-max-safe-city.ipynb
│   └── model-enrgan.ipynb
│
└── 📦 archive/                          # Historical experiments
```

💻 Usage

Option A: Jupyter Notebook (Kaggle/Colab)

Best for: Interactive execution, visualization, prototyping

  1. Open ENSEMBLE_FINAL_ROBUST.ipynb in Jupyter/Kaggle
  2. Configure paths in Cell 3 (Checkpoint Detection):

```python
MODEL_CONFIGS = {
    'swin2sr': {
        'checkpoints': ['/kaggle/input/your-weights/swin2sr_best.pth']
    },
    # ...
}
```

  3. Run cells sequentially (Shift+Enter)
  4. Monitor checkmarks:
    • ✅ Dependencies installed
    • ✅ GPU detected
    • ✅ Models loaded
    • ✅ Validation passed
    • ✅ Predictions generated

Option B: Standalone Script

Best for: Batch processing, production servers, CI/CD

```bash
python WORLDSTRAT_ENSEMBLE_CORRECTED.py \
  --test_csv /data/worldstrat/test.csv \
  --output_dir /output/predictions \
  --batch_size 4 \
  --num_workers 4
```

Arguments:

  • --test_csv: Path to test split CSV (must have lr_path column)
  • --output_dir: Directory for super-resolved images (default: ./predictions)
  • --batch_size: Inference batch size (default: auto-detect based on GPU)
  • --num_workers: Data loading workers (default: 2)

🧠 Model Architecture

Ensemble Strategy

The system uses validation-driven adaptive weighting:

```mermaid
graph TD
    A[Compute Validation PSNR] --> B{PSNR Δ?}
    B -->|Δ < 0.3 dB| C[Equal Weights<br/>0.5, 0.5]
    B -->|0.3 ≤ Δ ≤ 1.0 dB| D[Softmax T=2.0<br/>~0.65, 0.35]
    B -->|Δ > 1.0 dB| E[Proportional<br/>~0.80, 0.20]
    C --> F[Ensemble Prediction]
    D --> F
    E --> F
```

Why This Works:

  • Close Performance: Equal weighting maximizes diversity
  • Moderate Gap: Softmax balances contribution vs. quality
  • Large Gap: Proportional prevents weak model from degrading results
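The branching above can be sketched as a small stand-alone function. This is a simplified illustration of the strategy, not the notebook's exact implementation (details such as how the softmax temperature is applied may differ):

```python
import numpy as np

def select_ensemble_weights(psnr_a: float, psnr_b: float,
                            temperature: float = 2.0) -> np.ndarray:
    """Pick ensemble weights from the two models' validation PSNR."""
    delta = abs(psnr_a - psnr_b)
    if delta < 0.3:
        # Close performance: equal weighting maximizes diversity
        return np.array([0.5, 0.5])
    if delta <= 1.0:
        # Moderate gap: temperature-scaled softmax over PSNR scores
        scores = np.array([psnr_a, psnr_b]) / temperature
        e = np.exp(scores - scores.max())  # subtract max for stability
        return e / e.sum()
    # Large gap: weight proportional to PSNR, so the weak model
    # contributes without dominating
    scores = np.array([psnr_a, psnr_b])
    return scores / scores.sum()
```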

Model Details

Swin2SR (Transformer-based)

  • Architecture: Swin Transformer V2 with shifted windows
  • Depth: [6, 6, 6, 6, 6, 6] (6 stages, 6 blocks each)
  • Embedding Dim: 180
  • Parameters: ~28.6M
  • FLOPs: ~45.2G (for 128×128 input)
  • Trained on: WorldStrat + ImageNet (pre-training)

Real-ESRGAN (GAN-based)

  • Generator: RRDBNet (Residual-in-Residual Dense Blocks)
  • Blocks: 23 RRDB blocks
  • Features: 64 base channels
  • Growth: 32 channels per dense layer
  • Parameters: ~16.7M
  • Loss: Combination of L1 + Perceptual (VGG) + GAN

Normalization Pipeline

```python
import numpy as np

# Sentinel-2 (Input LR)
def normalize_sentinel(img):
    # Raw: uint16 [0, 3000] for RGB bands
    # Normalized: float32 [0, 1]
    return np.clip(img / 3000.0, 0.0, 1.0).astype(np.float32)

# WorldView-3 (Target HR)
def normalize_worldview(img):
    # Raw: uint16 12-bit [0, 4095]
    # Normalized: float32 [0, 1]
    return np.clip(img / 4095.0, 0.0, 1.0).astype(np.float32)
```
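The Adaptive Normalization feature listed under Intelligence auto-detects raw vs. pre-normalized inputs. One plausible heuristic looks at the value range; this exact function is an illustration, not the repository's code:

```python
import numpy as np

def adaptive_normalize(img: np.ndarray, raw_max: float = 3000.0) -> np.ndarray:
    """Normalize to [0, 1], skipping rescaling if data already looks normalized.

    raw_max assumes a Sentinel-2-style raw range; adjust per sensor.
    """
    img = img.astype(np.float32)
    if img.max() <= 1.0:
        # Values already in [0, 1]: treat as pre-normalized
        return np.clip(img, 0.0, 1.0)
    # Otherwise scale by the sensor's raw dynamic range
    return np.clip(img / raw_max, 0.0, 1.0)
```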

📊 Performance & Results

Quantitative Metrics

| Model | Architecture | Params | Val PSNR | Val SSIM | Inference Time* |
|---|---|---|---|---|---|
| Swin2SR | Transformer | 28.6M | 29.59 dB | 0.8421 | 0.18s/img |
| Real-ESRGAN | GAN (RRDB) | 16.7M | 29.12 dB | 0.8392 | 0.14s/img |
| Ensemble | Weighted Avg | — | 29.83 dB | 0.8456 | 0.32s/img |

*On NVIDIA T4 GPU, batch_size=1, 512×512 output
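The PSNR figures above are standard peak signal-to-noise ratio in dB over images normalized to [0, 1]. A minimal reference implementation (illustrative; the repository may compute it differently, e.g. per-channel or on a luminance channel):

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images in [0, data_range]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```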

Validation Results Breakdown

Dataset: 149 validation samples from WorldStrat
Regions: Urban (45%), Rural (35%), Coastal (20%)

| Region | Swin2SR | ESRGAN | Ensemble | Δ Improvement |
|---|---|---|---|---|
| Urban | 30.12 dB | 29.45 dB | 30.34 dB | +0.22 dB |
| Rural | 29.28 dB | 28.93 dB | 29.52 dB | +0.24 dB |
| Coastal | 29.01 dB | 28.78 dB | 29.28 dB | +0.27 dB |

Key Insight: Ensemble provides consistent improvement across all terrain types.


❓ FAQ

General Questions

Q: What is the expected PSNR on the test set?
A: Based on validation: 29.6-30.0 dB. Actual test performance depends on distribution similarity.

Q: Can I use only one model instead of ensemble?
A: Yes! The notebook automatically falls back to the best single model if ensemble validation fails. You can also manually set use_ensemble = False.

Q: How long does inference take?
A: On a T4 GPU with 149 test images: ~12-15 minutes. On CPU: ~2-3 hours (not recommended).

Technical Questions

Q: Why did my git push fail with "HTTP 408"?
A: You likely committed large .pth weight files. See CONTRIBUTING.md for using Git LFS or excluding weights.

Q: "CUDA out of memory" error?
A:

  1. Restart the kernel: Kernel → Restart
  2. Reduce the batch size (inference already defaults to 1)
  3. Try a different GPU (e.g., T4 instead of P100) if you suspect a memory leak
  4. Check for zombie processes: nvidia-smi
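A common crash-proofing pattern for OOM errors is to catch them and retry on smaller batch splits. A framework-agnostic sketch (names are hypothetical; a real PyTorch version would also call torch.cuda.empty_cache() before retrying):

```python
def run_with_oom_fallback(infer_fn, batch, min_chunk: int = 1):
    """Run infer_fn on the batch; on an out-of-memory RuntimeError,
    split the batch in half and retry each piece recursively."""
    try:
        return [infer_fn(batch)]
    except RuntimeError as e:
        # Only swallow OOM errors, and only while the batch can still shrink
        if "out of memory" not in str(e) or len(batch) <= min_chunk:
            raise
        mid = len(batch) // 2
        return (run_with_oom_fallback(infer_fn, batch[:mid], min_chunk)
                + run_with_oom_fallback(infer_fn, batch[mid:], min_chunk))
```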

Q: Can I train these models from scratch?
A: Yes, see training notebooks in sample-model/. Note: Training requires ~20-30 hours on dual T4 GPUs.

Q: What if validation split is missing?
A: The code defaults to equal weighting [0.5, 0.5], which works well in practice. Ensemble will still improve over single models (~+0.15 dB instead of +0.25 dB).

Dataset Questions

Q: What bands does this use?
A: Sentinel-2: Bands [4, 3, 2] (Red, Green, Blue). WorldView-3: Bands [1, 2, 3] (RGB).

Q: Can I use this for other satellite datasets?
A: Yes, but you may need to adjust normalization ranges. See normalize_sentinel() and normalize_worldview() in the code.


🤝 Contributing

We welcome contributions from the community! Please see CONTRIBUTING.md for:

  • πŸ› Reporting bugs
  • πŸ’‘ Suggesting enhancements
  • πŸ”§ Submitting pull requests
  • πŸ“ Coding standards (PEP 8, type hints, docstrings)

Quick Contribution Checklist

  • Fork the repo and create a feature branch
  • Test locally on a subset of data
  • Run linting: flake8 *.py
  • Clear notebook outputs before committing
  • Write clear commit messages
  • Submit PR with description

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this code in your research, please cite:

```bibtex
@software{worldstrat_ensemble_2026,
  author = {Aditya26189},
  title = {WorldStrat Ensemble: Robust Satellite Image Super-Resolution},
  year = {2026},
  url = {https://github.com/Aditya26189/klymo}
}
```

Developed for the WorldStrat Challenge 🌍
Robust. Accurate. Scalable.


Built with ❤️ using PyTorch and HuggingFace Transformers
