Closed
File renamed without changes.
65 changes: 65 additions & 0 deletions DATASET_GUIDE.md
@@ -3,6 +3,28 @@
This document provides a detailed overview of the datasets used in this repository. For each dataset, you will find instructions on how to prepare the data, along with command-line examples for running models.

*DISCLAIMER*: The detailed overviews below cover only the datasets included in the original repo. Community-contributed datasets may not come with predefined command-line examples in this repository; feel free to adapt the existing examples to your use case.
## 📚 Table of Contents

- [HLSBurnScars](#hlsburnscars)
- [MADOS](#mados)
- [PASTIS-R](#pastis-r)
- [Sen1Floods11](#sen1floods11)
- [xView2](#xview2)
- [FiveBillionPixels](#fivebillionpixels)
- [DynamicEarthNet](#dynamicearthnet)
- [Crop Type Mapping (South Sudan)](#crop-type-mapping-south-sudan)
- [SpaceNet 7](#spacenet-7)
- [AI4SmallFarms](#ai4smallfarms)
- [BioMassters](#biomassters)

### 🧪 Community-Contributed Datasets
- [Potsdam](#potsdam)
- [Geo-Bench Datasets](#geo-bench-datasets)
- [Multi-label Classification (e.g., m-BigEarthNet)](#for-multi-label-classification-eg-m-bigearthnet)
- [Single-label Classification (e.g., m-EuroSat, m-Brick-Kiln)](#for-single-label-classification-ie-m-eurosat-m-brick-kiln-m-forestnet-m-pv4ger-m-so2sat)
- [Semantic Segmentation (e.g., m-NZ-Cattle, m-SA-Crop-Type)](#for-semantic-segmentation-ie-m-cashew-plantation-m-chesapeake-landcover-m-neontree-m-nz-cattle-m-pv4ger-seg-and-m-sa-crop-type)

---

### HLSBurnScars

@@ -222,6 +244,7 @@ This document provides a detailed overview of the datasets used in this reposito
```
In this case, you can specify in the `temp` parameter which frame you want to use.

---
**Note**: The following datasets are **community-contributed** and are not part of the original benchmark repository.
### Potsdam
```
@@ -234,3 +257,45 @@ This document provides a detailed overview of the datasets used in this reposito
criterion=cross_entropy \
task=segmentation
```
### Geo-Bench Datasets
- For multi-label classification, e.g., m-BigEarthNet
```
export GEO_BENCH_DIR=YOUR/PATH/DIR
torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
--config-name=train \
dataset=mbigearthnet \
encoder=dofa \
decoder=cls_linear \
preprocessing=cls_resize \
criterion=binary_cross_entropy \
task=classification_multi_label \
finetune=false
```

- For single-label classification, i.e., m-EuroSat, m-Brick-Kiln, m-ForestNet, m-PV4Ger, m-So2Sat
```
export GEO_BENCH_DIR=YOUR/PATH/DIR
torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
--config-name=train \
dataset=meurosat \
encoder=dofa \
decoder=cls_linear \
preprocessing=cls_resize \
criterion=cross_entropy \
task=classification \
finetune=false
```

- For semantic segmentation, i.e., m-Cashew-Plantation, m-Chesapeake-Landcover, m-NeonTree, m-NZ-Cattle, m-PV4Ger-Seg and m-SA-Crop-Type
```
export GEO_BENCH_DIR=YOUR/PATH/DIR
torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
--config-name=train \
dataset=mnz-cattle \
encoder=dofa \
decoder=seg_upernet \
preprocessing=seg_default \
criterion=cross_entropy \
task=segmentation \
finetune=false
```
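All three Geo-Bench commands above depend on `GEO_BENCH_DIR` being exported, because the dataset configs build `root_path` from `${oc.env:GEO_BENCH_DIR}`. A minimal sketch of an up-front check, assuming a hypothetical helper `resolve_geobench_root` that is not part of this PR:

```python
import os
from pathlib import Path


def resolve_geobench_root(subdir: str, env_var: str = "GEO_BENCH_DIR") -> Path:
    """Return <env_var>/<subdir>, failing early if the variable is unset or empty.

    Hypothetical helper for illustration; the repository itself resolves the
    path through the `${oc.env:GEO_BENCH_DIR}` interpolation in the YAML configs.
    """
    base = os.environ.get(env_var)
    if not base:
        raise RuntimeError(f"{env_var} is not set; export it before launching the run")
    return Path(base) / subdir
```

Calling `resolve_geobench_root("classification_v1.0/m-eurosat")` before `torchrun` would surface a missing variable as a readable error rather than a failure during config resolution.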
13 changes: 9 additions & 4 deletions README.md
@@ -35,7 +35,8 @@


📢 **News**
- [23/04/2025] we pushed a new version of the code, fixing different bugs (e.g. commands are working for all the datasets now, metric computation with ignore_index is fixed, etc...). In the next month, we will provide: all downloadable datasets and models, downloadable stratified subsamples for all the datasets, classification. Stay tuned!
- [04/06/2025] We integrated the [Geo-Bench](https://arxiv.org/abs/2306.03831) datasets, covering six segmentation and six classification tasks.
<!-- - [23/04/2025] we pushed a new version of the code, fixing different bugs (e.g. commands are working for all the datasets now, metric computation with ignore_index is fixed, etc...). In the next month, we will provide: all downloadable datasets and models, downloadable stratified subsamples for all the datasets, classification. Stay tuned! -->
- [22/04/2025] on EarthDay, PANGAEA was officially adopted to benchmark TerraMind. Read the [news](https://www.linkedin.com/posts/simonetta-cheli-7669879b_earthday-earthobservation-activity-7320439907028467712-LSzl?utm_source=share&utm_medium=member_desktop&rcm=ACoAACdT8q0BDNWYKAdDYGUe_X4fQOzSHO8jgAs) and the [pre-print](https://arxiv.org/abs/2504.11171). We will release the benchmarking code in PANGAEA very soon!
- [05/12/2024] the [pre-print](https://arxiv.org/abs/2412.04204) is out!

@@ -84,6 +85,7 @@ And the following **datasets**:

**Note**: The following datasets are **community-contributed** and are not part of the original benchmark repository. We are grateful for these contributions, which help enrich the benchmark's diversity and applicability.
- **Potsdam dataset** [[Link](https://www.isprs.org/education/benchmarks/UrbanSemLab/2d-sem-label-potsdam.aspx)]. Contributed by [@pierreadorni](https://github.com/pierreadorni).
- **Geo-Bench datasets** [[Link](https://github.com/ServiceNow/geo-bench)]. Contributed by [@yurujaja](https://github.com/yurujaja).

The repository supports the following **tasks** using geospatial (foundation) models:
- [Single Temporal Semantic Segmentation](#single-temporal-semantic-segmentation)
@@ -331,11 +333,11 @@ torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \

### Using Your Own Dataset

Refer to: [Adding a new downstream dataset](.github/CONTRIBUTING.md#adding-a-new-downstream-dataset)
Refer to: [Adding a new downstream dataset](CONTRIBUTING.md#adding-a-new-downstream-dataset)

### Using Your Own Model

Refer to: [Adding a new geospatial foundation model](.github/CONTRIBUTING.md#adding-a-new-geospatial-foundation-model)
Refer to: [Adding a new geospatial foundation model](CONTRIBUTING.md#adding-a-new-geospatial-foundation-model)

## 🏃 Evaluation

@@ -348,7 +350,7 @@ torchrun pangaea/run.py --config-name=test ckpt_dir=path_to_ckpt_dir
```

## ✏️ Contributing
We appreciate all contributions. Please refer to [Contributing Guidelines](.github/CONTRIBUTING.md).
We appreciate all contributions. Please refer to [Contributing Guidelines](CONTRIBUTING.md).

## ⚠️ TO DO

@@ -380,3 +382,6 @@ If you find this work useful, please cite:
url={https://arxiv.org/abs/2412.04204},
}
```
## Acknowledgements

The computations/data handling were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.
1 change: 1 addition & 0 deletions configs/criterion/binary_cross_entropy.yaml
@@ -0,0 +1 @@
_target_: torch.nn.BCEWithLogitsLoss
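For reviewers less familiar with the multi-label setup: `torch.nn.BCEWithLogitsLoss` treats each class as an independent binary decision and takes raw logits rather than probabilities. The sketch below is a dependency-free illustration of the per-element formula with mean reduction (the PyTorch default); it is not the PyTorch implementation, and the function name `bce_with_logits` is made up for this example.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels, from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1-y)*log(1 - sigmoid(x))].
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)
```

With a multi-hot target such as `[1.0, 0.0, 1.0]`, each position contributes its own binary term, which is why this criterion pairs with `task=classification_multi_label` while the single-label tasks keep `cross_entropy`.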
2 changes: 2 additions & 0 deletions configs/criterion/none.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# returns its input unchanged – produces a 0-parameter nn.Identity module
_target_: torch.nn.Identity
2 changes: 1 addition & 1 deletion configs/dataset/fivebillionpixels.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
_target_: pangaea.datasets.fivebillionpixels.FiveBillionPixels
dataset_name: FiveBillionPixels
root_path: /geomatics/gpuserver-1/vmarsocci/FiveBillionPixels/cropped
root_path: /mimer/NOBACKUP/groups/naiss2024-22-857/datasets/Five-Billion-Pixels/cropped/new
download_url: False
auto_download: False
use_cmyk: False
40 changes: 40 additions & 0 deletions configs/dataset/mbigearthnet.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
_target_: pangaea.datasets.geobench.mbigearthnet.mBigEarthNet
dataset_name: mBigEarthNet
root_path: ${oc.env:GEO_BENCH_DIR}/classification_v1.0/m-bigearthnet # requires the GEO_BENCH_DIR environment variable to be set
download_url: "recursix/geo-bench-1.0"
auto_download: True
ignore_index: -100
num_classes: 43
img_size: 120
multi_temporal: False
multi_modal: False

bands:
optical:
- B1
- B2
- B3
- B4
- B5
- B6
- B7
- B8
- B8A
- B9
- B11
- B12

classes: ['']
distribution: [0,]

# data stats
data_mean:
optical: [378.4027, 482.2730, 706.5345, 720.9285, 1100.6688, 1909.2914, 2191.6985, 2336.8706, 2394.7449, 2368.3127, 1875.2487, 1229.3818]

data_std:
optical: [157.5666, 255.0429, 303.1750, 391.2943, 380.7916, 551.6558, 638.8196, 744.2009, 675.4041, 561.0154, 563.4095, 479.1786]

data_min:
optical: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
data_max:
optical: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
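A note on the statistics above: `data_mean` and `data_std` carry real per-band values, while `data_min` and `data_max` are all-zero placeholders. The sketch below shows per-band standardization and a guard against min-max scaling with a zero range. Both function names are hypothetical, and whether any preprocessing path in the repository actually reads `data_min`/`data_max` for these datasets is an assumption, not something this diff shows.

```python
def standardize(pixel_values, means, stds):
    """Per-band z-score: (value - mean) / std, one entry per band."""
    if not (len(pixel_values) == len(means) == len(stds)):
        raise ValueError("one value, mean and std is required per band")
    return [(v - m) / s for v, m, s in zip(pixel_values, means, stds)]


def min_max_is_usable(mins, maxs):
    """True only if every band has max > min, so (v - min) / (max - min) is defined."""
    return len(mins) == len(maxs) and all(hi > lo for lo, hi in zip(mins, maxs))
```

With the values in this config, `min_max_is_usable([0.0] * 12, [0.0] * 12)` is false, so only the mean/std path would be safe to use for m-BigEarthNet as configured here.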
39 changes: 39 additions & 0 deletions configs/dataset/mbrickkiln.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
_target_: pangaea.datasets.geobench.mbrickkiln.mBrickKiln
dataset_name: mBrickKiln
root_path: ${oc.env:GEO_BENCH_DIR}/classification_v1.0/m-brick-kiln
download_url: "recursix/geo-bench-1.0"
auto_download: True

num_classes: 2
img_size: 64
multi_temporal: False
multi_modal: False

ignore_index: -100
classes: ['not brick kiln', 'brick kiln']
distribution: [0, 0]
bands:
optical:
- B1
- B2
- B3
- B4
- B5
- B6
- B7
- B8
- B8A
- B9
- B10
- B11
- B12

data_mean:
optical: [574.7587880700896, 674.3473615470523, 886.3656479311578, 815.0945462528913, 1128.8088426870465, 1934.450471876027, 2045.7652282437202, 2012.744587807115, 1608.6255233989034, 1129.8171906000355, 83.27188605598549, 90.54924599052214, 68.98768652434848]
data_std:
optical: [193.60631504991184, 238.75447480113132, 276.9631260242207, 361.15060137326634, 364.5888078793488, 724.2707123576525, 819.653063972575, 794.3652427593881, 800.8538290702304, 704.0219637458916, 36.355745901131705, 28.004671947623894, 24.268892726362033]

data_min:
optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
data_max:
optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
55 changes: 55 additions & 0 deletions configs/dataset/mcashew-plantation.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
_target_: pangaea.datasets.geobench.mcashew-plantation.mCashewPlant
dataset_name: mCashew-Plant
root_path: ${oc.env:GEO_BENCH_DIR}/segmentation_v1.0/m-cashew-plant
download_url: "recursix/geo-bench-1.0"
auto_download: True

img_size: 256
multi_temporal: False
multi_modal: False

# classes
ignore_index: 255
num_classes: 7
classes:
- 'no data'
- 'well-managed plantation'
- 'poorly-managed plantation'
- 'non-plantation'
- 'residential'
- 'background'
- 'uncertain'
distribution:
- 0
- 0
- 0
- 0
- 0
- 0
- 0


bands:
optical:
- B1
- B2
- B3
- B4
- B5
- B6
- B7
- B8
- B8A
- B9
- B11
- B12

data_mean:
optical: [520.1185302734375, 634.7583618164062, 892.461181640625, 880.7075805664062, 1380.6409912109375, 2233.432373046875, 2549.379638671875, 2643.248046875, 2643.531982421875, 2852.87451171875, 2463.933349609375, 1600.9207763671875]
data_std:
optical: [204.2023468017578, 227.25344848632812, 222.32545471191406, 350.47235107421875, 280.6436767578125, 373.7521057128906, 449.9236145019531, 414.6498107910156, 415.1019592285156, 413.8980407714844, 494.97430419921875, 514.4229736328125]

data_min:
optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
data_max:
optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
49 changes: 49 additions & 0 deletions configs/dataset/mchesapeake-landcover.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
_target_: pangaea.datasets.geobench.mchesapeake-landcover.mChesapeake
dataset_name: mChesapeake
root_path: ${oc.env:GEO_BENCH_DIR}/segmentation_v1.0/m-chesapeake
download_url: "recursix/geo-bench-1.0"
auto_download: True

img_size: 256
multi_temporal: False
multi_modal: False

# classes
ignore_index: -1
num_classes: 7
classes:
- 'water'
- 'tree-canopy-forest'
- 'low-vegetation-field'
- 'barren-land'
- 'impervious-other'
- 'impervious-roads'
- 'no data'
distribution:
- 0
- 0
- 0
- 0
- 0
- 0
- 0

# data stats
bands:
optical:
- B2
- B3
- B4
- B8

data_mean:
optical: [0.4807923436164856, 0.5200885534286499, 0.4570387601852417, 0.569856584072113]

data_std:
optical: [0.17441707849502563, 0.1976749747991562, 0.21191735565662384, 0.2831788957118988]


data_min:
optical: [0.0000, 0.0, 0.0, 0.0]
data_max:
optical: [0.0000, 0.0, 0.0, 0.0]
40 changes: 40 additions & 0 deletions configs/dataset/meurosat.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
_target_: pangaea.datasets.geobench.meurosat.mEuroSat
dataset_name: mEuroSat
root_path: ${oc.env:GEO_BENCH_DIR}/classification_v1.0/m-eurosat # requires the GEO_BENCH_DIR environment variable to be set
download_url: "recursix/geo-bench-1.0"
auto_download: True
ignore_index: -100
multi_temporal: False
multi_modal: False
img_size: 64
num_classes: 10

bands:
optical:
- B1
- B2
- B3
- B4
- B5
- B6
- B7
- B8
- B8A
- B9
- B10
- B11
- B12

classes: ['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
distribution: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


data_mean:
optical: [1355.5426, 1113.8855, 1035.7394, 928.2619, 1188.2629, 2032.7325, 2416.5286, 2342.5396, 748.9036, 12.0419, 1810.1284, 1101.3801, 2644.5996]
data_std:
optical: [68.9288, 160.0012, 194.6687, 286.8012, 236.6991, 372.3853, 478.1329, 556.7527, 102.5583, 1.2167, 392.9388, 313.7339, 526.7788]

data_min:
optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
data_max:
optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
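The new dataset configs all repeat the same shape: a band list, four per-band statistic lists, and class metadata that should agree with `num_classes`. A rough sketch of a consistency check over an already-loaded config dictionary follows; `check_dataset_config` is a hypothetical name, and the dictionary layout simply mirrors the YAML keys shown in this diff.

```python
def check_dataset_config(cfg):
    """Return a list of human-readable inconsistencies found in a dataset config dict."""
    problems = []
    n_bands = len(cfg["bands"]["optical"])
    # Every per-band statistic should have exactly one entry per optical band.
    for key in ("data_mean", "data_std", "data_min", "data_max"):
        n_values = len(cfg[key]["optical"])
        if n_values != n_bands:
            problems.append(f"{key}: {n_values} values for {n_bands} bands")
    # Class names and the class distribution should match num_classes.
    for key in ("classes", "distribution"):
        if len(cfg[key]) != cfg["num_classes"]:
            problems.append(f"{key}: {len(cfg[key])} entries for {cfg['num_classes']} classes")
    return problems
```

Run against the files in this PR, a check like this would pass for `meurosat.yaml` (13 bands, 10 classes) and would flag `mbigearthnet.yaml`, where `classes` and `distribution` hold a single placeholder entry against `num_classes: 43`.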