VMarsocci · yurujaja · Mar 18, 2025 · Mar 18, 2025 · May 23, 2025 · May 23, 2025
diff --git a/.github/CONTRIBUTING.md → CONTRIBUTING.md b/.github/CONTRIBUTING.md → CONTRIBUTING.md
diff --git a/DATASET_GUIDE.md b/DATASET_GUIDE.md
@@ -3,6 +3,28 @@
 This document provides a detailed overview of the datasets used in this repository. For each dataset, you will find instructions on how to prepare the data, along with command-line examples for running models. 
 
 *DISCLAIMER*: please consider that we provide the detailed overview for the datasets included in the original repo. Community-contributed datasets may not come with pre-defined command-line examples in this repository. Feel free to adapt the existing examples based on your use case. 
+## 📚 Table of Contents
+
+- [HLSBurnScars](#hlsburnscars)
+- [MADOS](#mados)
+- [PASTIS-R](#pastis-r)
+- [Sen1Floods11](#sen1floods11)
+- [xView2](#xview2)
+- [FiveBillionPixels](#fivebillionpixels)
+- [DynamicEarthNet](#dynamicearthnet)
+- [Crop Type Mapping (South Sudan)](#crop-type-mapping-south-sudan)
+- [SpaceNet 7](#spacenet-7)
+- [AI4SmallFarms](#ai4smallfarms)
+- [BioMassters](#biomassters)
+
+### 🧪 Community-Contributed Datasets
+- [Potsdam](#potsdam)
+- [Geo-Bench Datasets](#geo-bench-datasets)
+  - [Multi-label Classification (e.g., m-BigEarthNet)](#for-multi-label-classification-eg-m-bigearthnet)
+  - [Single-label Classification (e.g., m-EuroSat, m-Brick-Kiln)](#for-single-label-classification-ie-m-eurosat-m-brick-kiln-m-forestnet-m-pv4ger-m-so2sat)
+  - [Semantic Segmentation (e.g., m-NZ-Cattle, m-SA-Crop-Type)](#for-semantic-segmentation-ie-m-cashew-plantation-m-chesapeake-landcover-m-neontree-m-nz-cattle-m-pv4ger-seg-and-m-sa-crop-type)
+
+---
 
 ### HLSBurnScars
 
@@ -222,6 +244,7 @@ This document provides a detailed overview of the datasets used in this reposito
    ```
   In this case, you can specify in the `temp` parameter which frame you want to use.
 
+---
 **Note**: The following datasets are **community-contributed** and are not part of the original benchmark repository. 
 ### Potsdam
    ```
@@ -234,3 +257,45 @@ This document provides a detailed overview of the datasets used in this reposito
    criterion=cross_entropy \
    task=segmentation
   ```
+### Geo-Bench Datasets 
+-  For multi-label classification, e.g., m-BigEarthNet
+    ```
+    export GEO_BENCH_DIR=YOUR/PATH/DIR
+    torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py  \
+      --config-name=train \
+      dataset=mbigearthnet \
+      encoder=dofa  \
+      decoder=cls_linear  \
+      preprocessing=cls_resize \
+      criterion=binary_cross_entropy \
+      task=classification_multi_label \
+      finetune=false
+    ```
+
+-  For single-label classification, i.e., m-EuroSat, m-Brick-Kiln, m-ForestNet, m-PV4Ger, m-So2Sat
+    ```
+    export GEO_BENCH_DIR=YOUR/PATH/DIR
+    torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py  \
+      --config-name=train \
+      dataset=meurosat \
+      encoder=dofa  \
+      decoder=cls_linear  \
+      preprocessing=cls_resize \
+      criterion=cross_entropy \
+      task=classification \
+      finetune=false
+    ```
+
+-  For semantic segmentation, i.e., m-Cashew-Plantation, m-Chesapeake-Landcover, m-NeonTree, m-NZ-Cattle, m-PV4Ger-Seg and m-SA-Crop-Type
+    ```
+    export GEO_BENCH_DIR=YOUR/PATH/DIR
+    torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py  \
+      --config-name=train \
+      dataset=mnz-cattle \
+      encoder=dofa  \
+      decoder=seg_upernet  \
+      preprocessing=seg_default \
+      criterion=cross_entropy \
+      task=segmentation \
+      finetune=false
+    ```
diff --git a/README.md b/README.md
@@ -35,7 +35,8 @@
 
 
 📢 **News**
- - [23/04/2025] we pushed a new version of the code, fixing different bugs (e.g. commands are working for all the datasets now, metric computation with ignore_index is fixed, etc...). In the next month, we will provide: all downloadable datasets and models, downloadable stratified subsamples for all the datasets, classification. Stay tuned!
+ - [04/06/2025] We integrated the [Geo-Bench](https://arxiv.org/abs/2306.03831) datasets, covering six segmentation and six classification tasks.
+ <!-- - [23/04/2025] we pushed a new version of the code, fixing different bugs (e.g. commands are working for all the datasets now, metric computation with ignore_index is fixed, etc...). In the next month, we will provide: all downloadable datasets and models, downloadable stratified subsamples for all the datasets, classification. Stay tuned! -->
  - [22/04/2025] on EarthDay, PANGAEA was officialy adopted to benchmark TerraMind. Read the [news](https://www.linkedin.com/posts/simonetta-cheli-7669879b_earthday-earthobservation-activity-7320439907028467712-LSzl?utm_source=share&utm_medium=member_desktop&rcm=ACoAACdT8q0BDNWYKAdDYGUe_X4fQOzSHO8jgAs) and the [pre-print](https://arxiv.org/abs/2504.11171). We will release the benchmarking code in PANGAEA very soon!
  - [05/12/2024] the [pre-print](https://arxiv.org/abs/2412.04204) is out!
 
@@ -84,6 +85,7 @@ And the following **datasets**:
 
 **Note**: The following datasets are **community-contributed** and are not part of the original benchmark repository. We are grateful for these contributions, which help enrich the benchmark's diversity and applicability.
 - **Potsdam dataset** [[Link](https://www.isprs.org/education/benchmarks/UrbanSemLab/2d-sem-label-potsdam.aspx)]. Contributed by [@pierreadorni](https://github.com/pierreadorni).
+- **Geo-Bench datasets** [[Link](https://github.com/ServiceNow/geo-bench)]. Contributed by [@yurujaja](https://github.com/yurujaja).
 
 The repository supports the following **tasks** using geospatial (foundation) models:
  - [Single Temporal Semantic Segmentation](#single-temporal-semantic-segmentation)
@@ -331,11 +333,11 @@ torchrun --nnodes=1 --nproc_per_node=1 pangaea/run.py \
 
 ### Using Your Own Dataset
 
-Refer to: [Adding a new downstream dataset](.github/CONTRIBUTING.md#adding-a-new-downstream-dataset)
+Refer to: [Adding a new downstream dataset](CONTRIBUTING.md#adding-a-new-downstream-dataset)
 
 ### Using Your Own Model
 
-Refer to: [Adding a new geospatial foundation model](.github/CONTRIBUTING.md#adding-a-new-geospatial-foundation-model)
+Refer to: [Adding a new geospatial foundation model](CONTRIBUTING.md#adding-a-new-geospatial-foundation-model)
 
 ## 🏃 Evaluation 
 
@@ -348,7 +350,7 @@ torchrun pangaea/run.py --config-name=test ckpt_dir=path_to_ckpt_dir
 ```
 
 ## ✏️ Contributing
-We appreciate all contributions. Please refer to [Contributing Guidelines](.github/CONTRIBUTING.md).
+We appreciate all contributions. Please refer to [Contributing Guidelines](CONTRIBUTING.md).
 
 ## ⚠️ TO DO
 
@@ -380,3 +382,6 @@ If you find this work useful, please cite:
       url={https://arxiv.org/abs/2412.04204}, 
 }
 ```
+## Acknowledgements
+
+The computations/data handling were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.
diff --git a/configs/criterion/binary_cross_entropy.yaml b/configs/criterion/binary_cross_entropy.yaml
@@ -0,0 +1 @@
+_target_: torch.nn.BCEWithLogitsLoss
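
A note on this criterion: `torch.nn.BCEWithLogitsLoss` takes raw logits and applies the sigmoid internally, which is why it pairs with multi-hot targets in the multi-label task. Below is a minimal pure-Python sketch of the same per-element formula with mean reduction (the PyTorch default). The function name `bce_with_logits` and the sample values are illustrative only and are not part of this PR.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (sample, class) pairs, from raw logits."""
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for x, y in zip(row_logits, row_targets):
            # Numerically stable form of
            # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


# Two samples, three classes; targets are multi-hot (several 1s per row allowed).
logits = [[2.0, -1.0, 0.5], [-3.0, 0.0, 4.0]]
targets = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
loss = bce_with_logits(logits, targets)
print(f"multi-label BCE loss: {loss:.4f}")
```

Because each class is scored independently, a sample may carry any number of positive labels, unlike `cross_entropy`, which expects exactly one class index per sample.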
diff --git a/configs/criterion/none.yaml b/configs/criterion/none.yaml
@@ -0,0 +1,2 @@
+# returns its input unchanged – produces a 0-parameter nn.Identity module
+_target_: torch.nn.Identity
diff --git a/configs/dataset/fivebillionpixels.yaml b/configs/dataset/fivebillionpixels.yaml
@@ -1,6 +1,6 @@
 _target_: pangaea.datasets.fivebillionpixels.FiveBillionPixels
 dataset_name: FiveBillionPixels
-root_path: /geomatics/gpuserver-1/vmarsocci/FiveBillionPixels/cropped
+root_path: /mimer/NOBACKUP/groups/naiss2024-22-857/datasets/Five-Billion-Pixels/cropped/new
 download_url: False 
 auto_download: False
 use_cmyk: False

diff --git a/configs/dataset/mbigearthnet.yaml b/configs/dataset/mbigearthnet.yaml
@@ -0,0 +1,40 @@
+_target_: pangaea.datasets.geobench.mbigearthnet.mBigEarthNet
+dataset_name: mBigEarthNet
+root_path: ${oc.env:GEO_BENCH_DIR}/classification_v1.0/m-bigearthnet   # requires the GEO_BENCH_DIR environment variable to be set
+download_url: "recursix/geo-bench-1.0"
+auto_download: True
+ignore_index: -100
+num_classes: 43
+img_size: 120
+multi_temporal: False
+multi_modal: False
+
+bands:
+  optical:
+    - B1
+    - B2
+    - B3
+    - B4
+    - B5
+    - B6
+    - B7
+    - B8
+    - B8A
+    - B9
+    - B11
+    - B12
+
+classes:  ['']
+distribution: [0,]
+
+# data stats
+data_mean:
+  optical: [378.4027, 482.2730, 706.5345, 720.9285, 1100.6688, 1909.2914, 2191.6985, 2336.8706, 2394.7449, 2368.3127, 1875.2487, 1229.3818]
+
+data_std:
+  optical: [157.5666, 255.0429, 303.1750, 391.2943, 380.7916, 551.6558, 638.8196, 744.2009, 675.4041, 561.0154, 563.4095, 479.1786]
+
+data_min:
+  optical: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+data_max:
+  optical: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
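
The `root_path` above relies on OmegaConf's `oc.env` resolver, so the config only resolves when `GEO_BENCH_DIR` is exported. A pre-flight check along the following lines can surface a missing variable before training starts. This is a sketch using only the standard library; `geobench_root` is a hypothetical helper, not a function in this repository, and the `/tmp/geo-bench` path is a placeholder.

```python
import os
from pathlib import Path


def geobench_root(task_dir, dataset_dir, env_var="GEO_BENCH_DIR"):
    """Build the dataset root the same way the YAML interpolation does."""
    base = os.environ.get(env_var)
    if not base:
        raise RuntimeError(f"{env_var} is not set; export it before launching training")
    return Path(base) / task_dir / dataset_dir


# Placeholder value for this demo only; a real run would export the variable in the shell.
os.environ.setdefault("GEO_BENCH_DIR", "/tmp/geo-bench")
root = geobench_root("classification_v1.0", "m-bigearthnet")
print(root)
```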
diff --git a/configs/dataset/mbrickkiln.yaml b/configs/dataset/mbrickkiln.yaml
@@ -0,0 +1,39 @@
+_target_: pangaea.datasets.geobench.mbrickkiln.mBrickKiln
+dataset_name: mBrickKiln
+root_path: ${oc.env:GEO_BENCH_DIR}/classification_v1.0/m-brick-kiln
+download_url: "recursix/geo-bench-1.0"
+auto_download: True
+
+num_classes: 2
+img_size: 64
+multi_temporal: False
+multi_modal: False
+
+ignore_index: -100
+classes:  ['not brick kiln', 'brick kiln']
+distribution: [0, 0]
+bands:
+  optical:
+    - B1
+    - B2
+    - B3
+    - B4
+    - B5
+    - B6
+    - B7
+    - B8
+    - B8A
+    - B9
+    - B10
+    - B11
+    - B12
+
+data_mean:
+  optical: [574.7587880700896, 674.3473615470523, 886.3656479311578, 815.0945462528913, 1128.8088426870465, 1934.450471876027, 2045.7652282437202, 2012.744587807115, 1608.6255233989034, 1129.8171906000355, 83.27188605598549, 90.54924599052214, 68.98768652434848]
+data_std:
+  optical: [193.60631504991184, 238.75447480113132, 276.9631260242207, 361.15060137326634, 364.5888078793488, 724.2707123576525, 819.653063972575, 794.3652427593881, 800.8538290702304, 704.0219637458916, 36.355745901131705, 28.004671947623894, 24.268892726362033]
+
+data_min:
+  optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+data_max:
+  optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
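
The `data_mean` and `data_std` lists hold one value per band, in the same order as `bands.optical`. Assuming the preprocessing standardizes each band as `(x - mean) / std` (an assumption about how these stats are consumed, not something this diff shows), the operation can be sketched as below. `standardize` is an illustrative name, and only the first three mBrickKiln bands are used to keep the example short.

```python
def standardize(pixel, mean, std):
    """Per-band standardization of one pixel: (value - mean) / std."""
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean and std must have one entry per band")
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


# First three bands (B1, B2, B3) of the mBrickKiln statistics.
mean = [574.7587880700896, 674.3473615470523, 886.3656479311578]
std = [193.60631504991184, 238.75447480113132, 276.9631260242207]

# A pixel one standard deviation above the mean in every band.
pixel = [m + s for m, s in zip(mean, std)]
print(standardize(pixel, mean, std))
```

Note that `data_min` and `data_max` are all zeros in these configs, which looks like a placeholder; a min-max scaling based on them would divide by zero, so they would need real values before being used that way.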
diff --git a/configs/dataset/mcashew-plantation.yaml b/configs/dataset/mcashew-plantation.yaml
@@ -0,0 +1,55 @@
+_target_: pangaea.datasets.geobench.mcashew-plantation.mCashewPlant
+dataset_name: mCashew-Plant
+root_path: ${oc.env:GEO_BENCH_DIR}/segmentation_v1.0/m-cashew-plant
+download_url: "recursix/geo-bench-1.0"
+auto_download: True
+
+img_size: 256
+multi_temporal: False
+multi_modal: False
+
+# classes
+ignore_index: 255
+num_classes: 7
+classes: 
+  - 'no data'
+  - 'well-managed plantation'
+  - 'poorly-managed plantation'
+  - 'non-plantation'
+  - 'residential'
+  - 'background'
+  - 'uncertain'
+distribution:
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+
+
+bands:
+  optical:
+    - B1
+    - B2
+    - B3
+    - B4
+    - B5
+    - B6
+    - B7
+    - B8
+    - B8A
+    - B9
+    - B11
+    - B12
+
+data_mean:
+  optical: [520.1185302734375, 634.7583618164062, 892.461181640625, 880.7075805664062, 1380.6409912109375, 2233.432373046875, 2549.379638671875, 2643.248046875, 2643.531982421875, 2852.87451171875, 2463.933349609375, 1600.9207763671875]
+data_std:
+  optical: [204.2023468017578, 227.25344848632812, 222.32545471191406, 350.47235107421875, 280.6436767578125, 373.7521057128906, 449.9236145019531, 414.6498107910156, 415.1019592285156, 413.8980407714844, 494.97430419921875, 514.4229736328125]
+
+data_min:
+  optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+data_max:
+  optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
diff --git a/configs/dataset/mchesapeake-landcover.yaml b/configs/dataset/mchesapeake-landcover.yaml
@@ -0,0 +1,49 @@
+_target_: pangaea.datasets.geobench.mchesapeake-landcover.mChesapeake
+dataset_name: mChesapeake
+root_path: ${oc.env:GEO_BENCH_DIR}/segmentation_v1.0/m-chesapeake
+download_url: "recursix/geo-bench-1.0"
+auto_download: True
+
+img_size: 256
+multi_temporal: False
+multi_modal: False
+
+# classes
+ignore_index: -1
+num_classes: 7
+classes: 
+  - 'water'
+  - 'tree-canopy-forest'
+  - 'low-vegetation-field'
+  - 'barren-land'
+  - 'impervious-other'
+  - 'impervious-roads'
+  - 'no data'
+distribution:
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+
+# data stats
+bands:
+  optical:
+    - B2
+    - B3
+    - B4
+    - B8
+
+data_mean:
+  optical: [0.4807923436164856, 0.5200885534286499, 0.4570387601852417, 0.569856584072113]
+
+data_std:
+  optical: [0.17441707849502563, 0.1976749747991562, 0.21191735565662384, 0.2831788957118988]
+
+
+data_min:
+  optical: [0.0000, 0.0, 0.0, 0.0]
+data_max:
+  optical: [0.0000, 0.0, 0.0, 0.0]
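
Since every dataset config repeats the same pattern (a band list plus four per-band statistics lists), a small consistency check can catch a list whose length does not match the number of bands. The sketch below works on a plain dictionary that mirrors the mChesapeake YAML above; `check_band_stats` is a hypothetical helper and not part of this PR.

```python
def check_band_stats(cfg, modality="optical"):
    """Return a list of problems where a stats list length differs from the band count."""
    n_bands = len(cfg["bands"][modality])
    problems = []
    for key in ("data_mean", "data_std", "data_min", "data_max"):
        n_values = len(cfg[key][modality])
        if n_values != n_bands:
            problems.append(f"{key}.{modality} has {n_values} values, expected {n_bands}")
    return problems


# Values copied from the mChesapeake config.
cfg = {
    "bands": {"optical": ["B2", "B3", "B4", "B8"]},
    "data_mean": {"optical": [0.4807923436164856, 0.5200885534286499,
                              0.4570387601852417, 0.569856584072113]},
    "data_std": {"optical": [0.17441707849502563, 0.1976749747991562,
                             0.21191735565662384, 0.2831788957118988]},
    "data_min": {"optical": [0.0, 0.0, 0.0, 0.0]},
    "data_max": {"optical": [0.0, 0.0, 0.0, 0.0]},
}
print(check_band_stats(cfg))
```

The check only compares lengths; it cannot tell whether the values are listed in the same band order as `bands.optical`, which still has to be verified by hand.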
diff --git a/configs/dataset/meurosat.yaml b/configs/dataset/meurosat.yaml
@@ -0,0 +1,40 @@
+_target_: pangaea.datasets.geobench.meurosat.mEuroSat  
+dataset_name: mEuroSat
+root_path: ${oc.env:GEO_BENCH_DIR}/classification_v1.0/m-eurosat  # requires the GEO_BENCH_DIR environment variable to be set
+download_url: "recursix/geo-bench-1.0"
+auto_download: True
+ignore_index: -100
+multi_temporal: False
+multi_modal: False
+img_size: 64
+num_classes: 10
+
+bands:
+  optical:
+    - B1
+    - B2
+    - B3
+    - B4
+    - B5
+    - B6
+    - B7
+    - B8
+    - B8A
+    - B9
+    - B10
+    - B11
+    - B12
+
+classes: ['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
+distribution: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+
+
+data_mean:
+  optical: [1355.5426, 1113.8855, 1035.7394, 928.2619, 1188.2629, 2032.7325, 2416.5286, 2342.5396, 748.9036, 12.0419, 1810.1284, 1101.3801, 2644.5996]
+data_std:
+  optical: [68.9288, 160.0012, 194.6687, 286.8012, 236.6991, 372.3853, 478.1329, 556.7527, 102.5583, 1.2167, 392.9388, 313.7339, 526.7788]
+
+data_min:
+  optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+data_max:
+  optical: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]