GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes

Di Wang1, Shunyu Liu2, Wentao Jiang1, Fengxiang Wang3, Yi Liu1, Xiaolei Qin1, Zhiming Luo1,

Chaoyang Zhou1, Haonan Guo1, Jing Zhang1 †, Bo Du1 †, Dacheng Tao2, Liangpei Zhang1 †

1 Wuhan University, 2 Nanyang Technological University, 3 Shanghai AI Laboratory.

† Corresponding author

Update | Overview | Datasets | Models | Usage | Statement

🔥 Update

2025.12.04

  • All components required for building an inference demo have been prepared.
  • The updated model weights are available (see the Models section below).
  • The JSON annotation files for the test sets of several benchmarks used in our evaluation have been released and are available at:
    • Hugging Face
      (Note: only the JSON files are provided; the corresponding images should be downloaded from the original datasets.)

2025.12.01

  • The paper is posted on arXiv! (arXiv)

🌞 Overview

We present GeoZero, the first MLLM capable of performing emergent reasoning on geospatial scenes from scratch without any predefined CoT supervision. To encourage deep and reliable reasoning while maintaining answer accuracy, we construct two datasets, GeoZero-Instruct and GeoZero-Hard. GeoZero-Instruct allows the model to acquire preliminary geospatial knowledge through supervised fine-tuning, while GeoZero-Hard stimulates deep reasoning during the subsequent reinforcement learning stage. We also propose Answer-Anchored Group Relative Policy Optimization (A$^2$GRPO), where the reasoning process is regularized by the model’s own answers, encouraging diverse yet accurate thinking. GeoZero not only reduces annotation costs but also enhances the cognitive capability of MLLMs, offering new insights toward general geospatial AI.
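The exact formulation of A$^2$GRPO is given in the paper, not here; as a rough illustration only, a standard GRPO-style update normalizes each sampled response's reward against its group, and an answer-anchored variant could blend answer correctness with a reasoning–answer consistency score. The sketch below is an assumption of the idea, not the paper's method; `anchored_rewards` and its `consistency` inputs are hypothetical placeholders:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style normalization: each reward relative to its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def anchored_rewards(accuracy, consistency, alpha=0.5):
    """Hypothetical answer-anchored reward: blend final-answer correctness
    with a reasoning-answer consistency score (placeholder term)."""
    return [a + alpha * c for a, c in zip(accuracy, consistency)]

# Toy group of 4 sampled responses for one prompt.
acc = [1.0, 0.0, 1.0, 0.0]    # final-answer correctness per sample
cons = [0.9, 0.2, 0.4, 0.8]   # hypothetical consistency scores
adv = group_relative_advantages(anchored_rewards(acc, cons))
```

By construction the normalized advantages sum to zero within the group, so correct, consistent responses are pushed up relative to their peers rather than against an absolute baseline.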

Figure 1. Framework of GeoZero.


📖 Datasets

GeoZero relies on multiple remote sensing benchmarks for both model development and evaluation. Please manually download the corresponding image datasets from their original sources.

🔗 Recommended Data Sources

| Dataset | Dataset | Dataset |
| --- | --- | --- |
| VHM-Instruct | RESISC-45 | EuroSAT |
| AID | NASC-TG2 | fMoW |
| WHU-RS19 | RSVQA | UCM |
| RSVG | DIOR-RSVG | SkyEye-968k |
| VRSBench | SIRI-WHU | UCM-Captions |
| Sydney-Captions | NWPU-Captions | RSICD |

We provide pre-formatted JSON annotation files to ensure consistent data loading and usage:

Training data

Coming Soon.

Evaluation data

Evaluation samples across different benchmarks are available on our continually updated Hugging Face dataset repository:

👉 GeoZero_Eval_Datasets

🚀 Models

| Model | Weights |
| --- | --- |
| GeoZero w/o RFT | Hugging Face & Baidu Drive |

🔨 Usage

Training

Coming Soon.

Inference

We provide an inference script for Qwen3-VL and related models on various remote sensing vision–language tasks:

```bash
python single_infer_eval_qwen3vl_think.py \
  --model_path [model path] \
  --json_path [dataset json path] \
  --output_path [output saved path] \
  --task [task type] \
  --batchsize 4 \
  --gpu [gpu id] \
  --system [whether to use the system prompt (Type1)]
```
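When evaluating several benchmarks in a row, it can help to assemble the argument list programmatically rather than editing a shell line each time. A small helper, with all paths and the `--system` value as hypothetical placeholders (the script's exact flag semantics should be checked against its `--help`):

```python
import subprocess

def build_infer_cmd(model_path, json_path, output_path, task,
                    batchsize=4, gpu=0, system=None):
    """Assemble the command line for single_infer_eval_qwen3vl_think.py.
    `system` is assumed to take a prompt-type string such as "Type1"."""
    cmd = [
        "python", "single_infer_eval_qwen3vl_think.py",
        "--model_path", model_path,
        "--json_path", json_path,
        "--output_path", output_path,
        "--task", task,
        "--batchsize", str(batchsize),
        "--gpu", str(gpu),
    ]
    if system is not None:
        cmd += ["--system", system]
    return cmd

# e.g. subprocess.run(build_infer_cmd("ckpt/geozero", "eval/rsvqa.json",
#                                     "out/rsvqa", "vqa"), check=True)
```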

🍭 Results

⭐ Citation

If you find GeoZero helpful, please give a ⭐ and cite it as follows:

```bibtex
@article{wang2025geozero,
  title   = {GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes},
  author  = {Wang, Di and Liu, Shunyu and Jiang, Wentao and Wang, Fengxiang and Liu, Yi and Qin, Xiaolei and Luo, Zhiming and Zhou, Chaoyang and Guo, Haonan and Zhang, Jing and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
  journal = {arXiv preprint arXiv:2511.22645},
  year    = {2025}
}
```

🎺 Statement

For any other questions, please contact di.wang at gmail.com or whu.edu.cn.

💖 Thanks

This project is based on Qwen3-VL, ms-swift, and RSEvalKit. Thanks for their wonderful work!

About

Official repo for "GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes"
