This is the code repository for our paper, “AU8: A Multimodal Benchmark Dataset from Eight Australian Cities for Urban Profiling and Analysis”. AU8 covers eight major Australian cities: Greater Sydney, Greater Melbourne, Greater Brisbane, Greater Perth, Greater Adelaide, Greater Canberra, Greater Darwin, and Greater Hobart. It contains 101,604 satellite image tiles, each paired with a declarative textual description and nine key urban indicators (e.g., population density, median income, housing price, land use). The dataset is available at https://huggingface.co/datasets/anonymous-for-review/AU8.

AU8 also provides metadata for every image. The attributes are listed below.

| Attribute | Description |
|---|---|
| image_name | Image file name containing the latitude and longitude |
| SA2_Code | ABS Statistical Area Level 2 (SA2) code |
| latitude | Latitude of the image centre |
| longitude | Longitude of the image centre |
| Median house price | Median price of established house transfers, 2023 (AUD) |
| Population density | Population density, 2023 (persons / km²) |
| Median income | Median total income excluding government pensions and allowances, 2020 (AUD) |
| No. of businesses | Total number of businesses, 2023 (mean) |
| protected land | Total protected land area, 2022 (ha, mean) |
| No. of jobs | Number of jobs, 2020 (mean) |
| Persons employed | Total persons employed aged 15 years and over, 2021 (mean) |
| Agricultural land | Area of agricultural land, 2021 (ha, mean) |
| Rural residential | Rural residential and farm infrastructure area, 2016 (ha, mean) |
| Description | Textual description of the image generated by GPT-5 |
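
As a quick sanity check on the attributes above, the following minimal sketch tests whether a single metadata record (represented as a plain dictionary) carries every listed attribute and has plausible coordinates. The column spellings are copied from the table; whether the released metadata uses exactly these names is an assumption, and `missing_attributes` and `has_valid_coordinates` are hypothetical helper names, not part of this repository.

```python
# Column names copied from the attribute table. Whether the released
# metadata uses exactly these spellings is an assumption.
EXPECTED_ATTRIBUTES = [
    "image_name", "SA2_Code", "latitude", "longitude",
    "Median house price", "Population density", "Median income",
    "No. of businesses", "protected land", "No. of jobs",
    "Persons employed", "Agricultural land", "Rural residential",
    "Description",
]


def missing_attributes(record):
    """Return the expected attribute names that are absent from a record."""
    return [name for name in EXPECTED_ATTRIBUTES if name not in record]


def has_valid_coordinates(record):
    """Check that latitude and longitude parse as numbers in a valid range."""
    try:
        lat = float(record["latitude"])
        lon = float(record["longitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```

Running `missing_attributes` on the first few records after loading the metadata is a cheap way to catch renamed or dropped columns before training.
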

Before running, make sure you have installed all required dependencies:

    pip install -r requirements.txt

Each method has a simple two-step (train → predict) or one-step pipeline.

Step 1: Train UrbanCLIP

    python UrbanCLIP/main.py

Step 2: Predict with pretrained model

    python UrbanCLIP/UrbanCLIP_predict.py

Step 1: Train GeoVit-HNM

    python GeoVit-HNM/GeoVit-HNM.py

Step 2: Predict with trained model

    python GeoVit-HNM/GeoVit-HNM_predict.py

Step 1: Train Tile2Vec embeddings

    python tile2vec/tile2vec_triplet.py

Step 2: Predict urban indicators

    python tile2vec/tile2vec_predict.py

Run PCA + XGBoost directly:

    python pca/PCA.py

Run ResNet18 + XGBoost directly:

    python resnet-18/resnet-18.py

- Training scripts automatically save model checkpoints.
- Prediction scripts output evaluation metrics and prediction results.
- Make sure your dataset files are placed in the correct `data/` directory.
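
The train and predict commands can also be chained from a small driver. The sketch below is one possible way to do this and is not part of the repository: the script paths are copied from the commands in this README, while the dictionary keys, the function names `build_commands` and `run_pipeline`, and the assumptions that the scripts take no arguments and that `data/` sits at the repository root are all illustrative.

```python
import subprocess
import sys
from pathlib import Path

# Script paths copied from this README, in execution order.
# The dictionary keys are hypothetical labels chosen for this sketch.
PIPELINES = {
    "UrbanCLIP": ["UrbanCLIP/main.py", "UrbanCLIP/UrbanCLIP_predict.py"],
    "GeoVit-HNM": ["GeoVit-HNM/GeoVit-HNM.py", "GeoVit-HNM/GeoVit-HNM_predict.py"],
    "Tile2Vec": ["tile2vec/tile2vec_triplet.py", "tile2vec/tile2vec_predict.py"],
    "PCA": ["pca/PCA.py"],
    "ResNet18": ["resnet-18/resnet-18.py"],
}


def build_commands(method):
    """Return the command lines for one method, training step first."""
    return [[sys.executable, script] for script in PIPELINES[method]]


def run_pipeline(method, repo_root=".", data_dir="data"):
    """Run every step of a method, stopping at the first failing script."""
    root = Path(repo_root)
    # Assumption: the dataset directory lives at <repo_root>/data.
    if not (root / data_dir).is_dir():
        raise FileNotFoundError(f"expected dataset directory at {root / data_dir}")
    for command in build_commands(method):
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(command, cwd=root, check=True)
```

Calling `run_pipeline("Tile2Vec")` from the repository root would then train the embeddings and run prediction in sequence, failing early if the `data/` directory is missing.
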