STH-SepNet: Decoupling Spatio-Temporal Prediction: When Lightweight Large Models Meet Adaptive Hypergraphs
Welcome to STH-SepNet's GitHub repository! This repository hosts the code, data, and model weights of STH-SepNet (KDD'25 Research Track).
Abstract: Spatio-temporal prediction is a pivotal task with broad applications in traffic management, climate monitoring, and energy scheduling. However, existing methodologies often struggle to balance model expressiveness and computational efficiency, especially when scaling to large real-world datasets. To tackle these challenges, we propose STH-SepNet (Spatio-Temporal Hypergraph Separation Network), a novel framework that decouples temporal and spatial modeling to enhance both efficiency and precision. The temporal dimension is modeled using lightweight large language models, which effectively capture low-rank temporal dynamics. Concurrently, the spatial dimension is addressed through an adaptive hypergraph neural network, which dynamically constructs hyperedges to model intricate, higher-order interactions. A carefully designed gating mechanism is integrated to seamlessly fuse temporal and spatial representations. By leveraging the fundamental principles of low-rank temporal dynamics and spatial interactions, STH-SepNet offers a pragmatic and scalable solution for spatio-temporal prediction in real-world applications. Extensive experiments on large-scale real-world datasets across multiple benchmarks demonstrate the effectiveness of STH-SepNet in improving predictive performance while maintaining computational efficiency. This work may provide a promising lightweight framework for spatio-temporal prediction, aiming to reduce computational demands while enhancing predictive performance.
[Paper Page] [Chinese Explanation]
If you find this repository helpful for your research, please cite our paper.
@inproceedings{chen2025decoupling,
title={Decoupling Spatio-Temporal Prediction: When Lightweight Large Models Meet Adaptive Hypergraphs},
author={Chen, Jiawen and Shao, Qi and Chen, Duxin and Yu, Wenwu},
booktitle={Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)},
year={2025},
month={August},
address={Toronto, Canada},
publisher={ACM}
}
The lightweight training requires torch 2.0+. To install all dependencies:
pip install -r requirements.txt
The data can be downloaded from (Google Drive). Create a dataset/ directory and place the downloaded datasets in dataset/.
The pretrained models can be downloaded from the links in the table below. Create a huggingface/ directory and place the pretrained models there, e.g., huggingface/BERT.
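The directory layout described above can be prepared as follows (a minimal sketch; the Google Drive and model downloads themselves are manual steps):

```shell
# Create the directory layout this README expects:
#   dataset/          <- datasets downloaded from Google Drive
#   huggingface/BERT  <- pretrained LLM weights, one subfolder per model
mkdir -p dataset huggingface/BERT
ls -d dataset huggingface/BERT
```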
| Model 🤗 | Parameters | LLM Dimension | Model Description |
|---|---|---|---|
| BERT | 110M | 768 | A Transformer-based pre-trained model for NLP tasks, excelling in sentence classification and question answering. |
| GPT-2 | 124M | 768 | A Transformer-based generative model, specialized in text generation and language modeling. |
| GPT-3 | 7580M | 4096 | A large-scale Transformer-based generative model supporting various language tasks. |
| LLAMA-1B | 1230M | 2048 | A multilingual model developed by Meta, designed for dialogue and knowledge retrieval tasks. |
| LLAMA-7B | 6740M | 4096 | A multilingual model developed by Meta, suitable for various natural language generation tasks. |
| LLAMA-8B | 8000M | 4096 | A multilingual model developed by Meta, focused on dialogue and instruction-tuning tasks. |
| DeepSeek-Qwen1.5B | 1500M | 1536 | A reasoning-focused model enhanced through reinforcement learning for improved reasoning capabilities. |
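When launching a run, the LLM dimension passed on the command line must match the backbone chosen above. A small lookup helper makes this pairing explicit (hypothetical convenience code using the values from the table; not part of the STH-SepNet codebase):

```python
# Hypothetical helper: map --llm_model choices to their hidden dimension
# (values taken from the table above); not part of the repository code.
LLM_DIMS = {
    "BERT": 768,
    "GPT2": 768,
    "GPT3": 4096,
    "LLAMA1b": 2048,
    "LLAMA7b": 4096,
    "LLAMA8b": 4096,
    "deepseek2b": 1536,
}

def llm_dim_for(model_name: str) -> int:
    """Return the --llm_dim value matching a --llm_model choice."""
    try:
        return LLM_DIMS[model_name]
    except KeyError:
        raise ValueError(f"Unknown llm_model: {model_name!r}")
```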
| Parameter | Type | Description | Default Value |
|---|---|---|---|
| model | string | Name of the model: `pool` (STH-SepNet with the adaptive hypergraph module), `Autoformer` (Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting, NeurIPS 2021), `TIMELLM` (Time Series Forecasting by Reprogramming Large Language Models, ICLR 2024) | pool |
| dataset | string | Name of the dataset: `inflow` (bike traffic inflow), `outflow` (bike traffic outflow), `PEMS03` (California highway PeMS traffic flow), `BJ` (road-network traffic in some areas of Beijing), `METR` (traffic sensor data in the Los Angeles area). Any additional graph dataset in edgelist format can be used by editing data_loader.py | inflow |
| node_num | int | Number of nodes in the network: Inflow/Outflow: 295, PEMS03: 358, BJ: 500, METR: 207 | 295 |
| features | string | Forecasting task, options [M, S, MS]: `M` (multivariate predict multivariate), `S` (univariate predict univariate), `MS` (multivariate predict univariate) | M |
| llm_model | string | LLM backbone: BERT, GPT2, GPT3, LLAMA1b, LLAMA7b, LLAMA8b, deepseek2b | BERT |
| static | bool | Whether to use the static adjacency matrix module | False |
| gcn_true | bool | Whether to use the GCN module | True |
| adaptive_hyperhgnn | string | Hypergraph neural network: hgcn, hgat, hsage | 'hgcn' |
| hgcn_true | bool | Whether to use the HGCN module | True |
| temporal_true | bool | Whether to use the temporal convolutional network module | True |
| fusion_gate | string | Fusion style: `adaptive` (dynamically adjusts the weights of temporal and spatial features), `attentiongate` (models the internal relationship between the two features), `lstmgate` (captures the dependence of spatial on temporal features), `hyperstgnn` (fully integrated adaptive hypergraph spatio-temporal prediction, without LLMs) | adaptive |
| llm_dim | int | LLM hidden dimension: BERT/GPT2: 768, GPT3/LLAMA7b/LLAMA8b: 4096, LLAMA1b: 2048, deepseek2b: 1536 | 768 |
| seq_len | int | Input sequence length | 48 |
| label_len | int | Start token length | 48 |
| pred_len | int | Prediction sequence length | 48 |
| enc_in | int | Encoder input size (e.g., node num) | 295 |
| dec_in | int | Decoder input size (e.g., node num) | 295 |
| c_out | int | Output size (e.g., node num) | 295 |
| d_model | int | Model dimension | 32 |
| n_heads | int | Number of attention heads | 16 |
| e_layers | int | Number of encoder layers | 2 |
| d_layers | int | Number of decoder layers | 1 |
| d_ff | int | Dimension of the FCN | 32 |
| llm_layers | int | Number of LLM layers | 6 |
| train_epochs | int | Number of training epochs | 50 |
| align_epochs | int | Number of alignment epochs | 10 |
| alpha | float | Adjustable parameter to control hyperSTLLM or STLLM | 0.1 |
| beta | float | Adjustable parameter to control hyperSTLLM or STLLM | 0.2 |
| gamma | float | Adjustable parameter to control hyperSTLLM or STLLM | 0.5 |
| theta | float | Adjustable parameter to control hyperSTLLM or STLLM | 0.2 |
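To give a concrete picture of what the `hgcn` option computes, here is a minimal NumPy sketch of one hypergraph convolution step using the standard HGNN normalization with unit hyperedge weights (illustrative only; not the repository's implementation):

```python
import numpy as np

def hypergraph_conv(X, H, W):
    """One hypergraph convolution step (HGNN-style normalization sketch).
    X: (N, F) node features; H: (N, E) incidence matrix; W: (F, F') weights.
    Out = Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X W, with unit hyperedge weights."""
    Dv = H.sum(axis=1)  # node degrees
    De = H.sum(axis=0)  # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    A = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt  # normalized propagation
    return A @ X @ W

# Toy example: 4 nodes, 2 hyperedges ({0,1,3} and {1,2}), 3 -> 2 features.
H = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
X = np.ones((4, 3))
W = np.eye(3)[:, :2]
out = hypergraph_conv(X, H, W)
```

The adaptive component of STH-SepNet additionally learns the incidence structure `H` instead of fixing it, which this sketch does not show.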
Run the scripts under ./scripts for demonstration. For example, to evaluate on the BIKE datasets:
sh ./scripts/BIKE/BERT_Bike_order.sh
sh ./scripts/BIKE/GPT2_Bike_order.sh
sh ./scripts/BIKE/GPT3_Bike_order.sh
sh ./scripts/BIKE/LLAMA1B_Bike_order.sh
sh ./scripts/BIKE/LLAMA7B_Bike_order.sh
sh ./scripts/BIKE/LLAMA8B_Bike_order.sh
sh ./scripts/BIKE/Deepseek_Bike_order.sh
Run the scripts under ./scripts for demonstration. For example, to evaluate on the BIKE datasets:
sh ./scripts/BIKE/BERT_Bike.sh
sh ./scripts/BIKE/GPT2_Bike.sh
sh ./scripts/BIKE/GPT3_Bike.sh
sh ./scripts/BIKE/LLAMA1B_Bike.sh
sh ./scripts/BIKE/LLAMA7B_Bike.sh
sh ./scripts/BIKE/LLAMA8B_Bike.sh
sh ./scripts/BIKE/Deepseek_Bike.sh
To evaluate on the BIKE datasets without LLMs, set --fusion_gate to hyperstgnn (fully integrated adaptive hypergraph spatio-temporal prediction).
Run the scripts under ./scripts for demonstration. For example, to evaluate on the BIKE datasets:
sh ./scripts/BIKE/BERT_Bike_mixorder3.sh
sh ./scripts/BIKE/GPT2_Bike_mixorder3.sh
sh ./scripts/BIKE/GPT3_Bike_mixorder3.sh
sh ./scripts/BIKE/LLAMA1B_Bike_mixorder3.sh
sh ./scripts/BIKE/LLAMA7B_Bike_mixorder3.sh
sh ./scripts/BIKE/LLAMA8B_Bike_mixorder3.sh
sh ./scripts/BIKE/Deepseek_Bike_mixorder3.sh
Run the scripts under ./scripts for demonstration. For example, to evaluate on the BIKE datasets:
sh ./scripts/BIKE/BERT_Bike_Outflow_flexible_order3.sh
sh ./scripts/PEMS/BERT_PEMS03_flexible_order.sh
The fusion mechanism can be specified with the --fusion_gate argument. The available options are:
- adaptive: dynamically adjusts the weights of temporal and spatial features.
- attentiongate: models the internal relationship between the two features.
- lstmgate: captures the dependence of spatial on temporal features.
- hyperstgnn: fully integrated adaptive hypergraph spatio-temporal prediction (without LLMs).
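To make the adaptive option concrete, here is a minimal NumPy sketch of such a gate (illustrative only, not the repository's implementation; `W` and `b` stand in for learned parameters):

```python
import numpy as np

def adaptive_fusion_gate(h_t, h_s, W, b):
    """Fuse temporal (h_t) and spatial (h_s) features of equal shape (..., F).
    g = sigmoid([h_t ; h_s] @ W + b) is an element-wise mixing weight, so the
    output is the convex combination g * h_t + (1 - g) * h_s."""
    z = np.concatenate([h_t, h_s], axis=-1) @ W + b  # (..., F)
    g = 1.0 / (1.0 + np.exp(-z))                     # gate values in (0, 1)
    return g * h_t + (1 - g) * h_s

# Toy example with random features and stand-in parameters.
rng = np.random.default_rng(0)
F = 4
h_t = rng.normal(size=(5, F))
h_s = rng.normal(size=(5, F))
W = rng.normal(size=(2 * F, F))
b = np.zeros(F)
fused = adaptive_fusion_gate(h_t, h_s, W, b)
```

Because the gate is a sigmoid, each fused element always lies between the corresponding temporal and spatial feature values.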
STG4Traffic: A Survey and Benchmark of Spatial-Temporal Graph Neural Networks for Traffic Prediction. [Paper][Code].
@article{DBLP:journals/corr/abs-2307-00495,
author = {Xunlian Luo and Chunjiang Zhu and Detian Zhang and Qing Li},
title = {STG4Traffic: {A} Survey and Benchmark of Spatial-Temporal Graph Neural
Networks for Traffic Prediction},
journal = {CoRR},
volume = {abs/2307.00495},
year = {2023}
}
Deep Time Series Models: A Comprehensive Survey and Benchmark. [Paper][Code].
@article{wang2024tssurvey,
title={Deep Time Series Models: A Comprehensive Survey and Benchmark},
author={Yuxuan Wang and Haixu Wu and Jiaxiang Dong and Yong Liu and Mingsheng Long and Jianmin Wang},
journal={arXiv preprint arXiv:2407.13278},
year={2024}
}