```text
██████╗ ███████╗██╗
██╔══██╗██╔════╝██║
██║  ██║█████╗  ██║
██║  ██║██╔══╝  ██║
██████╔╝███████╗███████╗
╚═════╝ ╚══════╝╚══════╝
```
```bibtex
@article{zheng2026DEL,
  title={DEL: Digit Entropy Loss for Numerical Learning of Large Language Models},
  author={Zheng, Zhaohui and He, Chenhang and Wang, Shihao and Li, Yuxuan and Cheng, Ming-Ming and Zhang, Lei},
  journal={arXiv preprint arXiv:2605.20369},
  year={2026}
}
```
- Clone this repository and enter it:

  ```bash
  git clone https://github.com/PolyU-VCLab/DEL.git
  cd DEL
  ```

- Set up the environment for training Qwen and DeepSeek-Math:

  ```bash
  conda create -n DEL-qwen python=3.10
  conda activate DEL-qwen
  pip install -r requirements-qwen-deepseek.txt
  ```

- Set up the environment for training CodeLlama and Mistral:

  ```bash
  conda create -n DEL-llama python=3.10
  conda activate DEL-llama
  pip install -r requirements-codellama-mistral.txt
  ```
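Run the training script for the target model from its conda environment (`DEL-qwen` for Qwen and DeepSeek-Math, `DEL-llama` for CodeLlama and Mistral):

```bash
bash train_qwen.sh       # train Qwen
bash train_deepseek.sh   # train DeepSeek-Math-Instruct
bash train_codellama.sh  # train CodeLlama
bash train_mistral.sh    # train Mistral
```

When training is complete, evaluation runs automatically.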
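To run the evaluation manually:

```bash
cd eval
unzip dataset.zip
bash all.sh   # evaluate on the seven mathematical reasoning benchmarks
bash eval.sh  # evaluate on a single benchmark
```

Before running `all.sh` or `eval.sh`, set the model path in each script to the model you want to evaluate.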
mACC of DEL across base models:

| Model | mACC |
|---|---|
| CodeLlama-7B | 49.0 |
| Qwen2.5-1.5B | 55.4 |
| Mistral-7B | 56.5 |
| DeepSeek-Math-7B-Instruct | 66.1 |
| Qwen2.5-7B | 70.6 |
The following comparison with other training objectives is evaluated on Qwen2.5-1.5B.
| Method | mACC | Venue |
|---|---|---|
| MLE | 52.8 | - |
| MixCE | 52.9 | ACL 2023 |
| EMO | 53.4 | ICLR 2024 |
| NTL-WAS | 53.8 | ICML 2025 |
| DIST2Loss | 53.1 | ICLR 2026 |
| DEL (Ours) | 55.4 | - |
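For quick reference, the gap between each method and the MLE baseline in the table above can be computed with a few lines of shell. This is only a small arithmetic sketch over the numbers already listed (differences in mACC points on Qwen2.5-1.5B); the function name `gain_over_mle` is hypothetical and not part of this repository, and `awk` is used because plain Bash arithmetic is integer-only.

```bash
#!/usr/bin/env bash
# mACC of the MLE baseline on Qwen2.5-1.5B, copied from the table above.
MLE_MACC=52.8

# Hypothetical helper (not part of this repo): print a method's mACC
# minus the MLE baseline, signed and rounded to one decimal place.
gain_over_mle() {
  awk -v score="$1" -v base="$MLE_MACC" 'BEGIN { printf "%+.1f\n", score - base }'
}

# Method / mACC pairs copied from the table above.
while read -r method macc; do
  printf '%-10s %s\n' "$method" "$(gain_over_mle "$macc")"
done <<'EOF'
MixCE 52.9
EMO 53.4
NTL-WAS 53.8
DIST2Loss 53.1
DEL 55.4
EOF
```

With these numbers, DEL comes out at +2.6 points over MLE, compared with +1.0 for the strongest listed baseline, NTL-WAS.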
This codebase is forked from MAmmoTH; we thank Xiang Yue et al. for this excellent work on mathematical reasoning.