Official implementation of the paper: Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner
📌 Accepted at IJCAI 2025
We present NGTR (Neighbor-Guided Toolchain Reasoner), a novel framework that enhances table recognition in document images by integrating Vision Large Language Models (VLLMs) with lightweight visual tools and retrieval-augmented planning strategies.
Despite recent progress in VLLMs, their performance on table recognition tasks, particularly on low-quality images, remains under-explored. NGTR fills this gap with a modular reasoning pipeline and establishes a benchmark for structured data extraction from tables.
- Pioneering VLLM-based Table Recognition: We introduce the first comprehensive benchmark that evaluates VLLMs in training-free table recognition tasks with hierarchical evaluation design.
- Neighbor-Guided Reasoning Framework: NGTR introduces a reflection-driven, modular toolchain system to improve input quality and guide recognition effectively.
- Extensive Evaluation: We demonstrate state-of-the-art performance across the SciTSR, PubTabNet, and WTW datasets, showing robustness in both clean and noisy table environments.
This repo is built and tested under Python 3.9.19.
To set up the environment:
conda create -n NGTR python=3.9 -y
conda activate NGTR
pip install -r requirements.txt
To run the main pipeline, execute:
python main.py
Please refer to the main.py file for detailed arguments and configuration instructions.
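Since the repo is reported as built and tested under Python 3.9.19, a quick pre-flight check of the interpreter version may help before running the pipeline. The snippet below is only a minimal sketch: the helper name `is_tested_python` and the constant `TESTED_MAJOR_MINOR` are hypothetical and are not part of this repository.

```python
import sys

# Major/minor version this README reports the repo was tested under (3.9.19).
TESTED_MAJOR_MINOR = (3, 9)


def is_tested_python(version_info=None):
    """Return True if the major.minor version matches the tested one.

    Hypothetical helper for illustration; it is not provided by the repo.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == TESTED_MAJOR_MINOR


if __name__ == "__main__":
    if not is_tested_python():
        # Only warn: other versions may still work, they are just untested.
        print(
            f"Warning: running Python {sys.version_info.major}."
            f"{sys.version_info.minor}; this repo was tested under Python 3.9."
        )
```

The check compares only the major and minor components, on the assumption that patch releases of Python 3.9 behave the same for this project.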
🙋 Please let us know if you find a mistake or have any suggestions!
🌟 If you find this resource helpful, please consider starring this repository and citing our research:
@article{zhou2024enhancing,
  title={Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner},
  author={Zhou, Yitong and Cheng, Mingyue and Mao, Qingyang and Liu, Qi and Xu, Feiyang and Li, Xin and Chen, Enhong},
  journal={arXiv preprint arXiv:2412.20662},
  year={2024}
}
This work builds on prior contributions and datasets from the following repositories:
