🚀 Access MinerU now → ✅ Zero-install web version ✅ Full-featured desktop client ✅ Instant API access. Skip the deployment headaches and get every product format in one click. Developers, dive in!

👋 Join us on Discord and WeChat

Changelog

2025/12/30 2.7.0 Release
- Simplified installation. The vlm acceleration engine dependencies no longer need to be installed separately; running uv pip install mineru[all] installs all optional backend dependencies.
- Added a new hybrid backend that combines the advantages of the pipeline and vlm backends. Built on vlm, it integrates some pipeline capabilities, adding extensibility on top of high accuracy:
  - Extracts text directly from text PDFs, natively supports multi-language recognition in text PDF scenarios, and greatly reduces parsing hallucinations;
  - Supports text recognition in 109 languages for scanned PDFs when the OCR language is specified;
  - Provides an independent switch for inline formula recognition, which can be turned off when inline formulas are not needed to improve the visual quality of parsing results.
- Simplified engine selection for the vlm/hybrid backends. Users only need to specify the backend as *-auto-engine, and the system automatically selects an appropriate inference acceleration engine for the current environment.
- Switched the default parsing backend from pipeline to hybrid-auto-engine, giving new users more consistent out-of-the-box results and avoiding confusion over differing parsing results.
- Added i18n support to the Gradio application, which can now switch between Chinese and English.

📝 View the complete Changelog for more historical version information

MinerU

Project Introduction

MinerU is a tool that converts PDFs into machine-readable formats (e.g., Markdown, JSON), making it easy to extract content into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. Compared to well-known commercial products, MinerU is still young. If you encounter any issues or the results are not as expected, please submit an issue and attach the relevant PDF.

Key Features

- Remove headers, footers, footnotes, page numbers, etc., to ensure semantic coherence.
- Output text in human-readable order, suitable for single-column, multi-column, and complex layouts.
- Preserve the structure of the original document, including headings, paragraphs, lists, etc.
- Extract images, image descriptions, tables, table titles, and footnotes.
- Automatically recognize and convert formulas in the document to LaTeX format.
- Automatically recognize and convert tables in the document to HTML format.
- Automatically detect scanned PDFs and garbled PDFs and enable OCR functionality.
- OCR supports detection and recognition of 109 languages.
- Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.
- Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.
- Supports running in a pure CPU environment, and also supports GPU (CUDA) / NPU (CANN) / MPS acceleration.
- Compatible with Windows, Linux, and Mac platforms.

Quick Start

If you encounter any installation issues, please first consult the FAQ.
If the parsing results are not as expected, refer to the Known Issues.

Online Experience

Official online web application

The official online version offers the same functionality as the client, with a polished interface and rich features. Login is required.

Gradio-based online demo

A WebUI built on Gradio, with a simple interface and only the core parsing functionality. No login is required.

Local Deployment

Warning

Pre-installation Notice—Hardware and Software Environment Support

To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.

By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.

In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.

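| Parsing Backend | pipeline | hybrid-auto-engine | vlm-auto-engine | hybrid-http-client | vlm-http-client |
| --- | --- | --- | --- | --- | --- |
| Backend Features | Good Compatibility | High Hardware Requirements | High Hardware Requirements | For OpenAI Compatible Servers² | For OpenAI Compatible Servers² |
| Accuracy¹ | 82+ | 90+ | 90+ | 90+ | 90+ |
| Operating System | Linux³ / Windows⁴ / macOS⁵ | Linux³ / Windows⁴ / macOS⁵ | Linux³ / Windows⁴ / macOS⁵ | Linux³ / Windows⁴ / macOS⁵ | Linux³ / Windows⁴ / macOS⁵ |
| Pure CPU Support | ✅ | ❌ | ❌ | ✅ | ✅ |
| GPU Acceleration | Volta and later architecture GPUs or Apple Silicon | Volta and later architecture GPUs or Apple Silicon | Volta and later architecture GPUs or Apple Silicon | Volta and later architecture GPUs or Apple Silicon | Not Required |
| Min VRAM | 6GB | 10GB | 8GB | 3GB | Not Required |
| RAM | Min 16GB, Recommended 32GB or more | Min 16GB, Recommended 32GB or more | Min 16GB, Recommended 32GB or more | Min 8GB | Min 8GB |
| Disk Space | Min 20GB, SSD Recommended | Min 20GB, SSD Recommended | Min 20GB, SSD Recommended | Min 2GB | Min 2GB |
| Python Version | 3.10-3.13 | 3.10-3.13 | 3.10-3.13 | 3.10-3.13 | 3.10-3.13 |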

¹ Accuracy metrics are the End-to-End Evaluation Overall scores from OmniDocBench (v1.5), based on the latest version of MinerU.
² Servers compatible with OpenAI API, such as local model servers or remote model services deployed via inference frameworks like vLLM/SGLang/LMDeploy.
³ On Linux, only distributions released in 2019 or later are supported.
⁴ Since the key dependency ray does not support Python 3.13 on Windows, only versions 3.10~3.12 are supported.
⁵ macOS requires version 14.0 or later.
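
The VRAM thresholds in the table above can be turned into a small selection helper. The following is only an illustrative sketch: `pick_local_backend` and `MIN_VRAM_GB` are hypothetical names that are not part of MinerU, the backend strings are read off the table's column headers, and preferring hybrid over vlm when both fit is an assumption based on hybrid being the default backend.

```python
from typing import Optional

# Minimum VRAM (GB) for the *-auto-engine backends, copied from the table.
# pipeline needs 6GB for GPU acceleration but is also listed as CPU-capable.
MIN_VRAM_GB = {
    "hybrid-auto-engine": 10,
    "vlm-auto-engine": 8,
}


def pick_local_backend(vram_gb: Optional[float]) -> str:
    """Suggest a local backend for a machine with `vram_gb` of GPU memory.

    Pass None when no supported GPU is present.
    """
    if vram_gb is None:
        # The *-auto-engine backends are listed without pure CPU support.
        return "pipeline"
    # Assumption: try hybrid first (the default backend), then vlm.
    for name in ("hybrid-auto-engine", "vlm-auto-engine"):
        if vram_gb >= MIN_VRAM_GB[name]:
            return name
    # Not enough VRAM for either auto-engine backend.
    return "pipeline"
```

For example, `pick_local_backend(8)` suggests `vlm-auto-engine`, while `pick_local_backend(None)` falls back to `pipeline`. The *-http-client backends are left out because they depend on a separately deployed OpenAI-compatible server.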

Install MinerU

Install MinerU using pip or uv

pip install --upgrade pip
pip install uv
uv pip install -U "mineru[all]"
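
Before running the commands above, it can help to confirm that the interpreter is in the supported range. The sketch below encodes only what the requirements table and footnote 4 state (Python 3.10-3.13 in general, 3.10-3.12 on Windows); `python_supported` is a hypothetical helper, not something MinerU ships.

```python
import platform
import sys
from typing import Optional, Tuple


def python_supported(version: Optional[Tuple[int, int]] = None,
                     system: Optional[str] = None) -> bool:
    """Check a (major, minor) version against the README's support range.

    Defaults to the running interpreter and the current operating system.
    """
    major, minor = (version or sys.version_info)[:2]
    system = system or platform.system()
    # Footnote 4: the ray dependency lacks Python 3.13 support on Windows.
    max_minor = 12 if system == "Windows" else 13
    return major == 3 and 10 <= minor <= max_minor
```

Calling `python_supported()` with no arguments checks the current environment; the explicit parameters exist only so that other combinations can be checked without switching machines.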

Install MinerU from source code

git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e ".[all]"

Tip

mineru[all] includes all core features, is compatible with Windows / Linux / macOS, and suits most users. If you need to specify the inference framework for the VLM model, or only intend to install a lightweight client on an edge device, refer to the Extension Modules Installation Guide in the documentation.

Deploy MinerU using Docker

MinerU provides a convenient Docker deployment method, which helps quickly set up the environment and solve some tricky environment compatibility issues. You can get the Docker Deployment Instructions in the documentation.

Using MinerU

If your device meets the GPU acceleration requirements in the table above, you can use a simple command line for document parsing:

mineru -p <input_path> -o <output_path>

If your device does not meet the GPU acceleration requirements, you can specify the backend as pipeline to run in a pure CPU environment:

mineru -p <input_path> -o <output_path> -b pipeline
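
When calling the CLI from a script, the two commands above differ only in the optional `-b` argument. Here is a minimal sketch that builds the argument list, assuming `mineru` is on the PATH; `build_mineru_command` and the example file names are hypothetical, and only the `-p`, `-o`, and `-b` flags shown above are used.

```python
import shlex
from typing import List, Optional


def build_mineru_command(input_path: str, output_path: str,
                         backend: Optional[str] = None) -> List[str]:
    """Build the argument list for the `mineru` commands shown above."""
    cmd = ["mineru", "-p", input_path, "-o", output_path]
    if backend is not None:
        # `-b pipeline` selects the backend that runs in a pure CPU environment.
        cmd += ["-b", backend]
    return cmd


cmd = build_mineru_command("paper.pdf", "out", backend="pipeline")
print(shlex.join(cmd))  # mineru -p paper.pdf -o out -b pipeline
```

The list can then be passed to `subprocess.run(cmd, check=True)` to run the parse. Keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.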

You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the Usage Guide.

TODO

Known Issues

- Reading order is determined by the model based on the spatial distribution of readable content, and may be out of order in some areas under extremely complex layouts.
- Limited support for vertical text.
- Tables of contents and lists are recognized through rules, and some uncommon list formats may not be recognized.
- Code blocks are not yet supported in the layout model.
- Comic books, art albums, primary school textbooks, and exercises cannot be parsed well.
- Table recognition may result in row/column recognition errors in complex tables.
- OCR recognition may produce inaccurate characters in PDFs of lesser-known languages (e.g., diacritical marks in Latin script, easily confused characters in Arabic script).
- Some formulas may not render correctly in Markdown.

FAQ

If you encounter any issues during usage, you can first check the FAQ for solutions.
If your issue remains unresolved, you may also use DeepWiki to interact with an AI assistant, which can address most common problems.
If you still cannot resolve the issue, you are welcome to join our community via Discord or WeChat to discuss with other users and developers.

All Thanks To Our Contributors

License Information

LICENSE.md

Currently, some models in this project are trained based on YOLO. However, since YOLO follows the AGPL license, it may impose restrictions on certain use cases. In future iterations, we plan to explore and replace these with models under more permissive licenses to enhance user-friendliness and flexibility.

Acknowledgments

Citation

@misc{niu2025mineru25decoupledvisionlanguagemodel,
      title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing}, 
      author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
      year={2025},
      eprint={2509.22186},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.22186}, 
}

@misc{wang2024mineruopensourcesolutionprecise,
      title={MinerU: An Open-Source Solution for Precise Document Content Extraction}, 
      author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
      year={2024},
      eprint={2409.18839},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.18839}, 
}

@article{he2024opendatalab,
  title={Opendatalab: Empowering general artificial intelligence with open datasets},
  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},
  journal={arXiv preprint arXiv:2407.13773},
  year={2024}
}
