GitHub - SagarSawlani/TRI_Modal_DocRAG: Tri-Modal DocRAG uses YOLO, OCR, and VLMs to detect and extract titles, text, figures, and tables from document images, producing structured JSON outputs for RAG systems.

MOCK_DATA
dataset
runs/predict
.gitignore
.python-version
Endgame.ipynb
README.md
best.pt
pyproject.toml
uv.lock

About

Tri-Modal DocRAG uses YOLO, OCR, and VLMs to detect and extract titles, text, figures, and tables from document images, producing structured JSON outputs for RAG systems.
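
The description above names three stages (layout detection, text extraction, structured JSON output) without showing what the output looks like. The following is a minimal sketch of how detected regions might be assembled into a JSON record for a RAG index. It is not taken from the repository: the `Region` class, the `route` and `build_document` helpers, the field names, and the choice to send titles and text to OCR and figures and tables to a VLM are all assumptions made for illustration. Detection and extraction are represented by pre-filled values rather than real model calls.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical label set, based on the four element types named in the description.
LABELS = ("title", "text", "figure", "table")


@dataclass
class Region:
    """One detected layout element (hypothetical schema, not the repository's)."""
    label: str          # one of LABELS
    bbox: tuple         # (x1, y1, x2, y2) in pixels
    confidence: float   # detector score in [0, 1]
    content: str        # extracted text or generated description


def route(label):
    """Assumed routing: OCR for titles and text, a VLM for figures and tables."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    return "ocr" if label in ("title", "text") else "vlm"


def build_document(image_name, regions):
    """Sort regions top-to-bottom, then left-to-right, and build a JSON-ready dict."""
    ordered = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    return {
        "source": image_name,
        "elements": [
            {**asdict(r), "bbox": list(r.bbox), "extractor": route(r.label)}
            for r in ordered
        ],
    }


if __name__ == "__main__":
    # Placeholder values standing in for real detector and extractor output.
    regions = [
        Region("text", (40, 120, 560, 300), 0.91, "Body paragraph."),
        Region("title", (40, 30, 560, 80), 0.97, "Section heading"),
        Region("table", (40, 320, 560, 600), 0.88, "Table summary."),
    ]
    print(json.dumps(build_document("page_001.png", regions), indent=2))
```

Sorting by the top edge of each bounding box is a simple stand-in for reading order; multi-column pages would need a more careful ordering.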