PDF utilities for extracting text from digital PDFs and converting scanned PDFs into canvas.
Updated Mar 8, 2026 - TypeScript
A self-hosted PDF OCR API that converts scanned documents to Markdown. Powered by PaddleOCR-VL; runs on a GPU via Docker.
Lightweight Bash script that converts scanned PDFs into searchable, copyable PDFs using Tesseract OCR with parallel processing.
OCR tool for extracting and structuring text from images and scanned PDFs (export to .docx and .txt), with support for French and English.