scanned-pdf

Here are 4 public repositories matching this topic...

PT-Perkasa-Pilar-Utama / ppu-pdf

PDF utilities for extracting text from digital PDFs and converting scanned PDFs into canvases.

npm text-extraction pdfjs pdf-reader jsr rag pdf-digital pdf-canvas scanned-pdf

Updated Mar 8, 2026
TypeScript

Edgaras0x4E / paddleocr-pdf-api

A self-hosted PDF OCR API that converts scanned documents to markdown. Powered by PaddleOCR-VL, runs on GPU via Docker.

api docker pdf ocr pdf-extractor ocr-api paddleocr document-ai document-ocr document-parsing vision-language-model pdf-ocr pdf-to-markdown scanned-pdf paddleocr-vl multilingual-ocr

Updated Apr 19, 2026
Python

maxgfr / copyable-pdf

Lightweight bash script to convert scanned PDFs into searchable, copyable PDFs using Tesseract OCR with parallel processing.

cli pdf automation ocr tesseract scanned-documents poppler document-processing pdfunite searchable-pdf pdftoppm scanned-pdf

Updated Mar 6, 2026
Shell

Nath9666 / Lexo

OCR tool for extracting and structuring text from images and scanned PDFs (export to .docx and .txt), with support for French and English.

desktop-app multilingual python gui ocr document-conversion drag-and-drop logging image-processing text-extraction docx tesseract-ocr python-docx txt pdf2image pdf2text scanned-pdf

Updated May 29, 2025
Python
