romizone/PDFtoword

AI PDF Tools

A lightweight, local web application for working with PDF files. All processing happens on your machine — no files are uploaded to external servers.

Features

| Feature | Description |
| --- | --- |
| PDF to Word | Convert PDF files to editable .docx format |
| Compress PDF | Reduce PDF file size with 3 compression levels (Low / Medium / High) |
| OCR PDF | Extract text from scanned/image-based PDFs with 20+ language support |
| Unlock PDF | Remove password protection and restrictions from PDF files |
| Merge PDF | Combine multiple PDF files into a single document |
| Split PDF | Extract pages (all or by range) into separate PDFs |
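
The Split PDF feature accepts either all pages or a page range. The README does not document the range syntax, so the sketch below assumes a comma-separated form such as `1-3,5` with 1-based page numbers. The function name `parse_page_ranges` is hypothetical and is not taken from `services/pdf_split.py`.

```python
def parse_page_ranges(spec: str, total_pages: int) -> list[int]:
    """Turn a spec like "1-3,5" into sorted 0-based page indices.

    Assumed syntax (not confirmed by the project): comma-separated
    1-based page numbers or inclusive ranges; "" or "all" selects
    every page.
    """
    spec = spec.strip().lower()
    if spec in ("", "all"):
        return list(range(total_pages))

    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start - 1, end))
    return sorted(pages)


print(parse_page_ranges("1-3,5", total_pages=6))  # [0, 1, 2, 4]
```

Returning 0-based indices keeps the result directly usable with libraries such as pypdf, whose `reader.pages` sequence is indexed from zero.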

Screenshots

PDF to Word

Upload a PDF and convert it to an editable Word document.

Compress PDF

Choose compression level and reduce your PDF file size.

OCR PDF

Extract text from scanned documents with multi-language support.

Unlock PDF

Remove password protection from locked PDF files.

Tech Stack

  • Backend: Flask (Python)
  • Frontend: Vanilla HTML/CSS/JavaScript
  • PDF to Word: pdf2docx (PyMuPDF + python-docx)
  • Compression: pypdf
  • OCR: Tesseract OCR + pdf2image + Pillow
  • PDF Unlock: pypdf
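
As a rough illustration of how the three compression levels could map onto pypdf, here is a minimal sketch. The level-to-quality mapping and the function name `compress_pdf` are assumptions made for this example. The project's actual settings live in `services/pdf_compress.py` and may differ. Re-encoding images through `img.replace` needs Pillow and a reasonably recent pypdf release.

```python
# Hypothetical mapping: JPEG quality used when re-encoding images.
# None means "leave images alone and only compress content streams".
COMPRESSION_LEVELS = {
    "low": None,
    "medium": 75,
    "high": 40,
}


def compress_pdf(src: str, dst: str, level: str = "medium") -> None:
    """Write a smaller copy of `src` to `dst` (sketch, not the project's code)."""
    if level not in COMPRESSION_LEVELS:
        raise ValueError(f"unknown compression level: {level!r}")

    # Imported lazily so the level table can be inspected without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    image_quality = COMPRESSION_LEVELS[level]
    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    # Pages must belong to the writer before they are modified.
    for page in writer.pages:
        if image_quality is not None:
            for img in page.images:
                # Re-encode each embedded image at the chosen quality.
                img.replace(img.image, quality=image_quality)
        # Deflate the page's content streams (CPU intensive on large files).
        page.compress_content_streams()

    with open(dst, "wb") as fh:
        writer.write(fh)
```

Calling `compress_pdf("input.pdf", "output.pdf", level="high")` would trade image fidelity for size. The `low` level in this sketch is lossless.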

Prerequisites

  • Python 3.10+
  • Tesseract OCR (for OCR feature)
  • Poppler (for OCR feature)

macOS

brew install tesseract tesseract-lang poppler

Ubuntu / Debian

sudo apt-get install tesseract-ocr tesseract-ocr-all poppler-utils

Windows

Installation

git clone https://github.com/romizone/PDFtoword.git
cd PDFtoword
python3 -m venv venv
source venv/bin/activate        # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Usage

python app.py

Open http://127.0.0.1:5000 in your browser.

Docker

docker build -t ai-pdf-tools .
docker run -p 5000:5000 ai-pdf-tools

OCR Supported Languages

English, Bahasa Indonesia, Malay, Japanese, Chinese (Simplified & Traditional), Korean, Arabic, Hindi, Thai, Vietnamese, French, German, Spanish, Portuguese, Italian, Dutch, Russian, Turkish, Polish, and more.
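
Tesseract identifies languages by short codes and accepts several at once joined with `+`. The sketch below maps a few of the languages listed above to their Tesseract codes and shows one way the OCR step could be wired together. It assumes the `pytesseract` binding, which the README does not name, and the function names are hypothetical.

```python
# Tesseract traineddata codes for some of the languages listed above.
TESSERACT_CODES = {
    "English": "eng",
    "Bahasa Indonesia": "ind",
    "Malay": "msa",
    "Japanese": "jpn",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Traditional)": "chi_tra",
    "Korean": "kor",
    "Arabic": "ara",
    "French": "fra",
    "German": "deu",
    "Spanish": "spa",
    "Russian": "rus",
}


def build_lang_arg(languages: list[str]) -> str:
    """Join language names into Tesseract's "eng+jpn" form."""
    return "+".join(TESSERACT_CODES[name] for name in languages)


def ocr_pdf(path: str, languages: list[str], dpi: int = 300) -> str:
    """Rasterize each page with Poppler, then run Tesseract on it (sketch)."""
    # Imported lazily; both packages are assumptions about the implementation.
    from pdf2image import convert_from_path
    import pytesseract

    lang = build_lang_arg(languages)
    images = convert_from_path(path, dpi=dpi)
    return "\n\n".join(pytesseract.image_to_string(img, lang=lang) for img in images)


print(build_lang_arg(["English", "Japanese"]))  # eng+jpn
```

Each requested language must have its traineddata file installed. The `tesseract-lang` and `tesseract-ocr-all` packages from the Prerequisites section cover that.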

Project Structure

PDFtoword/
├── app.py                  # Flask application & API routes
├── config.py               # Configuration
├── requirements.txt        # Python dependencies
├── Dockerfile              # Docker deployment
├── Procfile                # Cloud deployment
├── services/
│   ├── pdf_to_word.py      # PDF to Word conversion
│   ├── pdf_compress.py     # PDF compression
│   ├── pdf_ocr.py          # OCR text extraction
│   ├── pdf_unlock.py       # PDF unlock/decrypt
│   ├── pdf_merge.py        # PDF merge
│   └── pdf_split.py        # PDF split
├── templates/
│   └── index.html          # Web UI
├── static/
│   ├── css/style.css       # Styles
│   └── js/app.js           # Frontend logic
├── uploads/                # Temp uploads (auto-cleaned)
└── outputs/                # Temp outputs (auto-cleaned)
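
The `uploads/` and `outputs/` folders are described as auto-cleaned, but the README does not say how. One common approach is to delete files older than a cutoff. The helper below is a sketch of that idea only. The name `remove_stale_files` and the one-hour default are assumptions, not the project's actual policy.

```python
import time
from pathlib import Path


def remove_stale_files(folder, max_age_seconds=3600, now=None):
    """Delete regular files in `folder` older than `max_age_seconds`.

    Returns the sorted names of the files that were removed.
    `now` can be injected to make the function easy to test.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(folder).iterdir():
        # Age is measured from the last modification time.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A function like this could run on a timer or after each request, for example `remove_stale_files("uploads")` followed by `remove_stale_files("outputs")`.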

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | Web UI |
| POST | /api/convert | PDF to Word (returns .docx) |
| POST | /api/compress | Compress PDF (returns compressed .pdf) |
| POST | /api/ocr | OCR extraction (returns JSON with text) |
| POST | /api/unlock | Unlock PDF (returns unlocked .pdf) |
| POST | /api/merge | Merge multiple PDFs (returns merged .pdf) |
| POST | /api/split | Split PDF pages (returns .zip) |
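
The endpoints can also be called from a script. The sketch below uses only the standard library to send a PDF as a multipart form upload. The README does not document request parameters, so the form field name `file` and the extra `level` field in the usage comments are assumptions. Check `app.py` for the real names.

```python
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://127.0.0.1:5000"  # address from the Usage section


def encode_multipart(field, filename, data, extra=None):
    """Build a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    lines = []
    # Plain text fields first (e.g. a hypothetical compression level).
    for key, value in (extra or {}).items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{key}"'.encode(),
            b"",
            str(value).encode(),
        ]
    # Then the uploaded PDF itself.
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        data,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def post_pdf(endpoint, pdf_path, field="file", extra=None):
    """POST a PDF to the given endpoint and return the raw response bytes."""
    path = Path(pdf_path)
    body, content_type = encode_multipart(field, path.name, path.read_bytes(), extra)
    request = urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Example usage (requires the app to be running locally):
#   docx_bytes = post_pdf("/api/convert", "report.pdf")
#   Path("report.docx").write_bytes(docx_bytes)
#   small_pdf = post_pdf("/api/compress", "report.pdf", extra={"level": "high"})
```

The `/api/ocr` endpoint returns JSON rather than a file, so its response would be passed through `json.loads` instead of being written to disk.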

Paper

For a detailed technical overview, architecture, and implementation details, see PAPER.md.

License

MIT License - see LICENSE for details.

Author

Romi Nur Ismanto (@romizone)
