A conditional GAN that removes watermarks, binarizes, deblurs, and cleans scanned document images.
Scanned documents in the wild are messy — overlaid watermarks, faint copies, blur, and noise all hurt downstream OCR and analysis. This project implements DE-GAN (Souibgui & Kessentini, 2020), a conditional Generative Adversarial Network that learns to map degraded document images to their clean counterparts.
The same architecture handles four related tasks (watermark removal, binarization, deblurring, and general cleaning) by changing only the training data. After reading this README, you will be able to run inference with pretrained weights and train your own variants.
- 🎯 Four document-enhancement tasks with a single model architecture
- 🖼️ Paired-image training (degraded → ground-truth) keeps the loss interpretable
- 🐍 Pure Python + TensorFlow 2.x — no exotic dependencies
- 📜 Inspired by published research (DE-GAN, IEEE TPAMI 2020)
| Task | Description | Use case |
|---|---|---|
| Watermark removal | Removes overlaid watermarks from documents | Archival recovery, redaction reversal |
| Binarization | Color/grayscale → clean binary | OCR preprocessing |
| Deblurring | Sharpens blurred documents | Phone-captured scans |
| Cleaning | General noise and artifact removal | Quality normalization |
DE-GAN follows the conditional-GAN recipe (Pix2Pix–style):
- Generator — a U-Net-style encoder/decoder maps the degraded input to a candidate clean image.
- Discriminator — a PatchGAN classifier tells real clean pairs from generated ones.
- Combined loss — adversarial loss + a strong pixel-wise reconstruction term keeps outputs faithful to ground truth instead of just "plausibly clean."
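To make the combined objective concrete, here is a minimal sketch in plain Python rather than the repository's TensorFlow code. It assumes binary cross-entropy for the adversarial term and mean absolute error for the pixel-wise term (the Pix2Pix convention); the actual reconstruction loss and its weight in the training scripts may differ. The function names and the `lam` default are hypothetical.

```python
import math


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy of one probability against a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def generator_loss(patch_probs_fake, generated, target, lam=100.0):
    """Adversarial term plus a weighted pixel-wise reconstruction term.

    patch_probs_fake: discriminator outputs for the generated image,
                      one probability per patch (PatchGAN-style).
    generated, target: flat sequences of pixel values in [0, 1].
    lam: hypothetical weight on the reconstruction term (an assumption).
    """
    # The generator wants every patch to be classified as real (label 1).
    adversarial = mean(bce(p, 1.0) for p in patch_probs_fake)
    # Assumed reconstruction term: mean absolute error against ground truth.
    reconstruction = mean(abs(g - t) for g, t in zip(generated, target))
    return adversarial + lam * reconstruction


def discriminator_loss(patch_probs_real, patch_probs_fake):
    """Real pairs should score 1, generated pairs should score 0."""
    real_term = mean(bce(p, 1.0) for p in patch_probs_real)
    fake_term = mean(bce(p, 0.0) for p in patch_probs_fake)
    return real_term + fake_term
```

With a large `lam`, the reconstruction term dominates, which is what keeps the generator tied to the ground truth rather than to any output the discriminator accepts.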
The training script alternates discriminator and generator updates. Augmentation utilities in augmentation/ randomly perturb training pairs to improve generalization.
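One detail matters for any augmentation of paired data: the degraded image and its ground truth must receive the same random transform, or the pair stops lining up pixel for pixel. The sketch below illustrates this with random flips on images stored as nested lists. The choice of flips and the function names are assumptions for illustration, not a description of what `augmentation/` actually contains.

```python
import random


def hflip(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror an image (list of rows) top to bottom."""
    return image[::-1]


def augment_pair(degraded, clean, rng):
    """Apply the same randomly chosen flips to both images of a pair."""
    # Draw each decision once and reuse it for both images.
    if rng.random() < 0.5:
        degraded, clean = hflip(degraded), hflip(clean)
    if rng.random() < 0.5:
        degraded, clean = vflip(degraded), vflip(clean)
    return degraded, clean


# Usage: a seeded generator makes the augmentation reproducible.
rng = random.Random(0)
degraded = [[0.9, 0.1], [0.4, 0.6]]
clean = [[1.0, 0.0], [0.0, 1.0]]
aug_degraded, aug_clean = augment_pair(degraded, clean, rng)
```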
.
├── augmentation/ # Data augmentation utilities
├── common/
│ ├── Generator.py # U-Net generator network
│ └── utils.py # Helpers (data loading, image ops)
├── service/ # Service wrapper (for inference deployment)
├── predict.py # Inference entry point
├── train_wm.py # Train the watermark-removal variant
├── train_dn.py # Train the denoising variant
├── requirements.txt
└── LICENSE
- Python 3.7+
- TensorFlow 2.x
- An NVIDIA GPU is strongly recommended for training (8 GB+ VRAM); CPU is fine for inference on small batches.
git clone https://github.com/aifriend/doc_watermark_cleaner.git
cd doc_watermark_cleaner
pip install -r requirements.txt

Pretrained checkpoints are hosted on Google Drive ([TODO: real link]); save them to ./weights/.
# Watermark removal
python predict.py --task unwatermark --input ./data_wm --output ./results
# Binarization
python predict.py --task binarize --input ./input --output ./output
# Deblurring
python predict.py --task deblur --input ./input --output ./output

Prepare paired datasets (degraded and ground-truth images with the same filenames in matching folders), then:
python train_wm.py # Watermark removal
python train_dn.py # Denoising

Training configuration (epochs, batch size, learning rate) is set at the top of each script and is straightforward to tweak.
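Because training relies on degraded and ground-truth images sharing filenames across two folders, it can help to check the pairing before starting a long run. The following is a minimal standalone sketch, not part of the repository: the function names, the folder paths in the usage comment, and the list of accepted extensions are all assumptions.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_images(folder):
    """Return the set of image filenames directly inside a folder."""
    return {
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }


def match_pairs(degraded_dir, clean_dir):
    """Split filenames into usable pairs and files missing a counterpart."""
    degraded = list_images(degraded_dir)
    clean = list_images(clean_dir)
    paired = sorted(degraded & clean)            # present in both folders
    missing_clean = sorted(degraded - clean)     # degraded image, no ground truth
    missing_degraded = sorted(clean - degraded)  # ground truth, no degraded image
    return paired, missing_clean, missing_degraded


# Usage (hypothetical folder names):
# paired, missing_clean, missing_degraded = match_pairs("./data/degraded", "./data/clean")
# print(len(paired), "pairs;", len(missing_clean) + len(missing_degraded), "unmatched files")
```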
| Task | Metric | This implementation |
|---|---|---|
| Watermark removal | PSNR (test set) | to be reported |
| Binarization | F-measure (DIBCO) | to be reported |
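For reference, the two metrics named in the table can be computed from their textbook definitions as sketched below. This is a plain-Python illustration on flat pixel sequences, not the official DIBCO evaluation tool, whose conventions may differ. The function names and the choice of `1` as the foreground (text) label are assumptions.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def f_measure(reference, estimate):
    """F-measure on binary images, treating 1 as a text (foreground) pixel.

    Returns a value in [0, 1]; benchmark tables often report it as a percentage.
    """
    pairs = list(zip(reference, estimate))
    tp = sum(1 for r, e in pairs if r == 1 and e == 1)
    fp = sum(1 for r, e in pairs if r == 0 and e == 1)
    fn = sum(1 for r, e in pairs if r == 1 and e == 0)
    if tp == 0:
        return 0.0  # no correctly detected text pixels
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

Higher is better for both: PSNR rises as the mean squared error against the ground truth falls, and the F-measure balances missed text pixels against spurious ones.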
- Publish quantitative results on standard benchmarks (DIBCO, custom watermark set)
- Add ONNX export for lightweight deployment
- Wrap inference in a FastAPI service (started in service/)
- Replace TF training loop with a PyTorch Lightning implementation for portability
If you use this work, please cite the original paper that motivated the architecture:
@article{souibgui2020degan,
title = {DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement},
author = {Souibgui, Mohamed Ali and Kessentini, Yousri},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2020}
}

This project is for academic and research purposes only. For commercial use, please contact the maintainer. See LICENSE for details.
Jose Lopez — AI engineer in Madrid, working on document intelligence and the intersection of biological and artificial intelligence.
- DE-GAN paper authors for the architecture
- The Pix2Pix family of work (Isola et al.) for the broader cGAN-for-image-translation framework
