Jac0bXu/better-homework-pdf

Better Homework PDF

Python 3.9+. Built with Streamlit, PyMuPDF, and Pillow. License: MIT.

Transform cramped homework PDFs into spacious, answer-friendly documents. Each question gets its own page with plenty of room to write.

Try it Online

Before → After

Before: cramped questions
After: room to write
[Images: before (cramped homework) and after (spaced homework)]

Features

  • Preserves Everything — Visual snapshots keep diagrams, math formulas, and formatting intact
  • Smart Detection — Auto-detects questions (1., Q1, Problem 1, Exercise 1) or use custom regex
  • Multi-page Questions — Questions spanning multiple source pages are stacked seamlessly, with source margins stripped so no blank gaps appear between segments
  • ≥ 50 % Writing Space Guaranteed — The last output page of every question contains at most half a page of content, leaving at least half blank for handwritten answers
  • Content-Aware Page Splits — When a tall question must overflow to a second page, splits snap to the nearest whitespace row so no text line or table row is cut in half
  • Title Page Generation — Automatically detects course number and homework number from PDF or filename
  • Page Numbers — Each question page shows "Question X of Y" (multi-page: "page k of N") at the bottom
  • Batch Processing — Upload multiple PDFs, download as ZIP or merged into a single PDF
  • Smart Merging — Merge multiple homework PDFs sorted by homework number (HW1, HW2, HW3...)
  • Works Offline — Run locally or build a standalone desktop app

Quick Start

# Install
pip install streamlit PyMuPDF Pillow

# Run
streamlit run app.py

Opens at http://localhost:8501

How It Works

┌─────────────────┐     ┌─────────────────┐
│ Original PDF    │     │ Title Page      │
├─────────────────┤     │ MATH 201 - HW 3 │
│ 1. Question A   │     │ (8 Questions)   │
│ 2. Question B   │ ──► ├─────────────────┤
│ 3. Question C   │     │ 1. Question A   │
│ ...             │     │  (stem)         │
└─────────────────┘     │  (sub-parts)    │  ← segments stacked,
                        │                 │     margins stripped
                        │  ≥ 50% blank    │  ← writing space
                        │ Question 1 of 8 │
                        ├─────────────────┤
                        │ 2. Question B   │
                        │  ...            │
                        └─────────────────┘

Layout algorithm (per question):

  1. All source-page segments are cropped, margin-stripped, and stacked into one image
  2. The stacked image is chunked to fit output pages:
    • Non-last pages: filled to capacity, split at the nearest whitespace row (never mid-line)
    • Last page: ≤ 50 % content → ≥ 50 % blank writing area
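
The chunking step above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the app's actual code: `find_blank_rows` and `plan_chunks` are hypothetical names, the stacked image is modelled as a list of grayscale pixel rows, and the split only searches upward from the page limit so a page never exceeds its capacity.

```python
def find_blank_rows(pixels, white=255):
    """Mark each row of a grayscale image (list of rows) that is entirely white."""
    return [all(value >= white for value in row) for row in pixels]


def plan_chunks(blank_rows, page_height):
    """Return (start, end) row ranges, one per output page (hypothetical helper)."""
    total = len(blank_rows)
    half = page_height // 2
    chunks = []
    start = 0
    # Keep splitting until what is left fits in half a page.
    while total - start > half:
        # Fill the page to capacity, but never take the whole remainder:
        # that would leave a last page with more than 50% content.
        limit = min(start + page_height, total - 1)
        cut = limit
        # Walk upward to the closest whitespace row so no line is cut in half.
        while cut > start + 1 and not blank_rows[cut]:
            cut -= 1
        if not blank_rows[cut]:
            cut = limit  # no whitespace row found: fall back to a hard cut
        chunks.append((start, cut))
        start = cut
    chunks.append((start, total))  # last page: at most half a page of content
    return chunks


# A 25-row "image" with whitespace at rows 8, 17, and 21; pages hold 10 rows.
pixels = [[255] * 4 if y in (8, 17, 21) else [255, 0, 0, 255] for y in range(25)]
chunks = plan_chunks(find_blank_rows(pixels), page_height=10)
print(chunks)  # [(0, 8), (8, 17), (17, 21), (21, 25)]
```

In this example the final chunk is 4 rows tall, which leaves more than half of a 10-row page blank. A real implementation would work on rendered page images rather than nested lists.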

Multiple Files → Merged PDF

When you upload multiple homework PDFs, you can download them as:

  • Individual PDFs (ZIP file)
  • Single merged PDF: all homework combined, sorted by homework number, with divider pages

┌─────────────────┐     ┌─────────────────┐
│ HW1.pdf         │     │ Merged PDF      │
│ HW2.pdf         │     ├─────────────────┤
│ HW3.pdf         │ ──► │ HW1 Title Page  │
└─────────────────┘     │ HW1 Questions   │
                        │ ─── Divider ─── │
                        │ HW2 Title Page  │
                        │ HW2 Questions   │
                        │ ─── Divider ─── │
                        │ HW3 Title Page  │
                        │ HW3 Questions   │
                        └─────────────────┘
                        
Filename: MATH_201_HW1-3_spaced_merged.pdf
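
The ordering and naming shown above could be reproduced roughly as follows. This is a sketch only: the helper names are hypothetical, the regex is an assumption about how homework numbers appear in filenames, and naming the range from the lowest to the highest number is a guess based on the single example filename.

```python
import re

# Assumed filename convention: "hw" or "homework" followed by a number.
HW_NUMBER_RE = re.compile(r"(?:hw|homework)[\s_-]*(\d+)", re.IGNORECASE)


def homework_number(filename):
    """Extract the homework number; files without one sort last."""
    match = HW_NUMBER_RE.search(filename)
    return int(match.group(1)) if match else float("inf")


def sort_by_homework(filenames):
    """Sort numerically, so HW10 comes after HW2 rather than after HW1."""
    return sorted(filenames, key=homework_number)


def merged_filename(course, numbers):
    """Build a name like MATH_201_HW1-3_spaced_merged.pdf."""
    lo, hi = min(numbers), max(numbers)
    span = f"HW{lo}" if lo == hi else f"HW{lo}-{hi}"
    return f"{course.replace(' ', '_')}_{span}_spaced_merged.pdf"


print(sort_by_homework(["HW10.pdf", "HW2.pdf", "HW1.pdf"]))
print(merged_filename("MATH 201", [1, 2, 3]))
```

Sorting on the parsed integer rather than the raw string is what keeps HW10 from landing between HW1 and HW2.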

Supported Patterns

Pattern    Example        Regex
Numbered   1., 2., 3.     ^\d+\.
Q-style    Q1, Q2         ^Q\d+
Problem    Problem 1      ^Problem\s+\d+
Exercise   Exercise 1     ^Exercise\s+\d+
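
To see how these presets behave, or to try a custom regex before pasting it into the app, something like the following can help. It is a minimal sketch: `PRESETS` copies the regexes from the table, while `find_question_starts` is a hypothetical helper that assumes the PDF text has already been extracted into lines.

```python
import re

# Regexes copied from the table above.
PRESETS = {
    "Numbered": r"^\d+\.",
    "Q-style": r"^Q\d+",
    "Problem": r"^Problem\s+\d+",
    "Exercise": r"^Exercise\s+\d+",
}


def find_question_starts(lines, pattern):
    """Return the indices of lines that begin a new question."""
    regex = re.compile(pattern)
    # Strip leading whitespace so indented question numbers still match "^".
    return [i for i, line in enumerate(lines) if regex.match(line.strip())]


lines = [
    "MATH 201 Homework 3",
    "1. Solve for x.",
    "   Hint: see section 3.5",
    "2. Prove the claim.",
    "Q1 What is the limit?",
]
print(find_question_starts(lines, PRESETS["Numbered"]))  # [1, 3]
print(find_question_starts(lines, PRESETS["Q-style"]))   # [4]
```

Note that the Numbered pattern also matches any line that happens to start with a number and a period, such as "3.5 kg of sand", so a stricter custom regex may be needed for some documents.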

Title Detection

The app automatically detects course and homework information:

From PDF content:

  • Course: MATH 201, CS 10100, PHYS 201A (2-4 letters followed by digits, optionally ending in a letter as in PHYS 201A)
  • Homework: Homework 3, HW 5, Assignment 2, Problem Set 4

From filename (fallback):

  • math201_hw3.pdf → MATH 201 - Homework 3
  • cs10100-assignment2.pdf → CS 10100 - Assignment 2
  • random_document.pdf → "Random Document" (title case)
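
The filename fallback above might look like this in code. It is a sketch under assumptions: the regexes, the `LABELS` mapping, and `title_from_filename` are all hypothetical and only tuned to reproduce the three examples listed, not taken from the app.

```python
import re
from pathlib import Path

# Assumed patterns: 2-4 letters plus digits (optional trailing letter) for the
# course, and a keyword plus number for the homework.
COURSE_RE = re.compile(r"\b([A-Za-z]{2,4})\s?(\d+[A-Za-z]?)\b")
HOMEWORK_RE = re.compile(
    r"\b(homework|hw|assignment|problem\s+set)\s*(\d+)\b", re.IGNORECASE
)
LABELS = {
    "hw": "Homework",
    "homework": "Homework",
    "assignment": "Assignment",
    "problem set": "Problem Set",
}


def title_from_filename(filename):
    """Build a title such as 'MATH 201 - Homework 3' from a filename."""
    # Underscores count as word characters, so turn separators into spaces
    # before relying on \b word boundaries.
    stem = re.sub(r"[_\-]+", " ", Path(filename).stem)
    course = COURSE_RE.search(stem)
    homework = HOMEWORK_RE.search(stem)
    parts = []
    # Skip a "course" match that is really a homework keyword, e.g. "hw3".
    if course and course.group(1).lower() not in LABELS:
        parts.append(f"{course.group(1).upper()} {course.group(2).upper()}")
    if homework:
        label = " ".join(homework.group(1).lower().split())
        parts.append(f"{LABELS.get(label, label.title())} {homework.group(2)}")
    # No course or homework found: fall back to the title-cased filename.
    return " - ".join(parts) if parts else stem.title()


for name in ["math201_hw3.pdf", "cs10100-assignment2.pdf", "random_document.pdf"]:
    print(name, "->", title_from_filename(name))
```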

Testing

pip install -r requirements.txt
pytest

Run with coverage:

pytest --cov=app --cov-report=term-missing

Tests cover pure functions, image processing, PDF manipulation, and title extraction. Streamlit is mocked so tests run without the UI.
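
One common way to mock Streamlit for tests is to register a stand-in module before the app is imported. The snippet below illustrates that general technique with the standard library; it is an assumption about the approach, not a copy of this repository's test setup, and in a real suite it would typically live in a `conftest.py`.

```python
import sys
from unittest.mock import MagicMock

# Register a fake module first; any later "import streamlit" receives it
# instead of the real package, so no UI is started.
sys.modules["streamlit"] = MagicMock()

import streamlit as st  # resolves to the MagicMock registered above

# Calls on the mock are recorded rather than rendered.
st.title("Better Homework PDF")
st.title.assert_called_once_with("Better Homework PDF")
```

With the stand-in in place, pure functions in the app module can be imported and tested without launching a Streamlit session.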

Building Desktop App

pip install pyinstaller
python build.py --clean

  • macOS: dist/Better Homework PDF.app
  • Windows: dist/Better Homework PDF/Better Homework PDF.exe

Troubleshooting

No questions detected?

  • Try a different pattern preset
  • Use custom regex matching your format
  • Ensure the PDF contains searchable text (scanned images have no text for the patterns to match)

macOS app won't open?

  • Right-click → "Open" to bypass Gatekeeper

License

MIT — free to use, modify, and distribute.
