Skip to content

jonasfroeller/digi-downloader

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Digi-Downloader

A high-quality book downloader for digi4school.at that preserves vector graphics and generates compact PDF files.

⚠️ Disclaimer & Warning ⚠️

This script is intended for educational purposes only. It was created to demonstrate web scraping techniques with Playwright on a modern web application.

The content downloaded by this script (i.e., the books from digi4school.at) is copyrighted. Downloading and distributing this material may violate the website's Terms of Service and copyright law.

Use this script at your own risk:

  • You are solely responsible for your actions and for complying with all applicable laws and the website's terms.
  • Automated scraping may lead to your account being restricted or banned by the service provider.
  • The author of this script is not responsible for any misuse or any damages resulting from its use.

🎯 What it does

Downloads digital books from digi4school.at by:

  1. Logging into the platform automatically
  2. Finding and opening the specified book(s)
  3. Extracting each page as SVG (vector graphics)
  4. Inlining all assets (images) for offline viewing
  5. Generating a combined PDF file

✨ Features

  • Vector-based output: Maintains crisp, scalable graphics
  • Stealth browsing: Anti-detection to avoid bot blocking
  • Multiple books: Download single books, lists, or all available
  • Automatic pagination: Handles multi-page books seamlessly
  • Compact files: 45MB for 356 pages vs 1GB for rasterized images

🤔 Why so complicated?

digi4school.at has bot detection that returns HTTP 450 errors during login if it detects automated behavior. This tool uses:

  • puppeteer-extra-plugin-stealth to mask automation signatures
  • Persistent browser profiles to maintain session state
  • Natural timing patterns to mimic human behavior
  • Real Chrome browser instead of headless automation

Without these measures, the platform blocks automated access attempts.

📋 Prerequisites

🛠️ Setup

  1. Clone and install:
git clone https://github.com/jonasfroeller/digi-downloader.git
cd digi-downloader
ni
  1. Create .env file:
EMAIL=your.email@example.com
PASSWORD=your_password
BOOK_TITLE=Your Book Title

⚙️ Configuration

Book Selection via BOOK_TITLE in .env:

  • Single title: BOOK_TITLE=My Book
  • Multiple titles: BOOK_TITLE=Book A;Book B
  • JSON array: BOOK_TITLE=["Book A","Book B"]
  • All books: BOOK_TITLE=null (downloads everything not yet in ./books)

Optional settings:

CHROME_PATH=C:\Program Files\Google\Chrome\Application\chrome.exe
CHROME_PROFILE=C:\Users\YourUser\AppData\Local\digi4school-profile

🚀 Usage

bun run download.ts

Output structure:

books/
└── Your Book Title/
    ├── 0001.svg
    ├── 0002.svg
    ├── ...
    └── Your Book Title.pdf

📄 License

This project is licensed under the CC BY-NC-SA 4.0 License. See the LICENSE file for details.
Use responsibly and in accordance with digi4school.at Terms of Service and applicable copyright laws.


Note: Browser stays open for manual verification. Press Ctrl+C to quit.

About

Download your books for offline use - A project to show what playwright can do. This project is for educational purposes only because it's against tos to use the generated PDF-files. The project is for you to understand how services might stop bots and to implement systems to prevent bots from scraping your own services.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors