sharadcodes/supertonic3-book-reader

EPUB TTS Reader API

Testing phase (some features may not work yet)

A REST API for converting EPUB files to text and generating text-to-speech audio using the Supertonic-3 model. Supports direct text input, EPUB conversion via Calibre, and sentence-level audio generation with on-device inference.

Technology Stack

Backend

  • FastAPI: Modern, fast web framework for building APIs with Python
  • Supertonic-3: Lightning-fast, on-device text-to-speech model supporting 31 languages
  • ONNX Runtime: High-performance inference engine for machine learning models
  • Calibre: E-book conversion tool, used here for EPUB-to-text conversion
  • Python 3.12: Core programming language

Frontend

  • HTML5/CSS3: Markup and styling
  • Vanilla JavaScript: Client-side interactivity
  • Flexbox: CSS layout system

Key Features

  • On-device TTS inference (no cloud API calls required)
  • EPUB to text conversion via Calibre CLI
  • Sentence-level audio generation
  • Click-to-play from any text position
  • Auto-scroll toggle functionality
  • Adjustable font size
  • Monospace font with borders for clarity
  • RESTful API with OpenAPI/Swagger documentation
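
Sentence-level generation and click-to-play both depend on splitting the loaded text into sentences first. The sketch below shows one simple way to do that with a regular expression. The function name split_sentences and the splitting rule are assumptions for illustration, not the parser used in main.py.

```python
import re

# Assumed rule: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, collapsing runs of whitespace first."""
    normalized = " ".join(text.split())
    if not normalized:
        return []
    return [part for part in _SENTENCE_END.split(normalized) if part]


if __name__ == "__main__":
    sample = "Call me Ishmael. Some years ago! Never mind how long?"
    for index, sentence in enumerate(split_sentences(sample)):
        # Each (index, sentence) pair could map to one audio request.
        print(index, sentence)
```

A rule this simple also splits after abbreviations such as "Dr.", so a real reader would likely need a more careful tokenizer.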

API Documentation

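Interactive API documentation is served by the running application. FastAPI's defaults are Swagger UI at /docs and ReDoc at /redoc, so on a local server these would be http://localhost:8000/docs and http://localhost:8000/redoc, assuming main.py does not override them.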

Installation

Local Development

Prerequisites

  • Python 3.12+
  • Calibre (for EPUB conversion)
  • uv (Python package manager)

Setup

  1. Clone the repository and navigate to the project directory

  2. Create a virtual environment and install dependencies:

uv venv
uv pip install fastapi uvicorn python-multipart supertonic

  3. Set your Hugging Face token as the HF_TOKEN environment variable, or update it in main.py

  4. Run the server. On Windows:

.venv\Scripts\python main.py

     On Linux or macOS:

.venv/bin/python main.py

The API will be available at http://localhost:8000

Docker Deployment

Prerequisites

  • Docker installed on your system

Build and Run

  1. Build the Docker image:

docker build -t epub-tts-reader .

  2. Run the container:

docker run -d -p 8000:8000 -e HF_TOKEN=your_huggingface_token epub-tts-reader

  3. Access the API at http://localhost:8000

Docker Compose

Create a docker-compose.yml file:

version: '3.8'
services:
  epub-tts-reader:
    build: .
    ports:
      - "8000:8000"
    environment:
      - HF_TOKEN=your_huggingface_token
    volumes:
      - ./uploads:/app/uploads
      - ./audio:/app/audio

Run with:

docker-compose up -d

API Endpoints

Web Interface

  • GET / - Main web interface for EPUB TTS Reader

EPUB Processing

  • POST /upload-epub - Upload and convert EPUB to text
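
As a rough illustration of the conversion step behind this endpoint, the sketch below shells out to Calibre's ebook-convert command, which picks the output format from the output file's extension. The helper names, the timeout, and the error handling are assumptions. The actual handler in main.py may differ.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(epub_path: Path, txt_path: Path) -> list[str]:
    """Build the Calibre command line: ebook-convert INPUT OUTPUT."""
    return ["ebook-convert", str(epub_path), str(txt_path)]


def epub_to_text(epub_path: Path, txt_path: Path, timeout: float = 300.0) -> str:
    """Convert an EPUB to plain text and return the text (hypothetical helper)."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    # check=True raises CalledProcessError if Calibre exits with an error.
    subprocess.run(
        build_convert_command(epub_path, txt_path),
        check=True,
        capture_output=True,
        timeout=timeout,
    )
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

Passing the arguments as a list rather than a shell string keeps uploaded filenames from being interpreted by a shell.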

Text Processing

  • POST /load-text - Load and parse direct text input

TTS Generation

  • POST /generate-audio - Generate audio for text segment

Audio Serving

  • GET /audio/{filename} - Serve generated audio files

Deployment Considerations

Security

  • Remove or secure the Hugging Face token before public deployment
  • Implement rate limiting for API endpoints
  • Add authentication/authorization for production use
  • Validate and sanitize all user inputs
  • Use environment variables for sensitive configuration
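
One concrete case of input validation is the filename accepted by GET /audio/{filename}. The sketch below rejects names that would resolve outside the audio directory. The directory name mirrors the ./audio volume in the Compose file, and resolve_audio_path is a hypothetical helper, not a function from this project.

```python
from pathlib import Path

AUDIO_DIR = Path("audio")  # assumed location, mirrors the ./audio volume


def resolve_audio_path(filename: str, audio_dir: Path = AUDIO_DIR) -> Path:
    """Return the absolute path for filename, or raise ValueError if it escapes audio_dir."""
    base = audio_dir.resolve()
    candidate = (base / filename).resolve()
    # Only direct children of the audio directory are allowed, which rules out
    # "../" traversal, absolute paths, subdirectories, and empty names.
    if candidate.parent != base:
        raise ValueError(f"invalid audio filename: {filename!r}")
    return candidate
```

A route handler could call this before opening the file and map ValueError to a 400 or 404 response.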

Performance

  • Implement audio file cleanup to prevent disk space issues
  • Add caching for frequently accessed content
  • Consider using a CDN for static assets
  • Implement request queuing for heavy TTS operations

Scalability

  • Run the app under a production process manager such as Gunicorn with Uvicorn workers
  • Implement horizontal scaling with load balancing
  • Use Redis or similar for session management
  • Consider containerization with Docker

Monitoring

  • Add logging for API requests and errors
  • Implement health check endpoints
  • Set up error tracking (e.g., Sentry)
  • Monitor resource usage and response times
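
A health check can be as small as a function that reports uptime and free disk space, which a route such as GET /health could return as JSON. The sketch below uses only the standard library. The route name, the health_report function, and the 500 MB threshold are assumptions.

```python
import shutil
import time

_STARTED_AT = time.monotonic()  # recorded once at import time


def health_report(storage_path: str = ".", min_free_bytes: int = 500 * 1024 * 1024) -> dict:
    """Summarize process uptime and free disk space for storage_path."""
    usage = shutil.disk_usage(storage_path)
    # Report "degraded" when free space drops below the assumed threshold.
    status = "ok" if usage.free >= min_free_bytes else "degraded"
    return {
        "status": status,
        "uptime_seconds": round(time.monotonic() - _STARTED_AT, 3),
        "disk_free_bytes": usage.free,
    }
```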

Environment Variables

  • HF_TOKEN: Hugging Face API token for model access
  • PORT: Server port (default: 8000)

License

MIT

About

A web application that converts EPUB files to text and generates text-to-speech audio using the Supertonic-3 model. Features include multilingual support (31 languages), multiple voice styles, expression tags, configurable playback speed, and click-to-play functionality with sentence-level highlighting.