A REST API for converting EPUB files to text and generating text-to-speech audio using the Supertonic-3 model. Supports direct text input, EPUB conversion via Calibre, and sentence-level audio generation with on-device inference.
- FastAPI: Modern, fast web framework for building APIs with Python
- Supertonic-3: Lightning-fast, on-device text-to-speech model supporting 31 languages
- ONNX Runtime: High-performance inference engine for machine learning models
- Calibre: eBook conversion tool for EPUB to text conversion
- Python 3.12: Core programming language
- HTML5/CSS3: Markup and styling
- Vanilla JavaScript: Client-side interactivity
- Flexbox: CSS layout system
- On-device TTS inference (no cloud API calls required)
- EPUB to text conversion via Calibre CLI
- Sentence-level audio generation
- Click-to-play from any text position
- Auto-scroll toggle functionality
- Adjustable font size
- Monospace font with borders for clarity
- RESTful API with OpenAPI/Swagger documentation
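Sentence-level audio generation and click-to-play both depend on breaking the loaded text into sentences. The following is only a minimal sketch of one way to do that, not necessarily how this project does it. The function name `split_sentences` and the regex rule (a sentence ends at `.`, `!`, or `?` followed by whitespace) are assumptions made for illustration.

```python
import re

# Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
# Abbreviations such as "e.g." will be split incorrectly by this simple rule.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty pieces."""
    pieces = _SENTENCE_END.split(text.strip())
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    # Each sentence gets an index that a UI could use for click-to-play.
    for index, sentence in enumerate(split_sentences("Hello world. How are you? Fine!")):
        print(index, sentence)
```

Each index could then be sent to the audio endpoint one sentence at a time, so playback can start from whichever sentence is clicked.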
Interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Python 3.12+
- Calibre (for EPUB conversion)
- uv (Python package manager)
- Clone the repository and navigate to the project directory
- Create a virtual environment and install dependencies:
uv venv
uv pip install fastapi uvicorn python-multipart supertonic-
- Set your Hugging Face token as an environment variable (HF_TOKEN) or update it in main.py
- Run the server (Windows path shown; on Linux or macOS use .venv/bin/python):
.venv\Scripts\python main.py
The API will be available at http://localhost:8000
- Docker installed on your system
- Build the Docker image:
docker build -t epub-tts-reader .
- Run the container:
docker run -d -p 8000:8000 -e HF_TOKEN=your_huggingface_token epub-tts-reader
- Access the API at http://localhost:8000
Create a docker-compose.yml file:
version: '3.8'
services:
  epub-tts-reader:
    build: .
    ports:
      - "8000:8000"
    environment:
      - HF_TOKEN=your_huggingface_token
    volumes:
      - ./uploads:/app/uploads
      - ./audio:/app/audio
Run with:
docker-compose up -d
- GET /: Main web interface for EPUB TTS Reader
- POST /upload-epub: Upload and convert EPUB to text
- POST /load-text: Load and parse direct text input
- POST /generate-audio: Generate audio for text segment
- GET /audio/{filename}: Serve generated audio files
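As an illustration of the `GET /audio/{filename}` route, a client could build the download URL as sketched below. The helper name `audio_url` and the example filename are hypothetical. The request and response formats of the POST endpoints are not documented here, so this sketch covers only the URL pattern listed above.

```python
from urllib.parse import quote

# Default base URL taken from this README; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def audio_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /audio/{filename}, percent-encoding the filename."""
    # safe="" also encodes '/', so the filename cannot add extra path segments.
    return f"{base_url.rstrip('/')}/audio/{quote(filename, safe='')}"


if __name__ == "__main__":
    # "segment 1.wav" is a made-up filename used only for illustration.
    print(audio_url("segment 1.wav"))
```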
- Remove or secure the Hugging Face token before public deployment
- Implement rate limiting for API endpoints
- Add authentication/authorization for production use
- Validate and sanitize all user inputs
- Use environment variables for sensitive configuration
- Implement audio file cleanup to prevent disk space issues
- Add caching for frequently accessed content
- Consider using a CDN for static assets
- Implement request queuing for heavy TTS operations
- Run the app under a production process manager such as Gunicorn with Uvicorn workers
- Implement horizontal scaling with load balancing
- Use Redis or similar for session management
- Consider containerization with Docker
- Add logging for API requests and errors
- Implement health check endpoints
- Set up error tracking (e.g., Sentry)
- Monitor resource usage and response times
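The audio file cleanup item above could be approached along these lines. This is a sketch under assumptions, not the project's implementation: the function name `remove_old_audio` and the age-based policy are hypothetical, and the directory would be whichever folder the server writes generated audio to (the Compose example mounts `./audio`).

```python
import time
from pathlib import Path
from typing import Optional


def remove_old_audio(directory: Path, max_age_seconds: float,
                     now: Optional[float] = None) -> list[Path]:
    """Delete files in `directory` last modified more than `max_age_seconds` ago.

    Returns the paths that were removed. `now` can be injected for testing.
    """
    current = time.time() if now is None else now
    removed = []
    for path in directory.iterdir():
        # Only regular files are considered; subdirectories are left alone.
        if path.is_file() and current - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    audio_dir = Path("audio")  # assumed location of generated audio
    if audio_dir.is_dir():
        print(remove_old_audio(audio_dir, max_age_seconds=3600))
```

A function like this could be run on a timer or from a scheduled job. Deleting by age is the simplest policy, and a size cap would be another option.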
- HF_TOKEN: Hugging Face API token for model access
- PORT: Server port (default: 8000)
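One way these two variables might be read at startup is sketched below. The function name `read_settings` and the returned dictionary shape are assumptions for illustration; main.py may load its configuration differently.

```python
import os
from typing import Mapping, Optional


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read HF_TOKEN and PORT from the environment, applying the documented default."""
    token: Optional[str] = env.get("HF_TOKEN")  # None when the variable is unset
    port = int(env.get("PORT", "8000"))         # default port from this README
    return {"hf_token": token, "port": port}


if __name__ == "__main__":
    settings = read_settings()
    print("port:", settings["port"], "token set:", settings["hf_token"] is not None)
```

Reading the token from the environment rather than hard-coding it in main.py keeps it out of version control.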
MIT