Convert your comic book panels into speech using OCR and Text-to-Speech (TTS) technology!
This project automatically detects panels in a comic image, extracts their text using EasyOCR, and generates corresponding audio using gTTS. Finally, it merges the per-panel audio clips into a single voiceover file for a seamless listening experience.
- 📖 Comic Panel Detection: Automatically splits comic pages into individual panels using image processing.
- 🔍 Text Extraction: Extracts English text from each panel using EasyOCR.
- 🎤 Text-to-Speech: Converts extracted text to audio using Google Text-to-Speech (gTTS).
- 🎧 Audio Compilation: Combines all panel audio files into one, with pauses between them.

Ensure you're using a Python environment like Google Colab or Jupyter Notebook. Then install the dependencies:

    pip install easyocr opencv-python numpy matplotlib gTTS pydub

Also, install FFmpeg for audio processing via pydub. In Google Colab, run:

    !apt install ffmpeg

- Place your comic image in the working directory.
- Update the path in the code:

      image_path = "Comic3.jpg"

This will:
- ✅ Detect comic panels
- ✅ Extract text from each panel
- ✅ Generate TTS for each panel
- ✅ Save panel audios
- ✅ Merge them into a single audio file
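
As a rough illustration of the steps after panel detection, the sketch below runs OCR on each panel, generates one MP3 per panel, and merges the clips. It is a minimal sketch, not the project's actual code: the function names `panel_text` and `narrate_panels`, the `min_confidence` and `pause_ms` parameters, and the output file names are all assumptions made for the example. It relies on `easyocr.Reader.readtext` returning `(bbox, text, confidence)` tuples.

```python
from pathlib import Path


def panel_text(ocr_results, min_confidence=0.3):
    """Join EasyOCR (bbox, text, confidence) tuples into one string.

    min_confidence is a hypothetical cutoff for dropping noisy detections.
    """
    words = [text.strip() for _bbox, text, conf in ocr_results
             if conf >= min_confidence and text.strip()]
    return " ".join(words)


def narrate_panels(panel_images, out_dir="audio", pause_ms=500):
    """OCR each panel, speak it with gTTS, and merge the clips with pydub.

    panel_images may hold file paths or NumPy image arrays.
    Returns the merged file's path, or None if no text was found.
    """
    # Imported here so panel_text stays usable without these packages.
    import easyocr
    from gtts import gTTS
    from pydub import AudioSegment

    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    reader = easyocr.Reader(["en"])               # English recognition model
    pause = AudioSegment.silent(duration=pause_ms)
    combined = AudioSegment.empty()

    for i, panel in enumerate(panel_images, start=1):
        text = panel_text(reader.readtext(panel))
        if not text:                              # gTTS rejects empty text
            continue
        clip_path = out / f"panel_{i}.mp3"        # hypothetical file name
        gTTS(text=text, lang="en").save(str(clip_path))
        # Each clip is followed by a short pause.
        combined += AudioSegment.from_mp3(str(clip_path)) + pause

    if len(combined) == 0:
        return None
    final_path = out / "comic_voiceover.mp3"      # hypothetical file name
    combined.export(str(final_path), format="mp3")
    return final_path
```

Note that gTTS needs an internet connection, and pydub needs FFmpeg to read and write MP3 files.
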
How it works:
- Converts the comic image to a binary image using thresholding.
- Identifies the white space between panels to segment them.
- Uses EasyOCR to extract text from each panel image.
- Uses Google Text-to-Speech (gTTS) to convert text to MP3 files.
- Uses pydub to concatenate all MP3s with short pauses in between.
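
The first two bullets (thresholding, then splitting on white space) can be sketched as follows. This is one possible approach under stated assumptions, not necessarily the project's method: it assumes panels sit on a near-white page in a grid-like layout, and it uses a plain NumPy comparison in place of an OpenCV threshold call. The names `content_spans` and `find_panels` and the `white_thresh` and `min_size` defaults are hypothetical.

```python
import numpy as np


def content_spans(is_gutter, min_size):
    """Return (start, stop) spans of consecutive non-gutter positions."""
    spans, start = [], None
    for i, gutter in enumerate(is_gutter):
        if not gutter and start is None:
            start = i                          # entering content
        elif gutter and start is not None:
            if i - start >= min_size:          # ignore tiny specks
                spans.append((start, i))
            start = None                       # leaving content
    if start is not None and len(is_gutter) - start >= min_size:
        spans.append((start, len(is_gutter)))
    return spans


def find_panels(gray, white_thresh=240, min_size=20):
    """Return panel boxes (top, bottom, left, right) for a grayscale page."""
    ink = gray < white_thresh                  # binary image: True where not white
    boxes = []
    # Rows without any ink are horizontal gutters between strips of panels.
    for top, bottom in content_spans(~ink.any(axis=1), min_size):
        strip = ink[top:bottom]
        # Inside a strip, ink-free columns are the vertical gutters.
        for left, right in content_spans(~strip.any(axis=0), min_size):
            boxes.append((top, bottom, left, right))
    return boxes
```

With OpenCV, the input could come from `cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)`, and each panel can then be cropped as `gray[top:bottom, left:right]` before being passed to OCR. Pages with borderless or overlapping panels would need a different strategy, such as contour detection.
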
Python Dependencies:
- easyocr
- opencv-python
- numpy
- matplotlib
- gTTS
- pydub
System Dependency:
- ffmpeg (external system dependency)
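
Before running the notebook, it can help to confirm that everything listed above is available. The check below is a small sketch using only the standard library; the function name `missing_dependencies` is hypothetical. Note that two packages are imported under names that differ from their pip names: `opencv-python` as `cv2` and `gTTS` as `gtts`.

```python
import importlib.util
import shutil

# Import names, which differ from some pip names (opencv-python -> cv2, gTTS -> gtts).
REQUIRED_MODULES = ["easyocr", "cv2", "numpy", "matplotlib", "gtts", "pydub"]


def missing_dependencies(modules=REQUIRED_MODULES, tools=("ffmpeg",)):
    """Return names of Python modules and command-line tools that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [t for t in tools if shutil.which(t) is None]
    return missing
```

Calling `missing_dependencies()` returns an empty list when all packages and FFmpeg are found; any names it returns still need to be installed.
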
This project is licensed under the MIT License.
- Comic images used are for demonstration purposes only.
- OCR by EasyOCR
- TTS powered by gTTS