Files in GIF, TIFF, and WebP formats can contain multiple frames, but the service currently processes only the first frame of such files. Although the file could be loaded once and each frame passed to Tesseract individually, changing the workflow this way raises the following considerations:
- Either a single `task_page_ocr()` call loads the file once and processes every frame, which forfeits the advantage of distributing tasks across worker processes...
- ...or each `task_page_ocr()` call opens the file and seeks to its target frame, in which case the original file does not need to be split into separate files for each frame.
- Without separate files for each frame, however, the browser client cannot display the individual pages in the interfaces for editing the layout and the results.
- It must be verified whether `PIL.ImageDraw` can draw on individual frames of the loaded image, so that the current support for ignoring parts of the image is preserved.
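A minimal sketch of the second option, assuming Pillow: each task opens the file, seeks to its own frame, and copies that frame into a standalone RGB image before drawing on it. Copying first sidesteps the open question of drawing on a frame of the multi-frame file object itself. The names `count_frames`, `load_frame`, and the `ignore_boxes` parameter are hypothetical and not part of the service; the returned image would then be handed to Tesseract (for example via `pytesseract.image_to_string`, if that is the binding in use).

```python
from PIL import Image, ImageDraw


def count_frames(path):
    """Return the number of frames in an image file (1 for single-frame files)."""
    with Image.open(path) as img:
        # n_frames is provided by Pillow's plugins for formats that can hold
        # several frames (GIF, TIFF, WebP, ...); fall back to 1 otherwise.
        return getattr(img, "n_frames", 1)


def load_frame(path, index, ignore_boxes=()):
    """Open `path`, seek to frame `index`, and return it as an independent RGB image.

    `ignore_boxes` (hypothetical parameter) is a sequence of
    (left, top, right, bottom) rectangles painted white so OCR skips them.
    """
    with Image.open(path) as img:
        # Pillow raises EOFError if `index` lies beyond the last frame.
        img.seek(index)
        # convert() loads the current frame and returns a new image that no
        # longer depends on the open file, so drawing cannot affect other frames.
        frame = img.convert("RGB")
    draw = ImageDraw.Draw(frame)
    for box in ignore_boxes:
        draw.rectangle(box, fill="white")
    return frame
```

Each worker task would call `load_frame(path, i)` for its own `i` in `range(count_frames(path))`. The cost of this design is that every task reopens the file, and for some formats (GIF in particular) Pillow may have to step through the preceding frames to reach frame `i`.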