[BUG] Pytesseract doesn't properly support multiframe images (e.g. TIFF) #343
Comments
Hi @mnechita and thank you for reporting this bug. Possible workaround if you don't need the …: `pytesseract.image_to_osd('example/image/path.jpeg')`. I believe that this should still work, because … PS: We will also need a multiframe test image (…
Hey, thanks for the reply. Nice suggestion; that actually helps me get around this in the meantime. Using the path to the image, while … To confirm, tesseract supports multiframe images; I've attached a sample OSD output generated from a 9-frame TIFF. I'll start working on a PR tonight after work.
Hello, thanks …
Reproduce:
Calling `pytesseract.image_to_osd` on a multiframe image (e.g. a TIFF) returns output for the first frame only, whereas calling the tesseract process directly on the image file generates the correct output containing each page.
Source of the bug:
When calling `save` on the in-memory data, Pillow requires the `save_all=True` parameter (Pillow docs) to save multiframe images to disk. pytesseract does not pass this parameter, so the image gets truncated to the first frame. See `pytesseract/pytesseract/pytesseract.py`, line 201 in 45fe798.
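The truncation can be reproduced with Pillow alone, without Tesseract. The sketch below assumes a recent Pillow with TIFF write support; the file names, frame size and frame count are arbitrary choices for illustration. It builds a 3-frame TIFF, reopens it, and saves it again with and without `save_all=True`, mirroring the `save` call described above.

```python
import os
import tempfile

from PIL import Image


def frames_after_resave(save_all: bool) -> int:
    """Write a 3-frame TIFF, re-save it, and count the frames in the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "source.tiff")
        out_path = os.path.join(tmp, "resaved.tiff")

        # Build a small multiframe TIFF: three solid greyscale frames.
        frames = [Image.new("L", (16, 16), color=c) for c in (0, 128, 255)]
        frames[0].save(src_path, format="TIFF", save_all=True, append_images=frames[1:])

        with Image.open(src_path) as im:
            assert im.n_frames == 3
            # Re-save the opened image, as described in the report.
            # Without save_all, Pillow writes only the current (first) frame.
            im.save(out_path, format=im.format, save_all=save_all)

        with Image.open(out_path) as resaved:
            return getattr(resaved, "n_frames", 1)


print(frames_after_resave(save_all=False))  # 1
print(frames_after_resave(save_all=True))   # 3
```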
Possible solution
Check `Image.n_frames` before saving and set the `save_all` parameter accordingly. I can create a PR with the changes if the solution sounds good.
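A minimal sketch of the proposed check, assuming Pillow is used directly; `save_for_tesseract` and `roundtrip_frame_count` are hypothetical names for illustration, not pytesseract's API. Single-frame images may not expose `n_frames` at all, hence the `getattr` default of 1.

```python
import os
import tempfile

from PIL import Image


def save_for_tesseract(image: Image.Image, path: str) -> None:
    """Hypothetical helper: save `image` to `path`, keeping every frame.

    save_all=True is only requested when the image reports more than one
    frame, so single-frame images are saved exactly as before. If
    image.format is None, Pillow infers the format from the path extension.
    """
    is_multiframe = getattr(image, "n_frames", 1) > 1
    image.save(path, format=image.format, save_all=is_multiframe)


def roundtrip_frame_count(n_frames: int) -> int:
    """Write an n-frame TIFF, pass it through save_for_tesseract, count frames."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.tiff")
        out = os.path.join(tmp, "output.tiff")
        frames = [Image.new("L", (16, 16), color=i % 256) for i in range(n_frames)]
        frames[0].save(src, format="TIFF", save_all=True, append_images=frames[1:])
        with Image.open(src) as im:
            save_for_tesseract(im, out)
        with Image.open(out) as copy:
            return getattr(copy, "n_frames", 1)


print(roundtrip_frame_count(3))  # 3
print(roundtrip_frame_count(1))  # 1
```

Checking the frame count rather than always passing `save_all=True` keeps single-frame behaviour unchanged. Multiframe images in a format Pillow cannot write as multiframe may still need separate handling.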