Reasons why (example scenarios where this comes in useful)
tesseract has an OSD mode, but its text bbox (bounding box) detection may be sub-par for your use case.
you're using the new LSTM classifier (v4/v5), which works best when fed a greyscale/color [1] image, but the thresholding of your source image is mediocre, and when you feed it a (manipulated) image that does well in the segmentation phase, your LSTM engine drops a few points in quality (see e.g. "fix failure to OCR: general quality issue due to LSTM being fed noisy/crappy *original* image pixels instead of cleaned-up binarized pixels", tesseract-ocr/tesseract#4111).
....
Approach
allow user to provide a separate PIX image for each of the leptonica stages:
thresholding + segmentation stage can benefit from an open+closed+dilated or otherwise "cleaned up and thickened" greyscale image, so we give 'em that
so we give the LSTM engine its own denoised GREY32 [2]/GREY8/RGB24 image
meanwhile we would also like to give the older v3 core a try; it takes (AFAICT) binarized input, but preferably NOT that fattened one above, which we prepped for the thresholding+segmentation stage: a third (B&W) image for this one, then!
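The three derived images above could, for instance, be produced with standard grayscale morphology, median filtering, and thresholding. Below is a minimal pure-Python sketch of those three preprocessing steps; all function names are hypothetical illustrations, and real code would go through leptonica (e.g. pixDilateGray/pixErodeGray, a rank filter, pixOtsuAdaptiveThreshold) operating on PIX rather than nested lists:

```python
# Sketch: derive three images from one greyscale source (dark text on light
# background, pixel values 0..255), mirroring the three-stage idea above:
#   1. a "thickened" image for the thresholding + segmentation stage,
#   2. a lightly denoised image for the LSTM engine,
#   3. a binarized B&W image for the legacy v3 core.

def _filter(img, size, reduce_fn):
    """Apply a square sliding window, clipped at the image borders."""
    h, w, r = len(img), len(img[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = reduce_fn(window)
    return out

def thicken(img, size=3):
    # grayscale "dilation" of dark strokes: a min filter fattens dark text
    return _filter(img, size, min)

def denoise(img, size=3):
    # median filter knocks out isolated salt-and-pepper noise pixels
    return _filter(img, size, lambda w: sorted(w)[len(w) // 2])

def binarize(img, threshold=128):
    # fixed global threshold for the B&W image; Otsu would pick it adaptively
    return [[0 if p < threshold else 255 for p in row] for row in img]
```

A single dark pixel in a white field is spread to its whole neighbourhood by `thicken` (good for segmentation), erased by `denoise` (good for the LSTM), and kept crisp by `binarize` (good for the v3 core).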
Footnotes
[1] why on earth tesseract takes in an RGB24 pixel series for its LSTM engine is an open question; the only sensible reason I can come up with is that this way you still keep a little more pixel info than with GREY8. But then I think: leptonica supports float B&W (GREY32) AFAICT, so why not use that one? I expect the same quality with 1/3rd of the input nodes of the LSTM. Of course, the "lazy" reason I can come up with is: LSTM was first done in Python, in ocropus, and they used a generic engine which ate RGB24 as input, so to keep the transition manageable and uncomplicated when shoveling this into tesseract/C++, you'd rather keep the design parameters the same until you're surely done, so RGB24 it is. ... But then nobody considered moving this result onto float greyscale and thus 1/3rd of the input nodes? Because I don't see tesseract training on anything that's particularly colourful or to do with color per se, so RGB24 only serves as a poor man's "greyscale float" at 3*256 integer values, vs. the usual suspect: basic GREY8, which only has 256 grey levels. A conundrum! 🤔 🤔 🤔
[2] GREY32 is currently not part of tesseract. GREY32 status: TODO.
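The RGB24-vs-GREY8 arithmetic in footnote [1] is easy to sanity-check: once you only care about luminance, the 3 bytes of an RGB24 pixel collapse to roughly the same 256 levels as GREY8, while a float "GREY32"-style plane would keep the fractional precision at a third of the input width. A small illustration, using the standard ITU-R BT.601 luma weights (the function names here are made up for the example):

```python
# Luminance of an RGB pixel, ITU-R BT.601 weights.
def luminance_float(r, g, b):
    """Float luminance in [0, 255] - what a float-greyscale plane could store."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def luminance_grey8(r, g, b):
    """The same value quantized to 8 bits - what GREY8 actually stores."""
    return min(255, round(luminance_float(r, g, b)))

# Neutral (r == g == b) pixels map one-to-one onto GREY8: still 256 levels,
# so the extra two bytes of RGB24 buy nothing for colourless scans.
levels = {luminance_grey8(v, v, v) for v in range(256)}
```

In other words, for the greyish material tesseract is typically trained on, RGB24 input mostly triples the node count without adding distinguishable grey levels; only a float representation would genuinely keep more precision per pixel.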
GerHobbelt changed the title from "tesseract :: supply masks, tc. for each stage as images overriding or assisting the tesseract engine that way" to "tesseract :: supply masks, etc. for each stage as images overriding or assisting the tesseract engine that way" on Sep 13, 2023.