Reasons why (example scenarios where this comes in useful)
tesseract has an OSD mode, but its text bbox (bounding box) detection may be sub-par for your use case.
you're using the new LSTM classifier (v4/v5), which works best when fed a greyscale/color [1] image, but the thresholding of your source image is mediocre, and when you feed it a (manipulated) image that does well in the segmentation phase, your LSTM engine drops a few points in quality (see e.g. "fix failure to OCR: general quality issue due to LSTM being fed noisy/crappy *original* image pixels instead of cleaned-up binarized pixels", tesseract-ocr/tesseract#4111).
....
Approach
allow user to provide a separate PIX image for each of the leptonica stages:
thresholding + segmentation stage can benefit from an open+closed+dilated or otherwise "cleaned up and thickened" greyscale image, so we give 'em that
so we give the LSTM engine its own denoised GREY32 [2]/GREY8/RGB24 image
meanwhile we would also like to give the older v3 core a try; it takes (AFAICT) binarized input, but preferably NOT that fattened one above, which we prepped for the thresholding+segmentation stage: a third (B&W) image for this one, then!
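The three derived images above could, for instance, be produced with standard grayscale morphology, median filtering, and thresholding. Below is a minimal pure-Python sketch of those three preprocessing steps; all function names are hypothetical illustrations, and real code would go through leptonica (e.g. pixDilateGray/pixErodeGray, a rank filter, pixOtsuAdaptiveThreshold) operating on PIX rather than nested lists:

```python
# Sketch: derive three images from one greyscale source (dark text on light
# background, pixel values 0..255), mirroring the three-stage idea above:
#   1. a "thickened" image for the thresholding + segmentation stage,
#   2. a lightly denoised image for the LSTM engine,
#   3. a binarized B&W image for the legacy v3 core.

def _filter(img, size, reduce_fn):
    """Apply a square sliding window, clipped at the image borders."""
    h, w, r = len(img), len(img[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = reduce_fn(window)
    return out

def thicken(img, size=3):
    # grayscale "dilation" of dark strokes: a min filter fattens dark text
    return _filter(img, size, min)

def denoise(img, size=3):
    # median filter knocks out isolated salt-and-pepper noise pixels
    return _filter(img, size, lambda w: sorted(w)[len(w) // 2])

def binarize(img, threshold=128):
    # fixed global threshold for the B&W image; Otsu would pick it adaptively
    return [[0 if p < threshold else 255 for p in row] for row in img]
```

A single dark pixel in a white field is spread to its whole neighbourhood by `thicken` (good for segmentation), erased by `denoise` (good for the LSTM), and kept crisp by `binarize` (good for the v3 core).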
Footnotes
[1] why on earth tesseract takes in an RGB24 pixel series for its LSTM engine is an open question; the only sensible reason I can come up with is that this way you still keep a little more pixel info than with GREY8. But then I think: leptonica supports float B&W (GREY32) AFAICT, so why not use that one? I expect the same quality with 1/3rd of the input nodes of the LSTM. Of course, the "lazy" reason I can come up with is: LSTM was first done in Python, in ocropus, and they used a generic engine which ate RGB24 as input, so to keep the transition manageable and uncomplicated when shoveling this into tesseract/C++, you'd rather keep the design parameters the same until you're surely done, so RGB24 it is. ... But then nobody considered moving this result onto float greyscale and thus 1/3rd of the input nodes? Because I don't see tesseract training on anything that's particularly colourful or to do with color per se, so RGB24 only serves as a poor man's "greyscale float" at 3*256 integer values, vs. the usual suspect: basic GREY8, which only has 256 grey levels. A conundrum! 🤔 🤔 🤔
[2] GREY32 is currently not part of tesseract. GREY32 status: TODO.
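The RGB24-vs-GREY8 arithmetic in footnote [1] is easy to sanity-check: once you only care about luminance, the 3 bytes of an RGB24 pixel collapse to roughly the same 256 levels as GREY8, while a float "GREY32"-style plane would keep the fractional precision at a third of the input width. A small illustration, using the standard ITU-R BT.601 luma weights (the function names here are made up for the example):

```python
# Luminance of an RGB pixel, ITU-R BT.601 weights.
def luminance_float(r, g, b):
    """Float luminance in [0, 255] - what a float-greyscale plane could store."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def luminance_grey8(r, g, b):
    """The same value quantized to 8 bits - what GREY8 actually stores."""
    return min(255, round(luminance_float(r, g, b)))

# Neutral (r == g == b) pixels map one-to-one onto GREY8: still 256 levels,
# so the extra two bytes of RGB24 buy nothing for colourless scans.
levels = {luminance_grey8(v, v, v) for v in range(256)}
```

In other words, for the greyish material tesseract is typically trained on, RGB24 input mostly triples the node count without adding distinguishable grey levels; only a float representation would genuinely keep more precision per pixel.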
GerHobbelt changed the title from "tesseract :: supply masks, tc. for each stage as images overriding or assisting the tesseract engine that way" to "tesseract :: supply masks, etc. for each stage as images overriding or assisting the tesseract engine that way" on Sep 13, 2023.