
tesseract :: supply masks, etc. for each stage as images overriding or assisting the tesseract engine that way #4

Open
GerHobbelt opened this issue Sep 13, 2023 · 0 comments
Labels
enhancement New feature or request


@GerHobbelt
Owner

GerHobbelt commented Sep 13, 2023

Reasons why (example scenarios where this is useful)

  • tesseract has an OSD mode, but its text bbox (bounding box) detection may be sub-par for your use case.
  • you're using the new LSTM classifier (v4/v5), which works best when fed a greyscale/color image¹, but the thresholding of your source image is just meh, and when you instead feed it a (manipulated) image that does well in the segmentation phase, the LSTM engine loses a few points of accuracy.
  • ....

Approach

Allow the user to provide a separate PIX image for each of the Leptonica stages:
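The per-stage override idea can be sketched in a few lines of Python. This is not tesseract's API: `Stage`, `StagedPipeline`, `set_override` and `fixed_threshold` are hypothetical names, and a plain list of 0..255 rows stands in for a Leptonica PIX. A user-supplied image replaces the engine's output for its stage, while stages without an override fall back to the engine's own result. That covers the second scenario above: a custom binarized mask for segmentation, the untouched greyscale image for the LSTM recognizer.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List

# Hypothetical stand-in for a Leptonica PIX: greyscale rows of 0..255 values.
Image = List[List[int]]


class Stage(Enum):
    """Hypothetical pipeline stages that could accept a user-supplied image."""
    THRESHOLD = auto()    # binarized image used for layout/segmentation
    RECOGNITION = auto()  # image handed to the LSTM recognizer


@dataclass
class StagedPipeline:
    """Sketch only: per-stage user overrides, else the engine's own output."""
    overrides: Dict[Stage, Image] = field(default_factory=dict)

    def set_override(self, stage: Stage, image: Image) -> None:
        self.overrides[stage] = image

    def run(self, stage: Stage, compute_default: Callable[[], Image]) -> Image:
        # A user-supplied image wins; otherwise compute what the engine would produce.
        if stage in self.overrides:
            return self.overrides[stage]
        return compute_default()


def fixed_threshold(image: Image, level: int = 128) -> Image:
    """Naive global threshold standing in for the engine's own binarization."""
    return [[255 if px >= level else 0 for px in row] for row in image]


source: Image = [[10, 200, 130], [90, 250, 40]]
# Hypothetical mask from a better external binarizer (it treats 130 as background).
my_mask: Image = [[0, 255, 0], [0, 255, 0]]

pipeline = StagedPipeline()
pipeline.set_override(Stage.THRESHOLD, my_mask)

segmentation_input = pipeline.run(Stage.THRESHOLD, lambda: fixed_threshold(source))
recognition_input = pipeline.run(Stage.RECOGNITION, lambda: source)
```

Keeping the override lookup separate from the default computation means the engine only does the work for stages the user did not supply.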


Footnotes

  1. Why on earth tesseract feeds an RGB24 pixel series to its LSTM engine is an open question. The only sensible reason I can come up with is that you keep a little more pixel information than with GREY8, but then I think: Leptonica supports float B&W (GREY32) AFAICT, so why not use that? I expect the same quality with one third of the LSTM input nodes. Of course, there is also the "lazy" explanation: LSTM was first done in Python, in ocropus, using a generic engine that ate RGB24 as input, so to keep the port to tesseract/C++ manageable you'd rather keep the design parameters unchanged until you're sure it works, so RGB24 it is. ... But then did nobody consider moving the result to float greyscale, and thus one third of the input nodes? I don't see tesseract being trained on anything particularly colourful or color-dependent, so RGB24 only serves as a poor man's "greyscale float" with roughly 3*256 integer levels, vs. the usual suspect, basic GREY8, which has only 256 grey levels. A conundrum! 🤔 🤔 🤔

  2. GREY32 is currently not part of tesseract. GREY32 status: TODO
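To put rough numbers on footnote 1's argument, here is a small Python calculation. The 36-pixel line height is a hypothetical value chosen for illustration, not taken from tesseract, and treating RGB24 as the sum r + g + b follows the footnote's "poor man's greyscale" framing rather than how the network actually consumes the three channels. It shows the one-third input-node ratio, and that summing three 8-bit channels yields 766 distinct levels (0..765), i.e. roughly the 3*256 mentioned in the footnote, against GREY8's 256.

```python
GREY8_LEVELS = 256  # a single unsigned 8-bit channel


def input_values_per_column(height: int, channels: int) -> int:
    """Values fed to the network for one image column: one per pixel per channel."""
    return height * channels


def summed_channel_levels(channels: int = 3, bits_per_channel: int = 8) -> int:
    """Distinct values of the sum of `channels` unsigned integer channels."""
    max_value = (1 << bits_per_channel) - 1  # 255 for 8-bit channels
    return channels * max_value + 1          # sums range over 0..channels*max_value


def rgb_to_grey_float(r: int, g: int, b: int) -> float:
    """Channel average scaled to [0.0, 1.0], i.e. a GREY32-style float grey value."""
    return (r + g + b) / (3 * 255)


height = 36  # hypothetical line height in pixels, for illustration only
rgb24_inputs = input_values_per_column(height, channels=3)  # 108
grey_inputs = input_values_per_column(height, channels=1)   # 36

print(rgb24_inputs, grey_inputs, summed_channel_levels(), GREY8_LEVELS)
# prints: 108 36 766 256
```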

@GerHobbelt GerHobbelt added the enhancement New feature or request label Sep 13, 2023
@GerHobbelt GerHobbelt changed the title tesseract :: supply masks, tc. for each stage as images overriding or assisting the tesseract engine that way tesseract :: supply masks, etc. for each stage as images overriding or assisting the tesseract engine that way Sep 13, 2023