Non-linear grayscale normalization for layout analyse and/or text recognition #3857

Draft
wants to merge 8 commits into
base: main

Contributor

@JKamlah JKamlah commented Jul 4, 2022

Non-linear grayscale normalization

Draft PR only

I am opening this as a draft PR first to get feedback on whether the grayscale normalization works reliably on a wide variety of input images and proves beneficial. Please test extensively.

Image normalization

In some cases, image normalization is applied to improve layout analysis (LA) and OCR results. A popular method is nlbin, a non-linear grayscale normalization with optional subsequent binarization. This method was developed by Thomas Breuel for the text recognition program OCRopus.

In this PR, the nlbin method is reimplemented using existing Leptonica functions. It can be enabled via a parameter for layout analysis and/or the actual text recognition. It performs only the grayscale normalization, so the existing binarization methods can still be applied afterwards.
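
For readers unfamiliar with nlbin, the idea can be illustrated with a minimal, self-contained sketch. It is not the code from this PR and does not use Leptonica: `GrayImage`, `Percentile` and `NormalizeNonLinear` are hypothetical names, and a plain window maximum stands in for the smoothed, percentile-based estimate of the local white level that OCRopus uses. The two steps are: flatten the background, then stretch the contrast between a low and a high percentile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a grayscale image: row-major floats in [0, 1].
struct GrayImage {
  int width = 0;
  int height = 0;
  std::vector<float> data;
  float At(int x, int y) const {
    return data[static_cast<std::size_t>(y) * width + x];
  }
};

inline float Clamp01(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Simplified percentile: element at rank floor(fraction * (n - 1)).
inline float Percentile(std::vector<float> values, double fraction) {
  assert(!values.empty() && fraction >= 0.0 && fraction <= 1.0);
  const auto k = static_cast<std::ptrdiff_t>(fraction * (values.size() - 1));
  std::nth_element(values.begin(), values.begin() + k, values.end());
  return values[static_cast<std::size_t>(k)];
}

// Assumption: the local white level is approximated by the maximum in a
// (2 * radius + 1)^2 window. This is a simplification for illustration.
inline GrayImage NormalizeNonLinear(const GrayImage &src, int radius,
                                    double lo_frac = 0.05,
                                    double hi_frac = 0.90) {
  assert(src.width > 0 && src.height > 0 && radius >= 0);
  assert(src.data.size() == static_cast<std::size_t>(src.width) * src.height);
  GrayImage flat = src;
  // Step 1: flatten. Subtract the local white level so paper becomes ~1.
  for (int y = 0; y < src.height; ++y) {
    for (int x = 0; x < src.width; ++x) {
      float white = 0.0f;
      for (int yy = std::max(0, y - radius);
           yy <= std::min(src.height - 1, y + radius); ++yy) {
        for (int xx = std::max(0, x - radius);
             xx <= std::min(src.width - 1, x + radius); ++xx) {
          white = std::max(white, src.At(xx, yy));
        }
      }
      flat.data[static_cast<std::size_t>(y) * src.width + x] =
          Clamp01(src.At(x, y) - white + 1.0f);
    }
  }
  // Step 2: stretch the contrast between the low and high percentiles.
  const float lo = Percentile(flat.data, lo_frac);
  const float hi = Percentile(flat.data, hi_frac);
  if (hi - lo < 1e-6f) return flat;  // nothing to stretch
  for (float &v : flat.data) v = Clamp01((v - lo) / (hi - lo));
  return flat;
}
```

On an image whose paper brightness varies from region to region, ink pixels in the darker and in the brighter region end up at similar gray values after this step, which is what makes a subsequent global or local binarization easier.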

The "preprocess_graynorm_mode" parameter

This is an INT parameter that currently has four modes. It is set with "-c preprocess_graynorm_mode=INT".
The modes:
0 = no normalization applied (default)
1 = apply normalization for thresholding and recognition
2 = apply normalization for thresholding only
3 = apply normalization for recognition only

Modes 1-3 are applied to the full image.
Normalization at line level would also be desirable, but is not implemented yet.
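
The mode table can be read as a mapping from the parameter value to the stages that receive the normalized image. The following sketch only restates that table; `GrayNormTargets` and `TargetsForGrayNormMode` are hypothetical names and not part of this PR or of the Tesseract API.

```cpp
#include <cassert>

// Hypothetical helper type: which stages receive the normalized image.
struct GrayNormTargets {
  bool thresholding;
  bool recognition;
};

// Maps a preprocess_graynorm_mode value to the affected stages.
// Any value outside 1-3 is treated like mode 0 (no normalization).
inline GrayNormTargets TargetsForGrayNormMode(int mode) {
  switch (mode) {
    case 1:
      return {true, true};   // thresholding and recognition
    case 2:
      return {true, false};  // thresholding only
    case 3:
      return {false, true};  // recognition only
    default:
      return {false, false}; // mode 0: disabled
  }
}
```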

Additional option

With the parameter "-c tessedit_write_images=1", the normalized image can be written out as a TIFF file.

@amitdo
Collaborator

amitdo commented Jul 6, 2022

Hi @JKamlah,

Leptonica has some built-in grayscale normalization functions; maybe we can use them as well.

https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/adaptmap.c

Here are some examples that demonstrate how to use them to improve thresholding using Otsu's or Sauvola's methods:

https://github.com/DanBloomberg/leptonica/blob/1297942d8b5c1a76abdde93ab4bbd5472870b937/src/binarize.c

I suggest adding at least pixContrastNorm(), so that it can later be followed by Sauvola.
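
To illustrate why contrast normalization and Sauvola go together, here is a minimal sketch of the Sauvola threshold for a single window, T = m * (1 + k * (s / R - 1)), where m is the local mean and s the local standard deviation. It is a standalone illustration, not Leptonica's implementation: `SauvolaThreshold` is a hypothetical name, k = 0.35 is assumed as a typical value, and R = 128 is assumed for 8-bit images.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Sauvola threshold for one window of 8-bit gray values:
//   T = mean * (1 + k * (stddev / R - 1))
inline double SauvolaThreshold(const std::vector<unsigned char> &window,
                               double k = 0.35, double R = 128.0) {
  assert(!window.empty());
  double sum = 0.0;
  double sum_sq = 0.0;
  for (unsigned char v : window) {
    sum += v;
    sum_sq += static_cast<double>(v) * v;
  }
  const double n = static_cast<double>(window.size());
  const double mean = sum / n;
  // Guard against tiny negative values caused by rounding.
  const double variance = std::max(0.0, sum_sq / n - mean * mean);
  const double stddev = std::sqrt(variance);
  return mean * (1.0 + k * (stddev / R - 1.0));
}
```

Because the threshold depends on the local standard deviation, a low-contrast window pushes it well below the mean, while a high-contrast window keeps it close to the mean. Normalizing the contrast beforehand makes the windows more comparable in this respect.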

@amitdo
Collaborator

amitdo commented Jul 6, 2022

CC: @bertsky

@amitdo
Collaborator

amitdo commented Jul 6, 2022

You can use this image for testing the new feature:

https://github.com/DanBloomberg/leptonica/blob/a14036fa5f5ea971/prog/w91frag.jpg

Contributor

@bertsky bertsky left a comment

Great stuff, many thanks for bringing this forward!

Providing a mode parameter to choose where the normalization is applied is a good idea. But I am afraid the current design does not always match the intended function:

Comment on lines +934 to +943
if (mode == 1) {
  SetInputImage(thresholder_->GetPixNormRectGrey());
  thresholder_->SetImage(GetInputImage());
} else if (mode == 2) {
  thresholder_->SetImage(thresholder_->GetPixNormRectGrey());
} else if (mode == 3) {
  SetInputImage(thresholder_->GetPixNormRectGrey());
} else {
  return false;
}
Contributor

Some considerations regarding where this should be placed in the code base:

  1. Using a separate entry point NormalizeImage called in ProcessPages, instead of merely modifying the thresholder, prevents applying this in any page segmentation mode (PSM) other than full pages. And API users would need to call NormalizeImage explicitly in their code instead of merely setting the configuration parameter.
  2. Recognition (SetupForRecognitionBestPix) does not always use pix_original_: after SetRectangle(), it uses pix_grey_ or even pix_binary_.
  3. Layout analysis mostly uses pix_binary_, but LineFinder also tries to use pix_grey_ and pix_thresholds_.
  4. DPI information (which influences LA in various ways) is taken from pix_ (i.e. the thresholder's SetImage), and that might not work on the output of Leptonica because the metadata might be lost. We still have fallback DPI estimation (which is based on the CC statistics from pix_binary_), but that might not be as accurate.
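
Point 4 can be illustrated with a small sketch that uses stand-in types instead of Leptonica: `Image`, `NormalizeDroppingMetadata` and `NormalizeKeepingResolution` are hypothetical names. With real PIX objects, Leptonica's pixCopyResolution() could serve the same purpose; whether the functions used in this PR actually drop the resolution would need to be checked.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an image with resolution metadata (0 = unknown).
struct Image {
  int width = 0;
  int height = 0;
  int xres = 0;
  int yres = 0;
  std::vector<unsigned char> pixels;
};

// Stand-in for a processing step that builds a fresh output image and
// therefore starts without resolution metadata. The pixel data is copied
// unchanged here because only the metadata handling is of interest.
inline Image NormalizeDroppingMetadata(const Image &src) {
  Image out;
  out.width = src.width;
  out.height = src.height;
  out.pixels = src.pixels;
  return out;
}

// Same step, but the resolution of the source is copied to the result so
// that later stages do not have to fall back to DPI estimation.
inline Image NormalizeKeepingResolution(const Image &src) {
  Image out = NormalizeDroppingMetadata(src);
  out.xres = src.xres;
  out.yres = src.yres;
  return out;
}
```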

@JKamlah
Contributor Author

JKamlah commented Jul 7, 2022

Thanks @amitdo and @bertsky for the great feedback.
I will try to optimize the current implementation design and add an option to switch between non-linear normalization and pixContrastNorm(), maybe with a parameter preprocess_graynorm_method.

@amitdo
Collaborator

amitdo commented Apr 2, 2023

Tesseract's Otsu is implemented in src/ccstruct/otsuthr.cpp and src/ccstruct/otsuthr.h.

I suggest moving ImageThresholder::pixNLNorm() to a separate .cpp file with its own .h file.

@JKamlah
Contributor Author

JKamlah commented Apr 3, 2023

Thank you for the idea @amitdo.
I am sorry for not responding for so long. I will get back to you in the coming weeks (not before Easter) with a revised version. Maybe it will fit into the next Tesseract release.
