Non-linear grayscale normalization for layout analyse and/or text recognition #3857

JKamlah · 2022-07-04T13:50:50Z

Non-linear grayscale normalization

Draft PR only

I am opening this as a draft PR first to get feedback on whether the grayscale normalization works reliably on a wide variety of input images and actually improves results. Please test extensively.

Image normalization

In some cases, image normalization is applied to improve layout analysis (LA) and OCR results. A popular method is nlbin, a non-linear grayscale normalization with optional subsequent binarization. This method was developed by Thomas Breuel for the text recognition program Ocropus.

This PR adapts the nlbin method to the existing Leptonica functions. It can be activated via a parameter for layout analysis, for the actual text recognition, or for both. It performs only the grayscale normalization; the existing binarization methods can still be applied to the result.
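
To illustrate the general idea of this kind of normalization, here is a minimal, self-contained sketch of a percentile-based gray-level stretch. It is not the code from this PR and does not use Leptonica: PercentileValue and StretchGray are hypothetical helpers, the percentile defaults are assumptions, and the background-flattening step that an nlbin-style method performs before rescaling is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): value at the given percentile
// (0..100) of the pixel values. Takes a copy because nth_element reorders.
inline float PercentileValue(std::vector<std::uint8_t> values, double percentile) {
  assert(!values.empty());
  assert(percentile >= 0.0 && percentile <= 100.0);
  const std::size_t idx =
      static_cast<std::size_t>(percentile / 100.0 * (values.size() - 1));
  std::nth_element(values.begin(), values.begin() + idx, values.end());
  return static_cast<float>(values[idx]);
}

// Hypothetical helper (illustration only): maps the lo_pct percentile to 0
// and the hi_pct percentile to 255, clipping everything outside that range.
inline std::vector<std::uint8_t> StretchGray(const std::vector<std::uint8_t>& gray,
                                             double lo_pct = 5.0,
                                             double hi_pct = 90.0) {
  const float lo = PercentileValue(gray, lo_pct);
  const float hi = PercentileValue(gray, hi_pct);
  if (hi <= lo) {
    return gray;  // Flat image: nothing to stretch.
  }
  std::vector<std::uint8_t> out(gray.size());
  for (std::size_t i = 0; i < gray.size(); ++i) {
    float v = (static_cast<float>(gray[i]) - lo) / (hi - lo);
    v = std::max(0.0f, std::min(1.0f, v));  // Clip to [0, 1].
    out[i] = static_cast<std::uint8_t>(v * 255.0f + 0.5f);
  }
  return out;
}
```

Because the black and white points come from percentiles rather than the absolute minimum and maximum, a few outlier pixels do not dominate the mapping.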

The "preprocess_graynorm_mode" parameter

This parameter is an INT member with currently 4 modes and can be set with "-c preprocess_graynorm_mode=INT".
The modes:
0 = no normalization applied (default)
1 = apply normalization for thresholding & recognition
2 = apply normalization for thresholding (only)
3 = apply normalization for recognition (only)

Modes 1-3 are applied to the full image.
Normalization at line level would also be desirable (not implemented yet).
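
To make the mapping explicit, here is a small sketch of how the mode value could be decoded into the two processing stages. GraynormTargets and TargetsForMode are hypothetical names used for illustration only; they are not taken from the PR code.

```cpp
#include <cassert>

// Hypothetical type (illustration only): which stages receive the
// normalized image for a given preprocess_graynorm_mode value.
struct GraynormTargets {
  bool thresholding;  // Normalized image is used for thresholding.
  bool recognition;   // Normalized image is used for text recognition.
};

// Hypothetical helper (illustration only). Values outside 1..3 are treated
// like mode 0, i.e. no normalization.
inline GraynormTargets TargetsForMode(int mode) {
  switch (mode) {
    case 1:
      return {true, true};
    case 2:
      return {true, false};
    case 3:
      return {false, true};
    default:
      return {false, false};
  }
}
```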

Additional option

With the parameter "-c tessedit_write_images=1", the normalized image can be written out as a TIFF file.


amitdo · 2022-07-06T04:26:43Z

Hi @JKamlah,

Leptonica has some built-in grayscale normalization functions; maybe we can use them as well.

https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/adaptmap.c

Here are some examples that demonstrate how to use them to improve thresholding using Otsu's or Sauvola's methods:

https://github.com/DanBloomberg/leptonica/blob/1297942d8b5c1a76abdde93ab4bbd5472870b937/src/binarize.c

I suggest adding at least pixContrastNorm(), so that it can later be followed by Sauvola.
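
For readers unfamiliar with local contrast normalization, here is a rough, self-contained sketch of the idea: stretch each tile to the full gray range unless its contrast is too low. It is a simplification and not Leptonica's implementation. GrayImage and TileContrastNorm are hypothetical names, and as far as I understand, pixContrastNorm() additionally smooths the per-tile minimum and maximum maps to avoid visible tile boundaries, which this sketch does not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type (illustration only): row-major, 8 bpp.
struct GrayImage {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> data;
};

// Hypothetical helper (illustration only): stretches each tile to 0..255.
// Tiles whose (max - min) is below min_diff are left unchanged, so that
// nearly uniform background is not amplified into noise.
inline GrayImage TileContrastNorm(const GrayImage& src, int tile_w, int tile_h,
                                  int min_diff) {
  assert(tile_w > 0 && tile_h > 0 && min_diff > 0);
  assert(src.data.size() ==
         static_cast<std::size_t>(src.width) * static_cast<std::size_t>(src.height));
  GrayImage dst = src;
  for (int ty = 0; ty < src.height; ty += tile_h) {
    for (int tx = 0; tx < src.width; tx += tile_w) {
      const int y_end = std::min(ty + tile_h, src.height);
      const int x_end = std::min(tx + tile_w, src.width);
      // First pass: find the darkest and brightest pixel in the tile.
      int lo = 255;
      int hi = 0;
      for (int y = ty; y < y_end; ++y) {
        for (int x = tx; x < x_end; ++x) {
          const int v = src.data[static_cast<std::size_t>(y) * src.width + x];
          lo = std::min(lo, v);
          hi = std::max(hi, v);
        }
      }
      if (hi - lo < min_diff) {
        continue;  // Low-contrast tile: keep original values.
      }
      // Second pass: linear stretch of [lo, hi] to [0, 255].
      for (int y = ty; y < y_end; ++y) {
        for (int x = tx; x < x_end; ++x) {
          const std::size_t i = static_cast<std::size_t>(y) * src.width + x;
          dst.data[i] =
              static_cast<std::uint8_t>((src.data[i] - lo) * 255 / (hi - lo));
        }
      }
    }
  }
  return dst;
}
```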

amitdo · 2022-07-06T04:30:07Z

CC: @bertsky

amitdo · 2022-07-06T04:53:43Z

You can use this image for testing the new feature:

https://github.com/DanBloomberg/leptonica/blob/a14036fa5f5ea971/prog/w91frag.jpg

bertsky

Great stuff, many thanks for bringing this forward!

Good idea to provide a mode parameter that controls where the normalization is applied. But I am afraid the current design does not always match the intended behavior:

bertsky · 2022-07-07T07:22:47Z

src/api/baseapi.cpp

+  if (mode == 1) {
+    SetInputImage(thresholder_->GetPixNormRectGrey());
+    thresholder_->SetImage(GetInputImage());
+  } else if (mode == 2) {
+    thresholder_->SetImage(thresholder_->GetPixNormRectGrey());
+  } else if (mode == 3) {
+    SetInputImage(thresholder_->GetPixNormRectGrey());
+  } else {
+    return false;
+  }


Some considerations regarding where this should be placed in the code base:

Using a separate entry point NormalizeImage called in ProcessPages, instead of merely modifying the thresholder, prevents applying this with any PSM other than the full-page modes. And API users would need to add a NormalizeImage call to their own code instead of merely setting the configuration parameter.

Recognition (SetupForRecognition → BestPix) does not always use pix_original_: after SetRectangle(), it uses pix_grey_ or even pix_binary_.

Layout analysis mostly uses pix_binary_, but LineFinder also tries to use pix_grey_ and pix_thresholds_.

DPI information (which influences LA in various ways) is taken from pix_ (i.e. the thresholder's SetImage), and that might not work on the output of Leptonica because the metadata might be lost. We still have fallback DPI estimation (which is based on the CC statistics from pix_binary_), but that might not be as accurate.
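
One way to guard against the DPI loss is to copy the resolution from the source image onto the normalized one before handing it to the thresholder. The sketch below shows this with a hypothetical stand-in image type rather than Leptonica's PIX; all names in it are made up for illustration. With real PIX objects I would expect pixCopyResolution() to serve this purpose, but I have not checked that against this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an image with resolution metadata.
struct GrayImage {
  int width;
  int height;
  int xres;  // Horizontal resolution in DPI, 0 = unknown.
  int yres;  // Vertical resolution in DPI, 0 = unknown.
  std::vector<std::uint8_t> data;
};

// Stand-in for a normalization routine that builds a fresh output image.
// It fills in the pixels only, so the resolution ends up unknown.
inline GrayImage NormalizePixelsOnly(const GrayImage& src) {
  GrayImage dst{src.width, src.height, 0, 0, src.data};
  return dst;
}

// Same normalization, but the source resolution is carried over so that
// later stages do not have to fall back to estimating the DPI.
inline GrayImage NormalizeKeepingResolution(const GrayImage& src) {
  GrayImage dst = NormalizePixelsOnly(src);
  dst.xres = src.xres;
  dst.yres = src.yres;
  return dst;
}
```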

JKamlah · 2022-07-07T08:31:54Z

Thanks @amitdo and @bertsky for the great feedback.
I will try to optimize the current implementation design and add an option to switch between non-linear normalization and pixContrastNorm(), maybe with a parameter preprocess_graynorm_method.

amitdo · 2023-04-02T11:50:00Z

Tesseract's Otsu is implemented in src/ccstruct/otsuthr.cpp and src/ccstruct/otsuthr.h.

I suggest moving ImageThresholder::pixNLNorm() to a separate .cpp file and adding a matching .h file.

JKamlah · 2023-04-03T08:28:14Z

Thank you for the idea @amitdo.
I am sorry for not responding for so long. I will get back to you in the coming weeks (not before Easter) with a revised version. Maybe it will fit into the next Tesseract release.

JKamlah added 6 commits April 4, 2022 17:41

Add: Non-linear grayscale normalization as preprocessing based on nlbin (Thomas Breuel).

ea5c9b7

Add three normalization modi: Only for thresholding, only for recognition and for both tasks.

37462ac

Reformat code.

a09b6b4

Add preprocessing parameter for grayscale normalization (preprocess_graynorm_mode). There are 4 modes 0 - no normalization, 1 - thresholding+recognition, 2 - thresholding (only), 3 - recognition (only).

6dfb216

Fix write preprocess image with tessedit_write_images.

18517a5

Fix error warning text, delete empty lines and old parameter config.

c049002

stweil mentioned this pull request Jul 5, 2022

tesseract does not recognize letters of good quality #3858

Open

bertsky reviewed Jul 7, 2022


Merge branch 'tesseract-ocr:main' into nlbin

5ebbb07

Merge branch 'tesseract-ocr:main' into nlbin

5fb2b62
