Detect text rotation without running recognition #3836

Balearica · 2022-06-07T03:03:54Z

As noted in the documentation, Tesseract performs poorly when the page is at an angle (not a multiple of 90 degrees). This limitation is not a problem for accuracy, since Tesseract accurately reports the angle of text lines, so my existing pipeline rotates the image and re-runs recognition whenever the angle is significant. However, this is computationally inefficient, as there does not appear to be any way to get the page angle without also running recognition (even though estimating the page angle/gradient is one of the first things calculated).

Therefore, it would be of significant benefit to be able to get the page angle without running the entire recognition process. I'll work on a build that does this myself. My initial thought is to add a config option that tells Tesseract to report the page angle and quit early (before recognition) if the median line angle is above a user-defined threshold. Let me know if others have thoughts on implementation.
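The thresholding idea described above can be illustrated independently of Tesseract. This is only a minimal sketch, assuming the per-line angles (in degrees) are already available from somewhere; `MedianAngle`, `ShouldRotate`, and `threshold_deg` are hypothetical names for illustration, not existing Tesseract functions or config options.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of per-line text angles, in degrees.
// Returns 0 when there are no lines (e.g. an empty page).
double MedianAngle(std::vector<double> angles) {
  if (angles.empty()) return 0.0;
  const std::size_t mid = angles.size() / 2;
  // Partially sort so that angles[mid] is the element a full sort would put there.
  std::nth_element(angles.begin(), angles.begin() + mid, angles.end());
  const double upper = angles[mid];
  if (angles.size() % 2 == 1) return upper;
  // Even count: average the two middle values.
  const double lower = *std::max_element(angles.begin(), angles.begin() + mid);
  return (lower + upper) / 2.0;
}

// Hypothetical decision rule: rotate and re-run recognition only when the
// median line angle exceeds a user-defined threshold in either direction.
bool ShouldRotate(const std::vector<double>& angles, double threshold_deg) {
  return std::fabs(MedianAngle(angles)) > threshold_deg;
}
```

The median is used rather than the mean so that a few badly estimated lines do not trigger an unnecessary rotation.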

zdenop · 2022-06-07T05:43:18Z

For such image preprocessing I would suggest having a look at the Leptonica program/function examples: flipdetect_reg, skewtest, skew_reg, and maybe dewarptest2...

Of course there are limitations (see e.g. issue 622), but they are fast and reliable for most of my cases...

IMHO such preprocessing should be done outside of Tesseract.

Balearica · 2022-06-08T05:11:38Z

Thanks for your response, I will review the Leptonica scripts linked before deciding how to implement.

todd-richmond · 2023-04-27T05:18:10Z

I found a much, much faster solution to detect page rotation. Call SetImage followed by DetectOrientationScript and then call

Pix *rotated = pixRotateOrth(pix, ((360 - degree) / 90) % 4);

However, there is currently a bug that causes this to fail randomly, so you need my short patch from #4062.
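The degree-to-quads conversion in the call above can be isolated into a small helper. This is a sketch only: `QuadsForOrientation` is a hypothetical name, it simply follows the `(360 - degree) / 90` formula from this comment, and it assumes `degree` is the orientation value filled in by DetectOrientationScript. Whether that rotation direction is right for a given pipeline should be checked against real images.

```cpp
// Hypothetical helper: map an orientation in degrees (0, 90, 180, 270) to the
// "quads" argument of Leptonica's pixRotateOrth, using the formula from this
// thread. pixRotateOrth accepts only quads in 0..3, so the result is wrapped
// modulo 4 (otherwise an orientation of 0 would yield 4).
// Returns -1 if the angle is not a multiple of 90.
int QuadsForOrientation(int degree) {
  const int d = ((degree % 360) + 360) % 360;  // normalize into 0..359
  if (d % 90 != 0) return -1;
  return ((360 - d) / 90) % 4;
}
```

With this helper the rotation step would read `pixRotateOrth(pix, QuadsForOrientation(degree))`, after checking for the -1 case.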

zdenop · 2023-04-27T05:36:48Z

It is here:
https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/rotateorth.c#L64

todd-richmond · 2023-04-27T16:26:31Z

https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/rotateorth.c#L64

That is the API to rotate an image, not the API to detect whether it is rotated. The Tesseract docs and some StackOverflow comments recommend Recognize(), but that is extremely slow. On a sample TIFF I used, it took 0.9 seconds for DetectOrientationScript vs 2.1 seconds for Recognize, when both were followed by a 90-degree rotation and another Recognize to extract text.

amitdo · 2023-04-28T08:34:28Z

@todd-richmond, you are talking about orientation detection: 0 / 90 / 180 / 270 degrees.

@Balearica is talking about a page with some parts that are skewed.

todd-richmond · 2023-04-28T16:13:45Z

Never mind. I missed the "not" in "not a multiple of 90" when reading. De-skewing is much more challenging, so we haven't bothered dealing with it for now.

amitdo · 2023-05-07T09:47:12Z

@Balearica,

Did you try using AnalyseLayout()?

tesseract/include/tesseract/baseapi.h

Lines 433 to 449 in bf7c134

    /**
     * Runs page layout analysis in the mode set by SetPageSegMode.
     * May optionally be called prior to Recognize to get access to just
     * the page layout results. Returns an iterator to the results.
     * If merge_similar_words is true, words are combined where suitable for use
     * with a line recognizer. Use if you want to use AnalyseLayout to find the
     * textlines, and then want to process textline fragments with an external
     * line recognizer.
     * Returns nullptr on error or an empty page.
     * The returned iterator must be deleted after use.
     * WARNING! This class points to data held within the TessBaseAPI class, and
     * therefore can only be used while the TessBaseAPI class still exists and
     * has not been subjected to a call of Init, SetImage, Recognize, Clear, End
     * DetectOS, or anything else that changes the internal PAGE_RES.
     */
    PageIterator *AnalyseLayout();
    PageIterator *AnalyseLayout(bool merge_similar_words);

Balearica · 2023-05-09T04:01:15Z

@amitdo I did not end up implementing it this way, but I do believe that running AnalyseLayout and then using the lines to re-calculate the average gradient would be another way to go about this.

I ended up creating a branch that allows for retrieving the number Tesseract already calculates, which I pushed to #4070. I think this is the most direct approach, and the only approach that does not involve redundant calculations.
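For comparison, the AnalyseLayout alternative mentioned above could look roughly like the following. This is a sketch under assumptions, not the approach taken in #4070: it assumes the baseline endpoints of each text line have already been collected (for example via PageIterator::Baseline at the RIL_TEXTLINE level), and `BaselineSegment`, `AverageGradient`, and `GradientToDegrees` are hypothetical names.

```cpp
#include <cmath>
#include <vector>

// Hypothetical container for one text line's baseline endpoints, in image
// coordinates (y grows downward, so a positive gradient means the line
// descends to the right).
struct BaselineSegment {
  int x1, y1, x2, y2;
};

// Average gradient (dy/dx) over all lines, weighted by each line's horizontal
// extent so that long lines count more than short fragments.
// Returns 0 when no usable line is present.
double AverageGradient(const std::vector<BaselineSegment>& lines) {
  double sum_dx = 0.0;
  double sum_dy = 0.0;
  for (const BaselineSegment& line : lines) {
    const double dx = line.x2 - line.x1;
    const double dy = line.y2 - line.y1;
    if (dx <= 0.0) continue;  // skip degenerate or vertical segments
    sum_dx += dx;
    sum_dy += dy;
  }
  return sum_dx > 0.0 ? sum_dy / sum_dx : 0.0;
}

// Convert a gradient to an angle in degrees.
double GradientToDegrees(double gradient) {
  const double kPi = std::acos(-1.0);
  return std::atan(gradient) * 180.0 / kPi;
}
```

This repeats work Tesseract already does internally, which is the redundancy the branch in #4070 is meant to avoid.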

amitdo added the feature request label Jun 23, 2022

zdenop closed this as completed Apr 27, 2023

Balearica mentioned this issue May 9, 2023

Allow for text angle/gradient to be retrieved #4070

Open
