Detect text rotation without running recognition #3836

Closed
Balearica opened this issue Jun 7, 2022 · 9 comments · May be fixed by #4070

@Balearica

As noted in the documentation, Tesseract performs poorly when the page is at an angle (not a multiple of 90 degrees). This limitation is not problematic from an accuracy standpoint, as Tesseract accurately reports the angle of text lines, so my existing pipeline rotates and re-runs recognition on any image where the angle is significant. However, this is computationally inefficient, as there does not appear to be any way to get the page angle without also running recognition (despite the page angle/gradient estimate being one of the first things calculated).

Therefore, it would be of significant benefit to be able to get the page angle without running the entire recognition process. I'll work on a build that does this myself. My initial thought is to add a config option that tells Tesseract to report the page angle and quit early (before recognition) if the median line angle is above a user-defined threshold, but let me know if others have thoughts on implementation.
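To make the proposed early-exit check concrete, here is a minimal sketch of the decision logic only. It assumes the per-line angles (in degrees) are already available from layout analysis; `MedianAngleDegrees` and `ShouldStopBeforeRecognition` are hypothetical names for illustration and are not part of Tesseract.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Tesseract function): median of per-line angles
// in degrees. Returns 0.0 for an empty input, i.e. "no lines, no rotation".
double MedianAngleDegrees(std::vector<double> angles) {
  if (angles.empty()) return 0.0;
  const std::size_t mid = angles.size() / 2;
  // Partial sort: angles[mid] ends up in its sorted position,
  // and everything before it is less than or equal to it.
  std::nth_element(angles.begin(), angles.begin() + mid, angles.end());
  double median = angles[mid];
  if (angles.size() % 2 == 0) {
    // Even count: average the two central values.
    const double lower = *std::max_element(angles.begin(), angles.begin() + mid);
    median = (lower + median) / 2.0;
  }
  return median;
}

// Hypothetical decision: stop before recognition when the magnitude of the
// median line angle exceeds the user-defined threshold (in degrees).
bool ShouldStopBeforeRecognition(const std::vector<double>& angles,
                                 double threshold_degrees) {
  assert(threshold_degrees >= 0.0);
  return std::fabs(MedianAngleDegrees(angles)) > threshold_degrees;
}
```

Using the median rather than the mean keeps a few badly estimated lines (for example, a rotated caption) from triggering an unnecessary rotate-and-rerun.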

@zdenop
Contributor

zdenop commented Jun 7, 2022

For such image preprocessing I would suggest having a look at the leptonica programs (function examples) flipdetect_reg, skewtest, skew_reg, and maybe dewarptest2...

Of course there are limitations (see e.g. issue 622), but they are fast and reliable for most of my cases...

IMHO such preprocessing should be done outside of tesseract.

@Balearica
Author

Thanks for your response, I will review the Leptonica scripts linked before deciding how to implement.

@todd-richmond

I found a much, much faster solution to detect page rotation. Call SetImage followed by DetectOrientationScript and then call

Pix *rotated = pixRotateOrth(pix, (360 - degree) / 90);

However, there is currently a bug that causes this to fail randomly, so you need my short patch from #4062.
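One detail worth sketching about the expression above: for `degree == 0` it evaluates to 4, and as far as I know pixRotateOrth only accepts a quadrant count of 0 to 3. The helper below is a hypothetical wrapper (the name `QuarterTurnsToUpright` is mine, not from Tesseract or Leptonica) that keeps the same mapping but wraps the result into that range. It assumes the input is the orientation in degrees as reported by DetectOrientationScript.

```cpp
#include <cassert>

// Hypothetical helper: maps a detected orientation in degrees
// (a multiple of 90) to a quarter-turn count in the range 0..3,
// following the (360 - degree) / 90 expression from the comment above.
int QuarterTurnsToUpright(int orient_deg) {
  assert(orient_deg % 90 == 0);  // only orthogonal orientations are handled
  // Normalize into [0, 360) so negative or >= 360 inputs also work.
  const int normalized = ((orient_deg % 360) + 360) % 360;
  // The final % 4 turns the "4 quarter turns" case (0 degrees) into 0.
  return ((360 - normalized) / 90) % 4;
}
```

With this, the call would look like `pixRotateOrth(pix, QuarterTurnsToUpright(degree))`, and an already-upright page maps to zero quarter turns.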

@zdenop
Contributor

zdenop commented Apr 27, 2023

@zdenop zdenop closed this as completed Apr 27, 2023
@todd-richmond

https://github.com/DanBloomberg/leptonica/blob/0ffbc6822c23725b5b9f6876e2620a22ba3689f4/src/rotateorth.c#L64

That is the API to rotate an image, but not the API to detect if it is rotated. Tesseract docs and some StackOverflow comments recommend Recognize(), but that is extremely slow. On a sample tiff I used, it took 0.9 seconds for DetectOrientationScript vs 2.1 seconds for Recognize, when both were followed by a 90-degree rotation and another Recognize to extract text.

@amitdo
Collaborator

amitdo commented Apr 28, 2023

@todd-richmond, you are talking about orientation detection: 0 / 90 / 180 / 270 degrees.

@Balearica is talking about a page with some parts that are skewed

@todd-richmond

Never mind. I missed the "not" 90 when reading. De-skewing is much more challenging so we haven't bothered dealing with that for now

@amitdo
Collaborator

amitdo commented May 7, 2023

@Balearica,

Did you try using AnalyseLayout()?

/**
* Runs page layout analysis in the mode set by SetPageSegMode.
* May optionally be called prior to Recognize to get access to just
* the page layout results. Returns an iterator to the results.
* If merge_similar_words is true, words are combined where suitable for use
* with a line recognizer. Use if you want to use AnalyseLayout to find the
* textlines, and then want to process textline fragments with an external
* line recognizer.
* Returns nullptr on error or an empty page.
* The returned iterator must be deleted after use.
* WARNING! This class points to data held within the TessBaseAPI class, and
* therefore can only be used while the TessBaseAPI class still exists and
* has not been subjected to a call of Init, SetImage, Recognize, Clear, End,
* DetectOS, or anything else that changes the internal PAGE_RES.
*/
PageIterator *AnalyseLayout();
PageIterator *AnalyseLayout(bool merge_similar_words);

@Balearica
Author

@amitdo I did not end up implementing it this way, but I do believe that running AnalyseLayout and then using the lines to re-calculate the average gradient would be another way to go about this.
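For anyone who wants to try the AnalyseLayout route, a rough sketch of the "re-calculate the average gradient" step might look like the following. It assumes the baseline endpoints of each text line have already been collected (I believe the iterator exposes them through a Baseline call at text-line level, but treat that as an assumption). The `Baseline` struct and `LengthWeightedAngleDegrees` are hypothetical names, and the length weighting is my own choice rather than what Tesseract does internally.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for one text line's baseline endpoints,
// in image coordinates (y grows downward).
struct Baseline {
  int x1, y1, x2, y2;
};

// Average baseline direction in degrees, weighted by line length.
// Summing the direction vectors lets long lines count more than short ones.
// With y growing downward, a positive result means the baselines descend
// toward the right. Returns 0.0 when there is nothing to measure.
double LengthWeightedAngleDegrees(const std::vector<Baseline>& lines) {
  double sum_dx = 0.0;
  double sum_dy = 0.0;
  for (const Baseline& b : lines) {
    double dx = static_cast<double>(b.x2 - b.x1);
    double dy = static_cast<double>(b.y2 - b.y1);
    if (dx < 0) {  // orient every baseline left to right before summing
      dx = -dx;
      dy = -dy;
    }
    sum_dx += dx;
    sum_dy += dy;
  }
  if (sum_dx == 0.0 && sum_dy == 0.0) return 0.0;
  const double kPi = std::acos(-1.0);
  return std::atan2(sum_dy, sum_dx) * 180.0 / kPi;
}
```

This still repeats work Tesseract has already done during layout analysis, which is the redundancy the branch mentioned below avoids.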

I ended up creating a branch that allows for retrieving the number Tesseract already calculates, which I pushed to #4070. I think this is the most direct approach, and the only approach that does not involve redundant calculations.
