This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

Libtesseract: need stress-testing #51

Open
jflesch opened this issue Dec 6, 2016 · 7 comments

Comments

@jflesch
Member

jflesch commented Dec 6, 2016

Someone has reported that Paperwork crashes when running OCR. They are using Tesseract 3.04.01, so there may be something wrong with the libtesseract binding.

(Note: for now, the preference order has been changed so that PyOCR uses tesseract-sh when it is available.)
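To make the preference order concrete, here is a minimal sketch of choosing a tool by name from a list such as the one `pyocr.get_available_tools()` returns. The helper `pick_tool` and the `FakeTool` stand-in are hypothetical, and the name strings `"Tesseract (sh)"` and `"Tesseract (C-API)"` are assumptions about what `get_name()` reports; check them against the installed PyOCR version.

```python
def pick_tool(tools, preferred_names):
    """Return the first tool matching the preference order, or None.

    `tools` is any iterable of objects exposing get_name(), which is
    how PyOCR tool modules identify themselves.
    """
    by_name = {tool.get_name(): tool for tool in tools}
    for name in preferred_names:
        if name in by_name:
            return by_name[name]
    return None


class FakeTool:
    """Hypothetical stand-in for a PyOCR tool; only get_name() is modelled."""

    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name


# Assumed name strings; prefer the shell wrapper over the C-API binding.
PREFERENCE = ["Tesseract (sh)", "Tesseract (C-API)"]
```

With a real install, `pick_tool(pyocr.get_available_tools(), PREFERENCE)` would be the equivalent call; it returns `None` when neither tool is present.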

@ghost

ghost commented Mar 22, 2017

I'm getting occasional segfaults when using the pyocr.libtesseract tool. I can't pinpoint an exact, repeatable cause. I'll update this issue if I find a pattern that triggers the segfault.
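One way to hunt for an intermittent segfault like this is to run the OCR step in a child process many times and record which runs die from a signal. The sketch below assumes a POSIX system, where `subprocess` reports death by signal as a negative return code. The `stress` helper is hypothetical, and the command to run (for example a small script that OCRs one image with pyocr.libtesseract) is left to the caller.

```python
import signal
import subprocess
import sys


def stress(cmd, runs=100):
    """Run `cmd` repeatedly and return [(run_index, signal_name), ...]
    for every run that was killed by a signal (e.g. SIGSEGV)."""
    crashes = []
    for i in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a negative return code -N means "killed by signal N".
        if proc.returncode < 0:
            crashes.append((i, signal.Signals(-proc.returncode).name))
    return crashes
```

Because the crash happens in the child, the parent survives and can log the inputs that were in use on each failing run, which is what makes a pattern visible over time.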

The other segfault occurs when there is no language data. This one is consistent.
(Attached: screenshot from 2017-03-22 02-05-36)
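Until the binding handles this case itself, a caller can check for the language before starting OCR. This is a minimal sketch, assuming the tool's `get_available_languages()` returns an empty list rather than crashing when no language data is installed. The `require_language` helper is hypothetical.

```python
def require_language(tool, lang):
    """Return `lang` if the tool reports it as installed, else raise.

    `tool` is any object exposing get_available_languages(), as PyOCR
    tool modules do. Failing here is cheaper than crashing inside OCR.
    """
    available = tool.get_available_languages()
    if lang not in available:
        raise RuntimeError(
            "Language %r not available (found: %s)"
            % (lang, ", ".join(available) or "none")
        )
    return lang
```

Calling `require_language(tool, "eng")` before the OCR call turns the missing-data case into an ordinary Python exception.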

@jflesch
Member Author

jflesch commented Mar 22, 2017

If you find a pattern, that would be awesome :-)

Noted for the no-language crash. I'll have a look ASAP (hopefully this weekend).

@jflesch
Member Author

jflesch commented Mar 22, 2017

BTW, can you tell me which version of Tesseract you are using, please?

@jflesch
Member Author

jflesch commented Mar 22, 2017

no-language crash:

@ghost

ghost commented Mar 22, 2017

The Tesseract version is 3.04.01, from Ubuntu's 3.04.01-4build1 package.

Thanks for the fix.

We lowered Mayan EDMS (http://www.mayan-edms.com) memory footprint by switching to pyocr's libtesseract, thanks for that too :)

@jflesch
Member Author

jflesch commented Mar 22, 2017

You're welcome :)

@jflesch
Member Author

jflesch commented May 13, 2017

Hm, maybe the crashes were due to a hack:
TessBaseAPIDetectOS() was actually a C++ function. I was using ctypes to access it, and let's just say ctypes is not designed for C++, so it is/was a bit hacky. It may have been the cause of crashes on some systems.
Tesseract 3.05.00 added a replacement function, TessBaseAPIDetectOrientationScript(), that is pure C. @aszlig added support for this new function.
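For anyone checking which case their system falls into, here is a minimal sketch that tests whether a loaded library exports the pure-C symbol. The helpers `has_symbol`, `supports_c_osd` and `load_tesseract` are hypothetical and are not how the binding itself is written. Using the library name `"tesseract"` with `ctypes.util.find_library` is an assumption about how the shared library is installed.

```python
import ctypes
import ctypes.util


def has_symbol(lib, name):
    """True if the ctypes library handle exports `name`.

    ctypes raises AttributeError on a missing symbol, so hasattr()
    is enough for a presence check.
    """
    return hasattr(lib, name)


def supports_c_osd(lib):
    """True if the pure-C orientation/script detection entry point
    (added in Tesseract 3.05.00) is exported by `lib`."""
    return has_symbol(lib, "TessBaseAPIDetectOrientationScript")


def load_tesseract():
    """Load libtesseract if it can be located, else return None.
    The library name "tesseract" is an assumption about the install."""
    path = ctypes.util.find_library("tesseract")
    return ctypes.CDLL(path) if path else None
```

A caller could then skip orientation detection, or fall back to the shell wrapper, when `supports_c_osd(lib)` is false, rather than reaching for the C++ function.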

I think I will try to make libtesseract the default again once Tesseract 4 is out.
