This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

Using libtesseract on Windows #90

Open
ghost opened this issue Jan 11, 2018 · 3 comments

@ghost

ghost commented Jan 11, 2018

I tried to use libtesseract302.dll (from https://github.com/mnadeem/ocr-tess4j-example), but got this error:

AttributeError: function 'TessBaseAPIGetDatapath' not found
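A minimal sketch of why this error appears: ctypes looks up a DLL's exported functions lazily, on attribute access, and raises AttributeError when a symbol is absent. That is what happens when an older build (here, 3.02) lacks TessBaseAPIGetDatapath. The helper name missing_symbols is hypothetical and not part of pyocr; it only shows how a binding could probe for required exports before calling them.

```python
import ctypes


def missing_symbols(lib: ctypes.CDLL, names):
    """Return the names from `names` that `lib` does not export.

    ctypes resolves exported functions on attribute access and raises
    AttributeError when the symbol is missing, so hasattr() is enough
    to probe for it without calling anything.
    """
    return [name for name in names if not hasattr(lib, name)]
```

For example, calling missing_symbols(lib, ["TessBaseAPIGetDatapath"]) on the 3.02 DLL would presumably return ["TessBaseAPIGetDatapath"], consistent with the error above.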

Then I tried to use libtesseract400.dll (from https://github.com/nguyenq/tess4j, which depends on https://github.com/nguyenq/lept4j), but it seems that libtesseract400.dll is not in libtesseract.tesseract_raw.libnames.
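pyocr keeps its candidate library names in libtesseract.tesseract_raw.libnames; its actual loading code is not reproduced here. A minimal sketch of the general pattern, assuming the binding simply tries each name in order with ctypes and keeps the first one that loads (the function name load_first_available is hypothetical):

```python
import ctypes


def load_first_available(names, loader=ctypes.cdll.LoadLibrary):
    """Try each library name in order and return the first that loads.

    Returns a (name, library) tuple, or (None, None) when no candidate
    can be loaded. ctypes raises OSError when the library file, or one
    of its dependencies, cannot be found or loaded.
    """
    for name in names:
        try:
            return name, loader(name)
        except OSError:
            continue  # not found or not loadable: try the next name
    return None, None
```

With this pattern, a DLL whose name is missing from the list (like libtesseract400.dll here) is never tried, even if the file is on the search path.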

By the way, ctypes.cdll.LoadLibrary searches for DLLs in the directories listed in the PATH environment variable, at least on Windows.
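A sketch of loading the DLL by full path instead of relying on the search path. This is an assumption about how one might do it, not pyocr's behaviour; the helper name load_from_dirs and the directory list passed to it are hypothetical. It is also relevant because, since Python 3.8, ctypes on Windows uses a stricter DLL search that no longer consults PATH by default, and the ctypes documentation recommends passing the full path to the DLL.

```python
import ctypes
import os


def load_from_dirs(filename, dirs):
    """Load `filename` from the first directory in `dirs` where it loads.

    Each candidate is loaded by absolute path, so the result does not
    depend on how the OS searches for bare DLL names. Returns the loaded
    library, or None if no directory yields a loadable file.
    """
    for directory in dirs:
        path = os.path.abspath(os.path.join(directory, filename))
        if not os.path.isfile(path):
            continue
        try:
            return ctypes.CDLL(path)
        except OSError:
            # File exists but cannot be loaded (wrong architecture,
            # missing dependencies, not a real library...).
            continue
    return None
```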

> # Jflesch> Don't they have the equivalent of LD_LIBRARY_PATH on

I think it's easy to fix, but why not ship libtesseract together with pyocr? That might make it easier to use.

@jflesch
Member

jflesch commented Jan 11, 2018

> I tried to use libtesseract302.dll (from https://github.com/mnadeem/ocr-tess4j-example)

  1. Windows support for libtesseract comes from contributions. I personally don't use it (I use pyocr.tesseract for my project on Windows), so the list of DLL names to try loading is probably out of date. Please don't hesitate to tell me if you need new ones added.

  2. Tesseract 3.02 is known not to work well with Pyocr (on GNU/Linux, anyway). Even if the binding did work, is_available() would have returned False. You should try Tesseract >= 3.04.

  3. I don't know where those repositories come from, but they seem intended to be used with tess4j (Java); perhaps they are patched specifically for tess4j? Anyway, I think you should use more official/direct sources for your Tesseract installation: https://github.com/tesseract-ocr/tesseract/wiki/Downloads for the binaries and https://github.com/tesseract-ocr/tesseract/wiki/Data-Files for the language data.

  4. AFAIK, Tesseract 4 is still in alpha. Pyocr supports it on Linux, but I cannot yet guarantee good support on Windows at all.
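The version rule in point 2 (Tesseract >= 3.04) can be expressed as a simple tuple comparison. This is a hypothetical standalone helper, assuming version strings shaped like 3.02.02, 3.04.01 or 4.00.00alpha; in pyocr itself the relevant check is is_available(), whose internals are not shown here.

```python
import re


def parse_version(version):
    """Turn a version string like "3.04.01" or "4.00.00alpha" into a
    tuple of three integers, ignoring any non-numeric suffix."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Missing minor/patch components default to 0.
    return tuple(int(part) for part in match.groups("0"))


def is_recent_enough(version, minimum=(3, 4)):
    """True if `version` is at least `minimum` (3.04 by default)."""
    return parse_version(version) >= minimum
```

Tuple comparison handles the alpha suffix naturally: "4.00.00alpha" parses to (4, 0, 0), which compares greater than (3, 4).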

> I think it's easy to fix, but why not pack with libtesseract, maybe this will make it easier to use

Because if we went that way, for consistency, I would also have to package Tesseract.exe, Cuneiform, and the language data files for both Tesseract and Cuneiform.

@ghost
Author

ghost commented Jan 12, 2018

Thank you very much for taking so much of your valuable time.

> I don't know where those repositories come from

I was just too lazy to compile libtesseract myself, so I searched GitHub...

I tried the third-party Win32 build (by @parrot-office) listed at https://github.com/tesseract-ocr/tesseract/wiki/Downloads, but it has to be used together with many pvt.cppan.demo.xxx.dll files.
_(:з」∠)_ Maybe I should try to compile it myself...

> Please don't hesitate to tell me if you need some new ones to be added.

Maybe these names could be added:

libtesseract304.dll
libtesseract305.dll
libtesseract400.dll
libtesseract.dll
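A sketch of appending the suggested names to an existing candidate list, keeping the original order and skipping duplicates. The helper add_candidates and any existing list passed to it are hypothetical; the actual change to pyocr is not reproduced here.

```python
def add_candidates(existing, new_names):
    """Return `existing` followed by the names from `new_names` that are
    not already present, preserving the order of both lists."""
    seen = set(existing)
    merged = list(existing)
    for name in new_names:
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged


# The names suggested in this comment.
SUGGESTED = [
    "libtesseract304.dll",
    "libtesseract305.dll",
    "libtesseract400.dll",
    "libtesseract.dll",
]
```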

@jflesch
Member

jflesch commented Jan 12, 2018

> these names maybe can be added:

Done: 2d6ead7
