This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

preserve_interword_spaces in tesseract #84

Open
anilnaik1988 opened this issue Nov 23, 2017 · 1 comment

@anilnaik1988

Hi Team, I am currently using pyocr with Tesseract 3.05.01, and I call pyocr.get_available_tools() to get the Tesseract tool. Is there any way to set preserve_interword_spaces for Tesseract through pyocr?

@jflesch
Member

jflesch commented Nov 23, 2017

Assuming you're using Tesseract through pyocr.tesseract and not pyocr.libtesseract, then yes, you can: make your own builder.
See DigitBuilder and the other builders for reference.
My suggestion: inherit from TextBuilder and, in the constructor, just after calling the TextBuilder constructor, set self.tesseract_flags and self.tesseract_configs as you need.
Then just pass your new builder to pyocr.tesseract.image_to_string() (aka pyocr.get_available_tools()[0].image_to_string()), and you should get the expected result.
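
For illustration, here is a rough sketch of what such a builder could look like. It is not tested against every pyocr release. The names SpacingTextBuilder, extend_flags, make_spacing_builder and ocr_with_spaces are made up for this example, and it assumes that pyocr passes self.tesseract_flags straight through to the tesseract command line, where `-c VAR=VALUE` sets a config variable. pyocr and Pillow are imported inside the functions so the flag helper can be tried on its own.

```python
# Tesseract's command line accepts "-c VAR=VALUE" to set a config variable.
PRESERVE_SPACES_FLAGS = ["-c", "preserve_interword_spaces=1"]


def extend_flags(base_flags):
    """Return a copy of base_flags with the interword-spaces option appended."""
    return list(base_flags) + PRESERVE_SPACES_FLAGS


def make_spacing_builder(tesseract_layout=3):
    """Create a TextBuilder subclass instance carrying the extra flags."""
    # Imported lazily so extend_flags() works without pyocr installed.
    from pyocr.builders import TextBuilder

    class SpacingTextBuilder(TextBuilder):  # hypothetical name
        def __init__(self, tesseract_layout=3):
            # Let TextBuilder set up its usual flags first.
            super().__init__(tesseract_layout=tesseract_layout)
            # Assumption: these flags end up on the tesseract command line.
            self.tesseract_flags = extend_flags(self.tesseract_flags)

    return SpacingTextBuilder(tesseract_layout)


def ocr_with_spaces(image_path, lang="eng"):
    """Run OCR on image_path with the custom builder and return the text."""
    import pyocr.tesseract
    from PIL import Image

    return pyocr.tesseract.image_to_string(
        Image.open(image_path),
        lang=lang,
        builder=make_spacing_builder(),
    )
```

Calling ocr_with_spaces("page.png") (the file name is a placeholder) should then return the text with the extra flag applied, provided the installed Tesseract version honors preserve_interword_spaces for plain-text output.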
