This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

Expose load_system_dawg for non-textual output #61

Open
awiebe opened this issue Apr 7, 2017 · 3 comments

@awiebe

awiebe commented Apr 7, 2017

Being able to pass --load_system_dawg 0 as an argument to image_to_string would be helpful, perhaps through an options dictionary. Feel free to give it a name that is language-agnostic.

@jflesch
Member

jflesch commented Apr 7, 2017

You can simply create a builder object yourself. You can have a look at https://github.com/jflesch/pyocr/blob/master/src/pyocr/tesseract.py#L57 for an example. Basically you just need to inherit from BaseBuilder and define tess_conf = ["--load_system_dawg", "0"], file_ext = ['the_file_extension_that_tesseract_will_use'], and the methods read_file() and write_file().

If you implement such a builder, feel free to send a pull request to include it in src/pyocr/tesseract.py.
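A minimal sketch of the builder described above. To keep it runnable without pyocr installed, BaseBuilder here is a simplified local stand-in, not the real pyocr class. The attribute names tess_conf and file_ext are taken from this comment and may be spelled differently in the actual source, and the "txt" extension and the class name NoDawgTextBuilder are assumptions made for illustration.

```python
import io


class BaseBuilder:
    """Simplified stand-in for pyocr's BaseBuilder (assumption, not the real class)."""

    tess_conf = []  # extra arguments handed to tesseract
    file_ext = []   # extensions of the output files tesseract will produce

    def read_file(self, file_descriptor):
        raise NotImplementedError

    def write_file(self, file_descriptor, output):
        raise NotImplementedError


class NoDawgTextBuilder(BaseBuilder):
    """Plain-text builder that asks tesseract not to load the system dictionary."""

    # Flag spelling copied from the comment above.
    tess_conf = ["--load_system_dawg", "0"]
    # Assumed: tesseract writes plain text into a .txt file.
    file_ext = ["txt"]

    def read_file(self, file_descriptor):
        # Return the recognized text without surrounding whitespace.
        return file_descriptor.read().strip()

    def write_file(self, file_descriptor, output):
        # Write the text back unchanged.
        file_descriptor.write(output)


# Round trip through an in-memory file to exercise both methods.
builder = NoDawgTextBuilder()
buf = io.StringIO()
builder.write_file(buf, "A1B2-C3\n")
buf.seek(0)
text = builder.read_file(buf)
```

With the real library, an instance of such a class would be passed as the builder argument of the OCR call. Depending on the Tesseract version, the same variable may have to be passed as -c load_system_dawg=0 instead.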

@outkaj

outkaj commented Jun 30, 2017

I implemented a similar builder here, if it's helpful. In my case, I needed a modification to WordBoxBuilder with dictionary-related parameters set to false.

This is a work in progress, since I may modify the parameters further. Once it's complete, I'm happy to submit the builder as a pull request.

@gamykla

gamykla commented Oct 21, 2017

It would be great if you could just override tess_conf without having to extend the base builder.
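A rough sketch of what that could look like in plain Python, assuming the configuration lives in a class attribute that is looked up through the instance. TextBuilder below is a hypothetical stand-in defined only for this example, and whether pyocr actually picks up an instance-level value is an assumption, not something confirmed in this thread.

```python
class TextBuilder:
    """Hypothetical stand-in for a ready-made builder (not the real pyocr class)."""

    tess_conf = []       # class-level default shared by all instances
    file_ext = ["txt"]


builder = TextBuilder()
# Assigning on the instance shadows the class attribute for this object only;
# a new list is built so the shared default is never mutated.
builder.tess_conf = TextBuilder.tess_conf + ["--load_system_dawg", "0"]

# A second instance still sees the untouched default.
other = TextBuilder()
```

Building a new list rather than appending in place matters here: appending to builder.tess_conf directly would modify the list shared by every instance of the class.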
