This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

Cuneiform: Split text areas before OCR #2

Open
jflesch opened this issue Apr 25, 2012 · 1 comment

Comments

@jflesch
Member

jflesch commented Apr 25, 2012

Cuneiform tends to stop reading a page when it reaches a large non-readable area. As a result, when Cuneiform is used, some of the page's keywords are never extracted.

A way to work around this problem would be to split the text areas prior to OCR.

For instance, unpaper can do this (OCRFeeder uses it).
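
To illustrate the idea, here is a minimal sketch of one possible way to split a page before OCR. It is not unpaper's algorithm and not part of PyOCR: the function name `split_text_bands` and its parameters `min_gap` and `min_ink` are hypothetical, and the page is assumed to be already binarized into rows of 0/1 pixels (1 = dark). It finds horizontal bands that contain ink and are separated by blank rows, so that each band could be handed to the OCR tool on its own.

```python
def split_text_bands(rows, min_gap=2, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, of bands with ink.

    rows    -- list of rows, each a list of 0/1 ints (1 = dark pixel);
               assumes the page was binarized beforehand
    min_gap -- number of consecutive blank rows that ends a band
    min_ink -- minimum number of dark pixels for a row to count as inked
    (All names here are illustrative, not an existing API.)
    """
    bands = []
    start = None     # first row of the band currently being built
    last_ink = None  # most recent row that contained ink
    for y, row in enumerate(rows):
        if sum(row) >= min_ink:
            if start is None:
                start = y
            last_ink = y
        elif start is not None and y - last_ink >= min_gap:
            # Enough blank rows since the last inked row: close the band.
            bands.append((start, last_ink + 1))
            start = None
    if start is not None:
        # The page ended while a band was still open.
        bands.append((start, last_ink + 1))
    return bands
```

Each returned range could then be cropped out of the page image and passed to the OCR tool separately, so that one unreadable band is less likely to prevent the rest of the page from being read. A real implementation would also need to split columns and discard large non-text regions.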

@jflesch
Member Author

jflesch commented Oct 26, 2016

Note: image processing algorithms should be added to https://github.com/jflesch/libpillowfight/ , not to PyOCR directly.
