Improve layout detection #11

Open
alexanderadam opened this issue Jan 31, 2021 · 1 comment
Labels
enhancement New feature or request

Comments


alexanderadam commented Jan 31, 2021

First of all: pd3f is working great!
It's a wonderful tool. Thank you so much for creating it.

There's one small issue, though: text blocks and columns aren't recognized as such. Articles set in columns, and similar layouts, are currently not kept together within their blocks.
As a result, highlighting or searching for text that spans more than one line is broken in these cases.

I'm not sure why this is the case, since I thought that Tesseract actually has proper layout analysis built in (the "Page Segmentation Mode", whose default should be "Fully automatic page segmentation, but no OSD").
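
For reference, here is a minimal sketch of how the page segmentation mode can be set explicitly when calling the Tesseract command line, which might help when comparing results on multi-column pages. The helper names `build_tesseract_cmd` and `run_ocr` and the file names are made up for illustration, and I don't know how pd3f actually invokes Tesseract, so treat this as an assumption rather than a description of the pipeline.

```python
import subprocess

# A subset of the modes printed by `tesseract --help-psm`.
PSM_MODES = {
    1: "Automatic page segmentation with OSD",
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Assume a single column of text of variable sizes",
    6: "Assume a single uniform block of text",
}


def build_tesseract_cmd(image_path, output_base, psm=3):
    """Build the argument list for a Tesseract CLI call with an explicit PSM.

    Hypothetical helper: only assembles the command, it does not run it.
    """
    if psm not in PSM_MODES:
        raise ValueError(f"PSM {psm} is not in the subset listed here")
    return ["tesseract", image_path, output_base, "--psm", str(psm)]


def run_ocr(image_path, output_base, psm=3):
    """Run Tesseract; requires the `tesseract` binary on the PATH."""
    subprocess.run(build_tesseract_cmd(image_path, output_base, psm), check=True)


# Example (not executed here): compare the default mode with single-column mode.
# run_ocr("page.png", "out_default", psm=3)
# run_ocr("page.png", "out_single_column", psm=4)
```

Running the same page with modes 3 and 4 and comparing the text output would show whether the column problem comes from the segmentation step or from a later stage.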

Member

jfilter commented Jan 31, 2021

Hey Alexander,

Unfortunately, I'm very busy until March. I'm also not satisfied with the column detection and will improve it.
