
Merge consecutive lines in a paragraph #60

Merged: 1 commit, Apr 30, 2015

danvk (Owner) commented on Apr 30, 2015

This uses heuristics that rely on the font being fixed-width. It would be better to use the bounding boxes from ocropus-gpageseg instead (see #59).

This generally works well. The one consistent problem is that if the last line of a paragraph is short, it stays on its own line instead of being merged.
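
For illustration, here is a minimal sketch of one heuristic of this kind, assuming plain fixed-width text where character count stands in for a line's right edge. It is not necessarily the logic merged in this PR; the function name `merge_wrapped_lines` and the `width` and `slack` parameters are hypothetical. A line is treated as wrapped if it reaches within `slack` characters of the widest line, and the following line is then joined onto it.

```python
def merge_wrapped_lines(lines, width=None, slack=5):
    """Join consecutive lines that appear to have wrapped.

    Hypothetical heuristic for fixed-width text: a line is treated as wrapped
    (i.e. continued on the next line) if its right edge is within `slack`
    characters of `width`. If `width` is None, the longest line is used.
    Blank lines separate paragraphs and are preserved as empty strings.
    """
    if width is None:
        width = max((len(line.rstrip()) for line in lines), default=0)

    out = []
    current = None        # paragraph text being accumulated
    prev_wrapped = False  # did the previous line reach the right margin?
    for line in lines:
        text = line.strip()
        if not text:
            # Blank line: close the current paragraph and keep the separator.
            if current is not None:
                out.append(current)
                current = None
            out.append("")
            prev_wrapped = False
            continue
        if current is not None and prev_wrapped:
            current += " " + text  # continuation of a wrapped line
        else:
            if current is not None:
                out.append(current)
            current = text
        # In a fixed-width font, character count approximates the right edge.
        prev_wrapped = len(line.rstrip()) >= width - slack
    if current is not None:
        out.append(current)
    return out


sample = [
    "The quick brown fox jumps over",
    "the lazy dog and keeps running",
    "away.",
    "",
    "Short line.",
    "Another short line.",
]
print(merge_wrapped_lines(sample))
# ['The quick brown fox jumps over the lazy dog and keeps running away.',
#  '', 'Short line.', 'Another short line.']
```

Because this variant only looks at whether the *previous* line reached the margin, a short final line such as `away.` is still joined to its paragraph, while a short line following another short line is left alone. It does not handle words hyphenated across line breaks.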

Corresponding data update: oldnyc/oldnyc.github.io@045c946
Fixes #47

danvk added a commit that referenced this pull request on Apr 30, 2015: "Merge consecutive lines in a paragraph"
danvk merged commit 7820834 into master on Apr 30, 2015
Linked issue: Detect wrapped lines