3em dash support in references #1012

Draft: wants to merge 6 commits into master


kermitt2
Owner

Chicago reference style has this awful usage of the 3em dash to stand in for one, several, or all of the authors of the previous reference. Although this practice seems to have been removed or restricted in the latest Chicago style guidelines, LaTeX, for instance, still works with older guidelines, and there are tons of back files in this style.

This PR tries to cover 3em dashes in references (some training data for this has been added separately) when each 3em dash sequence refers to one author. The case where a single 3em dash sequence refers to all the authors of the previous reference is not covered, because it seems ambiguous with the case where only the first author is repeated.
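To make the covered case concrete, here is a minimal sketch in Python of resolving one dash sequence per author. It is not the code in this PR: the function name `resolve_repeated_authors`, the list-of-strings representation of authors, and the rule that a placeholder maps to the author at the same position in the previous reference are all assumptions made for illustration.

```python
import re

# A placeholder is assumed to be a run of three or more em dashes (U+2014)
# or a single three-em dash character (U+2E3B).
DASH_RUN = re.compile("^(?:\u2014{3,}|\u2e3b)$")


def resolve_repeated_authors(previous, current):
    """Replace each dash placeholder in `current` with the author found at
    the same position in `previous` (hypothetical positional rule)."""
    resolved = []
    for position, author in enumerate(current):
        if DASH_RUN.match(author.strip()):
            if position >= len(previous):
                raise ValueError(
                    f"no previous author at position {position} to repeat"
                )
            resolved.append(previous[position])
        else:
            resolved.append(author)
    return resolved


dash = "\u2014" * 3
previous_authors = ["Smith, J.", "Doe, A."]

# Both authors repeated, one dash sequence each.
print(resolve_repeated_authors(previous_authors, [dash, dash]))
# First author repeated, second author new.
print(resolve_repeated_authors(previous_authors, [dash, "Lee, K."]))
```

Under this positional rule, a lone dash sequence always resolves to the first author only, which is exactly why the "one sequence for all previous authors" convention cannot be told apart from it without extra context.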

However, the real main problems with this crappy mechanism come with OCR, in particular older OCRed PDFs. These dashes are never recognized correctly, and reconstructing the author list becomes simply impossible.

Some examples:

Screenshot from 2023-05-13 13-34-45

Screenshot from 2023-05-13 13-35-29

With numbers:

Screenshot from 2023-05-13 13-35-47

And finally, a 3em dash sequence used to repeat all the authors of the previous reference (not just the first!):

Screenshot from 2023-05-13 13-34-19

@kermitt2 kermitt2 marked this pull request as draft December 18, 2023 10:36