Word segmentation algorithm should split words on all punctuation #312

Open
fingolfin opened this issue Aug 16, 2022 · 1 comment
Labels
bug-report Unexpected behavior found, including behavior that diverges from documentation.

Comments

@fingolfin

Try searching for "Burgh" or "Hume" in the Federalist Papers example on https://stork-search.net. Both return zero matches, yet the text at https://www.gutenberg.org/files/1404/1404-h/1404-h.htm#link2H_4_0016 contains

  1. Burgh's "Political Disquisitions."

and elsewhere

  1. Hume's Essays, Vol. I, p. 128: "The Rise of Arts and Sciences."

I would expect these names to be found.

@fingolfin
Author

fingolfin commented Aug 16, 2022

I also got zero hits searching for "Abbé", which occurs in "l'Abbé", but I am not sure whether that is the same issue or whether it is related to the accent.

@jameslittle230 changed the title from "Problem finding stems occurring with a possessive apostrophe" to "Word segmentation algorithm should split words on all punctuation" Sep 30, 2022
@jameslittle230 added the improvement-request and bug-report labels and removed the improvement-request label Mar 17, 2023
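The retitled issue asks that word segmentation split on all punctuation, so that "Burgh's" yields "Burgh" and "l'Abbé" yields "Abbé". Below is a minimal sketch of that behavior, assuming a simple rule where every character that is not Unicode alphanumeric counts as a word boundary. The function `split_words` is hypothetical and is not Stork's actual tokenizer. Accented letters such as "é" count as alphanumeric in Rust, so "Abbé" stays whole. The curly apostrophe (’) that typeset texts often use also counts as a boundary.

```rust
/// Hypothetical helper (not Stork's real tokenizer): splits text into
/// lowercase words, treating every non-alphanumeric character
/// (straight or curly apostrophes, quotes, periods, whitespace, ...)
/// as a word boundary.
fn split_words(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        // Consecutive boundaries (e.g. `."`) produce empty pieces; drop them.
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

fn main() {
    // "Burgh's" yields "burgh" and "s", so a search for "Burgh" can match.
    println!("{:?}", split_words("Burgh's \"Political Disquisitions.\""));
    // 'é' is alphanumeric, so only the apostrophe splits "l'Abbé".
    println!("{:?}", split_words("l'Abbé"));
}
```

Whether short fragments such as "s" or "l" should be kept in the index is a separate design choice. A real indexer might drop them, or might also index the unsplit form so that exact searches for "Burgh's" still work.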
Projects
Status: Todo
Development

No branches or pull requests

2 participants