[BUG] Sentence end defined incorrectly #5134

Open
ftonneau opened this issue Apr 1, 2024 · 9 comments

@ftonneau

ftonneau commented Apr 1, 2024

Version of Kakoune

Development version or current version on Arch

Reproducer

Write this in an empty buffer:

My name is A.B. Jones. Be my guest.

Position your cursor at line start (on "M"), then select an outer sentence with <a-a>s

Outcome

Kakoune selects "My name is A."

Expectations

Kakoune should select "My name is A.B. Jones. "

Additional information

According to selectors.cc, Kakoune defines the end of a sentence as any one of the .;!? characters. This is incorrect in English as well as in other Western languages. The end of a sentence is better defined as one of the .;!? characters followed by one or two horizontal spaces or a line break. (A few corner cases could also be considered in English, as when the ending period is followed by a closing quote, but requiring a space after .;!? would at least take care of the most common cases.)

@ftonneau ftonneau added the bug label Apr 1, 2024
@ftonneau
Author

ftonneau commented Apr 1, 2024

Of course, the example should be:

My name is A.B.Jones. Be my guest.

(facepalm). I mentioned the missing space requirement after .;!? on the Kakoune forum years ago, but never filed a bug report.
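
The difference between the two rules can be sketched in Python. This is only a simplified model for illustration, not Kakoune's actual code: the names `CURRENT_END`, `PROPOSED_END`, and `outer_sentence` are hypothetical, and the "current" rule is assumed to be "any of .;!? ends a sentence" as described in the report.

```python
import re

# Assumed model of the current rule: any of . ; ! ? ends a sentence.
CURRENT_END = re.compile(r"[.;!?]")
# Proposed rule: the same characters, but only when followed by
# a space, a tab, or the end of a line.
PROPOSED_END = re.compile(r"[.;!?](?=[ \t]|$)", re.MULTILINE)


def outer_sentence(text: str, start: int, end_rule: re.Pattern) -> str:
    """Return text from `start` through the first sentence end, plus trailing blanks."""
    match = end_rule.search(text, start)
    if match is None:
        # No sentence end found: take everything up to the end of the text.
        return text[start:]
    stop = match.end()
    # An "outer" selection also takes the horizontal whitespace after the punctuation.
    while stop < len(text) and text[stop] in " \t":
        stop += 1
    return text[start:stop]


line = "My name is A.B.Jones. Be my guest."
print(repr(outer_sentence(line, 0, CURRENT_END)))   # 'My name is A.'
print(repr(outer_sentence(line, 0, PROPOSED_END)))  # 'My name is A.B.Jones. '
```

Under these assumptions, only the second rule yields the selection a reader would expect for the corrected example.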

@ftonneau
Author

ftonneau commented Apr 4, 2024

A better example (for real):

Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree.

Placing the cursor on P and repeatedly extending to the sentence end results in 4 false stops (on e., g., e., g.). We cannot even repeat the last object selection directly (with <a-.>), because at each stage we are stuck on the period. Instead, at each stage we need to extend the selection a little to the right before typing <a-.> to get unstuck. Kakoune's support for sentence endings should definitely be improved.

@mawww
Owner

mawww commented Apr 4, 2024

I agree this is something to be fixed, I'll try to dedicate a bit of time to that.

@Screwtapello
Contributor

Tools like fmt require two spaces after a sentence, to disambiguate sentence breaks from abbreviations: you wouldn't want Paging Dr. Jones! to break after the "r", nor would you want to write Dr.Jones to make things work.

Unfortunately, in the modern era when typewriters have fallen out of fashion, most typing is done in proportionally-spaced contexts like this text-box, or Microsoft Word, or other tools that handle the whitespace characters for you, so nobody bothers to put two spaces at the end of a sentence anymore. In practice, there is no good way to detect the end of a sentence anymore, and the most reliable approximation is to bake a bunch of special cases like "Dr." into the code, which is inelegant.

I don't think Kakoune's "sentence end" selection is a buggy solution to a problem, I think it's a perfectly reasonable solution to a buggy problem.

@schragge

schragge commented Apr 5, 2024

FYI, this is how sentence is defined in Vim's help (:h sentence):

A sentence is defined as ending at a '.', '!' or '?' followed by either the end of a line, or by a space or tab. Any number of closing ')', ']', '"' and ''' characters may appear after the '.', '!' or '?' before the spaces, tabs or end of line. A paragraph and section boundary is also a sentence boundary.
If the 'J' flag is present in 'cpoptions', at least two spaces have to follow the punctuation mark; <Tab>s are not recognized as white space.
The definition of a sentence cannot be changed.

This logic is implemented by Vim in function findsent.
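
As a rough illustration, the first paragraph of that help text can be approximated with a regular expression. This Python sketch is an assumption-laden simplification, not a port of findsent: the names `VIM_LIKE_END` and `sentence_end_offsets` are made up here, and paragraph or section boundaries and the 'J' flag are not modelled.

```python
import re

# Vim-like sentence end: . ! or ?, then any number of closing ) ] " ' characters,
# then a space, a tab, or the end of a line.
VIM_LIKE_END = re.compile(r"""[.!?][)\]"']*(?=[ \t]|$)""", re.MULTILINE)


def sentence_end_offsets(text: str) -> list[int]:
    """Return the offset just past each detected sentence end."""
    return [m.end() for m in VIM_LIKE_END.finditer(text)]


# A period followed by a closing quote still counts as a sentence end.
quoted = 'He said "stop." Then he left.'
print(sentence_end_offsets(quoted))

# Periods inside "e.g.," are not followed by blanks, so only the final one counts.
abbreviations = (
    "Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree."
)
print(sentence_end_offsets(abbreviations) == [len(abbreviations)])  # True
```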

@ftonneau
Author

ftonneau commented Apr 5, 2024

My opening example was completely and stupidly messed up. My latter example is a better one:

Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree.

Here Kakoune will detect a sentence end at five different places, the first four being false positives because they involve a period not followed by a space.

It is true that no reasonable definition will eliminate all false positives (e.g., the period in Dr. Jones), but a definition such as Vim's is better than Kakoune's because it eliminates more of them.
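
These counts can be checked with a small Python sketch. It is a simplified model rather than Kakoune's or Vim's implementation, and the names `ANY_PUNCTUATION`, `PUNCTUATION_THEN_BLANK`, and `count_stops` are hypothetical.

```python
import re

# Assumed model of the current rule: every . ; ! ? is a sentence end.
ANY_PUNCTUATION = re.compile(r"[.;!?]")
# Stricter rule: the punctuation must be followed by a space, a tab, or a line end.
PUNCTUATION_THEN_BLANK = re.compile(r"[.;!?](?=[ \t]|$)", re.MULTILINE)


def count_stops(text: str, rule: re.Pattern) -> int:
    """Count how many sentence ends `rule` detects in `text`."""
    return len(rule.findall(text))


citation = (
    "Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree."
)
print(count_stops(citation, ANY_PUNCTUATION))         # 5
print(count_stops(citation, PUNCTUATION_THEN_BLANK))  # 1

# The abbreviation case is not solved by either rule: "Dr." is followed by a space.
paging = "Paging Dr. Jones!"
print(count_stops(paging, ANY_PUNCTUATION))           # 2
print(count_stops(paging, PUNCTUATION_THEN_BLANK))    # 2
```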

@ftonneau
Author

ftonneau commented Apr 5, 2024

Vim's definition also takes into account false negatives to the space-after-period rule such as a sentence "ending in quotes." IMHO, the best thing for Kakoune would be to follow Vim's (and Emacs') definition. This may involve a lot of effort or complication.

Edit: removed "but pending this, requiring punctuation to be followed by at least one space would already be an improvement on the current definition."

Thinking twice, the best thing would be either (a) to go all the way to a Vim-like definition, or (b) to leave the current source code as is, given that the sentence-end issue can be improved at the plugin level.

@kamurani

@ftonneau just an FYI, a colon is a : character, and . is called a "period" or "full-stop"
Got a bit confused reading your comments.

@ftonneau
Author

ftonneau commented Apr 29, 2024

You are right, thanks for the correction. I edited my posts accordingly.
