
8330590: TextInputControl: previous word fails with Bhojpuri characters #1444


andy-goryachev-oracle
Contributor

@andy-goryachev-oracle andy-goryachev-oracle commented Apr 19, 2024

This change replaces Character.isLetterOrDigit(char), which fails with characters encoded as surrogate pairs, with Character.isLetterOrDigit(int).
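To illustrate the root cause, here is a small sketch (the class and method names are ours, not JavaFX's). It uses U+110A6 KAITHI LETTER BHA, the first character of the Bhojpuri text 𑂦𑂷𑂔𑂣𑂳𑂩𑂲. The character lies outside the BMP, so a Java String stores it as a surrogate pair, and the char overload sees only one half at a time:

```java
public class SurrogateLetterCheck {
    // U+110A6 KAITHI LETTER BHA, the first character of the Bhojpuri sample.
    // It lies outside the BMP, so a Java String stores it as two chars (a surrogate pair).
    static final String BHA = new String(Character.toChars(0x110A6));

    // Old style: classify a single UTF-16 char (here, one half of the pair)
    static boolean charBasedCheck(String text, int ix) {
        return Character.isLetterOrDigit(text.charAt(ix));
    }

    // New style: classify the whole code point that starts at ix
    static boolean codePointCheck(String text, int ix) {
        return Character.isLetterOrDigit(text.codePointAt(ix));
    }

    public static void main(String[] args) {
        // prints "length=2, char-based=false, code-point=true"
        System.out.println("length=" + BHA.length()
            + ", char-based=" + charBasedCheck(BHA, 0)
            + ", code-point=" + codePointCheck(BHA, 0));
    }
}
```

Each half of the pair has the general category Cs (surrogate), which is never a letter or digit. Only the int overload, applied to the full code point, gives the expected answer.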


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8330590: TextInputControl: previous word fails with Bhojpuri characters (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/1444/head:pull/1444
$ git checkout pull/1444

Update a local copy of the PR:
$ git checkout pull/1444
$ git pull https://git.openjdk.org/jfx.git pull/1444/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1444

View PR using the GUI difftool:
$ git pr show -t 1444

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/1444.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Apr 19, 2024

👋 Welcome back angorya! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 19, 2024

@andy-goryachev-oracle This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8330590: TextInputControl: previous word fails with Bhojpuri characters

Reviewed-by: kpk, arapte

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Ready for review label Apr 19, 2024
@mlbridge

mlbridge bot commented Apr 19, 2024

Webrevs

@kevinrushforth
Member

@karthikpandelu Can you review this? We'll also need a review by a "R"eviewer.

Member

@karthikpandelu karthikpandelu left a comment


LGTM
Validated the changes manually and using the unit tests.
The new unit test fails without the fix and passes with the fix provided in this PR as expected.

I have one question, though: when moving from left to right by word (using option+RIGHT) over the same text, after the first word "Bhojpuri" the caret goes to the beginning of the Bhojpuri text and then to its end. With all-English text, it goes directly to the end of the next word, skipping the space. Is this expected?

@andy-goryachev-oracle
Contributor Author

Is this expected?

I think it might be a bug - even though it's unclear how many words the text "𑂦𑂷𑂔𑂣𑂳𑂩𑂲" contains, I would not expect it to go to the beginning of that segment.

I suspect the code in TextInputControl.endOfNextWord(boolean) is incorrect and needs a deeper rewrite than the naive replacement with isLetterOrDigit().

@andy-goryachev-oracle
Contributor Author

I think we need to fix endOf/nextWord as well, since the logic seems to break with surrogate pairs:

Screenshot 2024-04-29 at 09 48 33

The issue can also be seen with Awadhi: अवधी/औधी
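Awadhi is usually written in Devanagari, which lies entirely in the BMP, so surrogate pairs alone may not explain this case. One possible contributing factor (our assumption, not established in this thread) is that dependent vowel signs such as U+0940 DEVANAGARI VOWEL SIGN II (the final ी in अवधी) are combining marks rather than letters, so any isLetterOrDigit() test treats them as word boundaries. A quick check, with hypothetical class and method names:

```java
public class CombiningMarkCheck {
    // True for the three Unicode mark categories (Mn, Mc, Me)
    static boolean isMark(int cp) {
        int t = Character.getType(cp);
        return t == Character.NON_SPACING_MARK
            || t == Character.COMBINING_SPACING_MARK
            || t == Character.ENCLOSING_MARK;
    }

    // One line per code point: is it a letter/digit, and is it a combining mark?
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.append(String.format("U+%04X letterOrDigit=%b mark=%b%n",
                cp, Character.isLetterOrDigit(cp), isMark(cp)));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Awadhi" in Devanagari: U+0905 U+0935 U+0927 U+0940
        // prints true/false for the three consonants and false/true for U+0940
        System.out.print(describe("\u0905\u0935\u0927\u0940"));
    }
}
```

If this is indeed a factor, a word-navigation rewrite would likely need to treat combining marks as part of the preceding word, not only switch to code points.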

@andy-goryachev-oracle
Contributor Author

Looking at the "next word" functionality across different applications on different platforms, there appears to be a wide variety of behaviors.

One vendor appears to be quite consistent: Microsoft. Its Word, WordPad, and Notepad behave exactly the same, and Word behaves the same on macOS and Win11.

JavaFX TextArea is inconsistent (by design) between macOS and Win11, but also is inconsistent with Swing's JTextArea.

If I were to fix the behavior (if we decide to fix the behavior of the nextWord function, that is), I would make it consistent with MS Word, but let's discuss.

For reference, here are the results of my testing. Initially, the caret is placed at index 0, and the numbers in parentheses denote successive caret positions after ctrl-RIGHT (option-RIGHT on macOS) key presses. An underscore (_) denotes a space, and (nl) denotes a newline.

source
_english_english_eng:_end,_eng:_(nl)
(nl)
_eng


BreakIterator.getWordInstance()
_(1)english(2)_(3)english(4)_(5)eng(6):(7)_(8)end(9),(10)_(11)eng(12):(13)_(14)(nl)
(15)(nl)
(16)_(17)eng
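The first row above appears to list every boundary reported by java.text.BreakIterator's word instance. Here is a minimal sketch that produces such a listing for the same text. It assumes '_' stands for a space and (nl) for '\n', and the class name is ours:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class WordBoundaries {
    // The test text: '_' written as a space, (nl) as '\n'
    static final String SAMPLE = " english english eng: end, eng: \n\n eng";

    // Collects every boundary reported by a word BreakIterator, including 0 and text.length()
    static List<Integer> boundaries(String text) {
        BreakIterator it = BreakIterator.getWordInstance();
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundaries(SAMPLE));
    }
}
```

The iterator only reports candidate boundaries. Which of them should be caret stops is exactly what the applications below disagree on.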


text area (mac)
_english(1)_english(2)_eng(3):(4)_end(5),(6)_eng(7):(8)_(nl)
(9)(nl)
(10)_eng(11)


ms word (mac) 16.84 24041420 consistent with win11
_(1)english_(2)english_(3)eng(4):_(5)end(6),_(7)eng(8):_(9)(nl)
(10)(nl)
(11)_(12)eng(13)


text edit (mac)
_english(1)_english(2)_eng(3):_end(4),_eng(5):_(nl)
(nl)
(nl)_eng(6)


chrome (mac) <div contenteditable=true>
&nbsp;english(1)_english(2)_eng(3):(4)_end(5),(6)_eng(7):(8)_<br>
(9)<br>
_(10)eng(11)


eclipse (mac)
_(1)english_(2)english_(3)eng(4):_(5)end(6),_(7)eng(8):_(9)(nl)
(10)(nl)
(11)_(12)eng


JTextArea (mac)
_(1)english_(2)english_(3)eng(4):_(5)end(6),_(7)eng(8):_(9)(nl)
(nl)
_(10)eng


ms word 365 ver 2302 build 16.0.16130.20942 (win 11)
same as notepad (win 11)
same as wordpad (win 11)
_(1)english_(2)english_(3)eng(4):_(5)end(6),_(7)eng(8):_(9)(nl)
(10)(nl)
(11)_(12)eng


TextArea (win11)
_(1)english_(2)english_(3)eng(4):_(5)end(6),_(7)eng(8):_(9)(nl)
(10)(nl)
_(11)eng
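Here is a minimal sketch of one rule that reproduces the MS Word (mac) sequence above: skip the run under the caret (letters/digits, or other non-space characters), then skip spaces, and treat each newline as a stop of its own. The class and method names are hypothetical. Counting combining marks as word characters is our own assumption, aimed at scripts like Kaithi and Devanagari. None of this is taken from JavaFX or from JDK-8331951.

```java
import java.util.ArrayList;
import java.util.List;

public class WordLikeNextWord {
    // Character classes used by this sketch (assumed, not taken from JavaFX)
    enum Kind { WORD, SPACE, NEWLINE, OTHER }

    // The test text: '_' written as a space, (nl) as '\n'
    static final String SAMPLE = " english english eng: end, eng: \n\n eng";

    static boolean isMark(int cp) {
        int t = Character.getType(cp);
        return t == Character.NON_SPACING_MARK
            || t == Character.COMBINING_SPACING_MARK
            || t == Character.ENCLOSING_MARK;
    }

    static Kind kindOf(int cp) {
        if (cp == '\n') {
            return Kind.NEWLINE;
        } else if (Character.isWhitespace(cp)) {
            return Kind.SPACE;
        } else if (Character.isLetterOrDigit(cp) || isMark(cp)) {
            // assumption: combining marks (e.g. vowel signs) belong to the word
            return Kind.WORD;
        }
        return Kind.OTHER;
    }

    // Advances pos past consecutive code points of the given kind
    static int skip(String text, int pos, Kind kind) {
        while (pos < text.length()) {
            int cp = text.codePointAt(pos);
            if (kindOf(cp) != kind) {
                break;
            }
            pos += Character.charCount(cp);
        }
        return pos;
    }

    // One "next word" move; steps by code point, so surrogate pairs stay intact
    static int nextWord(String text, int pos) {
        int len = text.length();
        if (pos >= len) {
            return len;
        }
        Kind start = kindOf(text.codePointAt(pos));
        if (start == Kind.NEWLINE) {
            return pos + 1; // a newline is a stop of its own
        }
        if (start != Kind.SPACE) {
            pos = skip(text, pos, start); // the word or punctuation run
        }
        return skip(text, pos, Kind.SPACE); // trailing spaces, stopping before a newline
    }

    // Caret positions visited by repeated moves starting at index 0
    static List<Integer> stops(String text) {
        List<Integer> result = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            pos = nextWord(text, pos);
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [1, 9, 17, 20, 22, 25, 27, 30, 32, 33, 34, 35, 38]
        System.out.println(stops(SAMPLE));
    }
}
```

Every move consumes at least one code point, so the loop in stops() always terminates.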

@andy-goryachev-oracle
Contributor Author

@aghaisas would you please take a look at this also?

@karthikpandelu
Member

If I were to fix the behavior (if we decide to fix the behavior of the nextWord function, that is), I would make it consistent with MS Word, but let's discuss.

The behaviour in MS Word is easy to understand and matches what we would expect. +1 for this.

Thanks @andy-goryachev-oracle for checking the behaviour and providing the details.

@andy-goryachev-oracle
Contributor Author

thank you @karthikpandelu for raising the question!

@andy-goryachev-oracle
Contributor Author

I've created https://bugs.openjdk.org/browse/JDK-8331951 to deal with the "next word" function issues.

Member

@arapte arapte left a comment


Fix looks good, providing minor comments.

Comment on lines +1746 to +1751
if (ix < 0) {
// should not happen
return false;
} else if (ix >= text.length()) {
return false;
}
Member


Maybe combine them into a single if statement, or maybe remove the checks, as this is a private method.

Contributor Author


I prefer to keep one statement per line; the checks are needed here.

@@ -1743,4 +1742,15 @@ public int getAnchor() {
}
}

private static boolean isLetterOrDigit(String text, int ix, int len) {
Member


The len variable is unused in this method.

Contributor Author


fixed, thank you
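The thread does not show the body of the revised helper. The following is a hypothetical sketch under that caveat. It uses a two-argument isLetterOrDigit(String, int) with the len parameter dropped, as requested in review, and keeps the bounds checks the author wanted. The helper is called by a code-point-aware previousWord that steps back with String.offsetByCodePoints, so it is always called at the start of a code point. The class name, previousWord, and previousWordByChar are ours, and this is not the committed JavaFX code. For contrast, previousWordByChar shows how a char-by-char scan treats each surrogate half as a separator.

```java
public class PreviousWordSketch {
    // "one " followed by U+110A6 KAITHI LETTER BHA and U+11094 KAITHI LETTER JA
    // (both supplementary letters, so the string is 4 + 2 + 2 = 8 chars long)
    static final String SAMPLE = "one "
        + new String(Character.toChars(0x110A6))
        + new String(Character.toChars(0x11094));

    // Hypothetical two-argument helper: bounds-checked, and classifies the whole
    // code point starting at ix (ix must be at the start of a code point)
    static boolean isLetterOrDigit(String text, int ix) {
        if (ix < 0) {
            // should not happen
            return false;
        } else if (ix >= text.length()) {
            return false;
        }
        return Character.isLetterOrDigit(text.codePointAt(ix));
    }

    // Code-point-aware "previous word": step back over separators, then over the word
    static int previousWord(String text, int pos) {
        while (pos > 0) {
            int prev = text.offsetByCodePoints(pos, -1);
            if (isLetterOrDigit(text, prev)) {
                break;
            }
            pos = prev;
        }
        while (pos > 0) {
            int prev = text.offsetByCodePoints(pos, -1);
            if (!isLetterOrDigit(text, prev)) {
                break;
            }
            pos = prev;
        }
        return pos;
    }

    // Naive char-based version for comparison: each surrogate half looks like a separator
    static int previousWordByChar(String text, int pos) {
        while (pos > 0 && !Character.isLetterOrDigit(text.charAt(pos - 1))) {
            pos--;
        }
        while (pos > 0 && Character.isLetterOrDigit(text.charAt(pos - 1))) {
            pos--;
        }
        return pos;
    }

    public static void main(String[] args) {
        int end = SAMPLE.length();
        // prints "code points: 4, chars: 0"
        System.out.println("code points: " + previousWord(SAMPLE, end)
            + ", chars: " + previousWordByChar(SAMPLE, end));
    }
}
```

From the end of the text, the char-based scan skips over the whole Kaithi word as if it were whitespace and lands at the start of "one". The code-point version stops at the start of the Kaithi word.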

@openjdk openjdk bot added the ready Ready to be integrated label May 17, 2024
@andy-goryachev-oracle
Contributor Author

/integrate

@openjdk

openjdk bot commented May 20, 2024

Going to push as commit b5fe362.
Since your change was applied there has been 1 commit pushed to the master branch:

  • 9dc4aa2: 8324327: ColorPicker shows a white rectangle on clicking on picker

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 20, 2024
@openjdk openjdk bot closed this May 20, 2024
@openjdk openjdk bot removed ready Ready to be integrated rfr Ready for review labels May 20, 2024
@openjdk

openjdk bot commented May 20, 2024

@andy-goryachev-oracle Pushed as commit b5fe362.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@andy-goryachev-oracle andy-goryachev-oracle deleted the 8330590.prev.word branch May 20, 2024 14:55
Labels
integrated Pull request has been integrated
4 participants