Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22564 Java API for the host env to register a low level break eng… #2697

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

FrankYFTang
Copy link
Contributor

…ine to break CJ + Southeastern Asia script

Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22564
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@FrankYFTang
Copy link
Contributor Author

@richgillam Somehow my FakeYueBreakEngine break the PersonNameConsistencyTest on yue.txt and yue_Hans.txt . I am suprise that changes to BreakIterator will impact the outcome of PersonName. Could you explain how does PersonName depends on BreakIterator?

richgillam
richgillam previously approved these changes Nov 15, 2023
Copy link
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@richgillam
Copy link
Contributor

@richgillam Somehow my FakeYueBreakEngine break the PersonNameConsistencyTest on yue.txt and yue_Hans.txt . I am suprise that changes to BreakIterator will impact the outcome of PersonName. Could you explain how does PersonName depends on BreakIterator?

PersonName is using BreakIterator for a few things, as I remember it:

  • Some field modifiers capitalize each individual word in the field, using BreakIterator to find the word boundaries.
  • I think the initial-formation stuff uses BreakIterator to break a field into words so that it can convert each of them into an initial.
  • I could be wrong, but I think the space-replacement stuff may also use BreakIterator to figure out where to insert or substitute spaces.

The first two don't seem relevant to Yue, but the third one might be (if it's real-- I don't remember for sure now). Some of the foreign-name support might be triggering the capitalization/initial logic even though it's not strictly relevant (e.g., formatting Yue with an English person name formatter). If you can tell me more about which tests are failing, what they expect, and what you're getting instead, I can help you track the problem down (if you need help, of course).

Copy link
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that what we agreed to last week? I thought I came away from that discussion thinking we were adding a third method to say whether the break engine supported a locale, not turning the isFor() method into that method by getting rid of the c parameter. Do we no longer need an isFor() function that tests a character to see if it can begin a run of characters to be broken?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants