Cannot Train POS with Locale Other Than English #45

Open
schrieveslaach opened this issue Jun 4, 2018 · 2 comments

Comments

@schrieveslaach

If I train the arktweet POS tagger under a non-English locale such as German (cf. the train method), training fails because it generates a file whose decimals use the locale's formatting. For example, 0.2 is written as 0,2 (German notation), and the trainer component then fails to load the file because of the comma.
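
To illustrate the mismatch described above, here is a minimal, self-contained sketch. It is not taken from the tagger's code: the class `LocaleFormatDemo` and its helper methods are hypothetical, and it only assumes that the file is written with locale-sensitive formatting and read back with a locale-independent parser such as `Double.parseDouble`.

```java
import java.util.Locale;

public class LocaleFormatDemo {
    // Formats a value with an explicit locale, so the demo does not
    // depend on the default locale of the machine it runs on.
    static String format(Locale locale, double value) {
        return String.format(locale, "%.1f", value);
    }

    // Returns true if Double.parseDouble accepts the text.
    // Double.parseDouble only understands '.' as the decimal separator.
    static boolean parses(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String german = format(Locale.GERMANY, 0.2);
        String english = format(Locale.US, 0.2);
        System.out.println(german + " parses: " + parses(german));
        System.out.println(english + " parses: " + parses(english));
    }
}
```

The German-formatted string contains a comma and is rejected, while the US-formatted string is accepted.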

@brendano
Owner

brendano commented Dec 26, 2018

Hm. Maybe the locale has to be set as a Java flag or property? Or it would be better to use locale-independent floating-point parsing code. I have no idea how to do that in Java.
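
One possible way to do the locale-independent route in plain Java is sketched below. It is not the tagger's actual code, and the class `LocaleIndependentNumbers` and its `write`/`read` helpers are hypothetical names. It relies on `Double.toString` and `Double.parseDouble`, neither of which consults the default locale, and on `String.format` with an explicit `Locale.ROOT` when a fixed number of digits is wanted.

```java
import java.util.Locale;

public class LocaleIndependentNumbers {
    // Double.toString ignores the default locale and always uses '.'.
    static String write(double value) {
        return Double.toString(value);
    }

    // Double.parseDouble likewise ignores the default locale.
    static double read(String text) {
        return Double.parseDouble(text);
    }

    // Fixed-precision output with an explicit, neutral locale.
    static String writeFixed(double value) {
        return String.format(Locale.ROOT, "%.4f", value);
    }

    public static void main(String[] args) {
        // Simulate a German machine to show the default locale has no effect.
        Locale.setDefault(Locale.GERMANY);

        String text = write(0.2);
        System.out.println(text);
        System.out.println(read(text));
        System.out.println(writeFixed(0.2));
    }
}
```

As for the flag route, the default locale can usually be overridden at JVM startup with the standard system properties `-Duser.language=en -Duser.country=US`. That may serve as a workaround, but it changes the locale for the whole process rather than fixing the formatting code.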

@schrieveslaach
Author

I agree that a locale-independent format is the way to go. For example, you could always write numbers in English format:

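import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

public class I18NTester {
   public static void main(String[] args) {
      String pattern = "###.##";
      double number = 123.45;

      // Use a fixed locale instead of the JVM default, so the decimal
      // separator is always '.' regardless of the machine's settings.
      Locale enLocale = Locale.US;

      DecimalFormat decimalFormat = (DecimalFormat) NumberFormat.getNumberInstance(enLocale);
      decimalFormat.applyPattern(pattern);

      // Prints 123.45 even when the default locale is German.
      System.out.println(decimalFormat.format(number));
   }
}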
