-
-
Notifications
You must be signed in to change notification settings - Fork 840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proper Korean romanization #4182
Conversation
for the mb_* functions there is also a replacement available in If there are concerns about performs both variants can be benchmarked? however, not sure if it is worth the effort. |
I guess one optimization to make is to check first if the given string contains KOREAN_START at all and skip this code if not. This way any performance penalty does not apply to most strings. |
Also (since we don't speak korean) could you add some tests for common korean words and their correct romanization? These tests should fail with the current implementation and pass with yours. Existing korean tests need to adjusted of course. |
Ok, will add new Korean tests. Should I create separate Korean test like Japanese test? (Will fixing those existing test later, after I get proper PHPUnit environment) |
Something similar is fine, not sure is so many strings are needed. But see what works for you. Unit tests should help to show the intent of the changes, and helps ensuring that in future other changes will not undo these changes. |
I implemented dedicated Korean test, with most frequently used words provided by National Institute of the Korean Language. This implementation does spelling-based romanization, which is not official romanization of Korean(which uses pronounciation-based). However, pronounciation-based romanization is much more complicated and I think this relatively simple implementation works fine for the purpose. As far as I know, other project(OpenProject) also uses this kind of romanization, but I can't surely confirm. Note: 13fd8ef was created during resolve conflict in |
…n/dokuwiki into korean-romanization
Closing and replacing this PR with #4194, since this branch was compromised by local error. |
Currently, Korean romanization does not work properly, since
romanize()
inClean.php
does not handle full character. It only works when we type each component individually(ex. ㅌㅔㅅㅡㅌㅡ), which is virtually useless.So I added function for decomposing Korean characters and romanize them accordingly.
However, here's the catch.
strtr
.I thought this PR needs some discussion or review, so I opened as draft. (This code works though.)