Current state of Traditional Chinese support #11

Open
JanVeb opened this issue May 29, 2022 · 1 comment

JanVeb commented May 29, 2022

Hi there, this seems like just the thing I need, and a much better solution than other similar plugins with way more stars. Great job.

Is it possible to translate to pinyin with numbers instead of tone marks? (If not, I could write the code for that. I'm just asking in case you haven't added this option but would like to, since pinyin resources used in programming often write tones as numbers rather than tone marks. It's a really simple problem to solve, especially with the output from your plugin.)
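For reference, here is a minimal sketch of the tone-mark to tone-number conversion being asked about. It is not part of the plugin and is written in Python purely for illustration. It assumes syllables are separated by whitespace, that the neutral tone is written as `5`, and that the input contains no punctuation; the function names are hypothetical.

```python
import unicodedata

# Combining diacritics that carry the four pinyin tones.
TONE_MARKS = {
    "\u0304": "1",  # macron  (first tone)
    "\u0301": "2",  # acute   (second tone)
    "\u030c": "3",  # caron   (third tone)
    "\u0300": "4",  # grave   (fourth tone)
}


def syllable_to_numbered(syllable: str, neutral: str = "5") -> str:
    """Convert one tone-marked syllable to tone-number form (hypothetical helper)."""
    tone = neutral
    kept = []
    # NFD splits each accented vowel into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # NFC recomposes what is left, so "u" + diaeresis becomes "ü" again.
    return unicodedata.normalize("NFC", "".join(kept)) + tone


def to_numbered(pinyin: str) -> str:
    """Convert whitespace-separated tone-marked pinyin to numbered pinyin."""
    return " ".join(syllable_to_numbered(s) for s in pinyin.split())


print(to_numbered("nǐ hǎo"))  # → ni3 hao3
```

Because the tone is detected after Unicode decomposition, the same code handles `ü` with a tone mark (for example `lǜ` becomes `lü4`) without a lookup table for every accented vowel.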

Also, I see in your example that it's possible to convert traditional characters to simplified and vice versa. In the open issues, user ShawTim asked this question on Dec 26, 2020:

Does segment support splitting Traditional Chinese into words? #8

Your answer:

It probably won't work very well for traditional characters because the segmentation library used (jieba) is trained on simplified texts. For now you'll probably have to convert to simplified first.

So I wanted to check: more than a year has passed since that question was asked, and the project readme has examples of converting traditional hanzi to simplified. Did you add this later on? Is it working reasonably well, and how good is it at converting traditional characters to simplified ones?

To be clear, I don't need your library for converting traditional characters to simplified; I need it for converting both traditional and simplified characters to pinyin. Is the library good at converting traditional characters to pinyin now, given that you have examples of converting between traditional and simplified characters?

@peterolson
Owner

You can convert between simplified and traditional, but the segmentation will only work well with simplified. If you want to segment traditional text, you can convert to simplified, segment, and then re-use the same segment lengths on the original traditional text.

But it would be a good idea to bake this into the library to avoid this extra step.
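The convert, segment, and re-slice workaround described above can be sketched roughly as follows. This is an illustration in Python, not the library's API: `to_simplified` and `segment` are hypothetical callables standing in for whatever conversion and segmentation functions are available, and the sketch assumes the conversion maps each character to exactly one character so that segment lengths carry over.

```python
from typing import Callable, List


def segment_traditional(
    text: str,
    to_simplified: Callable[[str], str],   # hypothetical converter
    segment: Callable[[str], List[str]],   # hypothetical segmenter
) -> List[str]:
    """Segment traditional text by reusing segment lengths from its simplified form."""
    simplified = to_simplified(text)
    # The trick only works if conversion preserves character positions.
    if len(simplified) != len(text):
        raise ValueError("conversion changed the text length")

    words = []
    pos = 0
    for word in segment(simplified):
        # Slice the ORIGINAL text using the length of each simplified segment.
        words.append(text[pos:pos + len(word)])
        pos += len(word)
    if pos != len(text):
        raise ValueError("segments do not cover the whole text")
    return words


# Toy stand-ins for demonstration only; a real converter and segmenter
# would replace these.
TOY_MAP = {"學": "学", "習": "习", "漢": "汉", "語": "语"}


def toy_to_simplified(s: str) -> str:
    return "".join(TOY_MAP.get(ch, ch) for ch in s)


def toy_segment(s: str) -> List[str]:
    return [s[i:i + 2] for i in range(0, len(s), 2)]


print(segment_traditional("學習漢語", toy_to_simplified, toy_segment))
# → ['學習', '漢語']
```

The length check matters because some conversions are not one-to-one at the phrase level; when the lengths differ, the offsets can no longer be reused and the sketch raises instead of returning misaligned words.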
