Hi! I'm considering using Lingua in a project. It compares favourably to CLD2 and Whatlang on our dataset (social media posts), but one of our requirements is to distinguish between Traditional and Simplified Chinese, which Lingua does not support.
Are there any plans to support this? Our requirement for Chinese support probably won't be crucial until later in the year, so if support is in development that would go a long way.
Thanks!
Back then, when I added support for Chinese, I did not find proper training corpora that consisted only of traditional or simplified Chinese, respectively. That's why Lingua cannot differentiate between them yet. Do you know of a good source for training material perhaps? I can start a search myself again as well. If successful, adding support for traditional and simplified Chinese won't be difficult anymore.
No, I don't know of any specific training corpora, but it's my understanding that Traditional and Simplified Chinese typically use different character sets, so it may be possible to distinguish them without an ML model. In fact, perhaps I could use such a thing...
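To illustrate the character-set idea, here is a minimal Python sketch. It counts characters that appear in only one of the two scripts and picks the majority. The two character tables are a small hand-picked sample assumed for illustration (a real table would be far larger), `guess_chinese_script` is a hypothetical helper name, and none of this is part of Lingua.

```python
# Hypothetical, hand-picked sample of characters whose simplified and
# traditional forms differ. A usable table would need thousands of entries.
SIMPLIFIED_ONLY = set("们这说时过国会对发现开关门马鸟鱼龙书车东见问间学")
TRADITIONAL_ONLY = set("們這說時過國會對發現開關門馬鳥魚龍書車東見問間學")


def guess_chinese_script(text: str) -> str:
    """Guess the script by counting script-specific characters.

    Returns "simplified", "traditional", "mixed" (equal non-zero counts)
    or "unknown" (no script-specific character found).
    """
    simplified = sum(1 for ch in text if ch in SIMPLIFIED_ONLY)
    traditional = sum(1 for ch in text if ch in TRADITIONAL_ONLY)

    if simplified == 0 and traditional == 0:
        return "unknown"
    if simplified > traditional:
        return "simplified"
    if traditional > simplified:
        return "traditional"
    return "mixed"


if __name__ == "__main__":
    for sample in ("我们在这里学习", "我們在這裡學習", "hello world"):
        print(sample, "->", guess_chinese_script(sample))
```

Characters shared by both scripts carry no signal here, so short posts may well come back as "unknown", and text that quotes the other script can skew the counts.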
OpenCC is supported on Linux only, which makes it infeasible for my library. Also keep in mind that Chinese texts can contain foreign-language material. So I don't think we can avoid creating ML models. I will try to find some good training data again; perhaps I will be luckier this time.