Default reading of 「湯」 is a deprecated pronunciation #5
Comments
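Because it is still using the old word list, in which the priority order of pronunciations is almost random, and nobody has had time to fix it……
I tested the current ToJyutping against the pronunciations of the 33,043 example sentences from 粵典. The resulting Syllable Error Rate is 7.33%. I think accuracy can be improved by analysing the types of error, and this test set can also serve as a regression test, so that future updates to the word list or changes to the ranking algorithm do not introduce new problems. I can create a new repo called something like to-jyutping-tests, using all the sentences where 粵典 and ToJyutping currently give the same annotation as the basis of the regression test, to check whether your PR breaks any sentence that previously passed. The Python and JS versions can both reference the same test in future. I had a rough look at the test results and summarised roughly 6 types of error: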
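For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how a Syllable Error Rate and a regression subset could be computed. The thread does not say how the 7.33% figure was actually calculated, so this assumes the usual edit-distance definition (substitutions, insertions and deletions divided by the number of reference syllables) over whitespace-separated Jyutping syllables. The names `edit_distance`, `syllable_error_rate`, `passing_cases` and `SAMPLE_PAIRS` are hypothetical, and the two sample pairs are toy data made up for illustration, not real test results.

```python
from typing import Iterable, List, Tuple

# Toy (reference, hypothesis) pairs, invented for illustration only.
SAMPLE_PAIRS: List[Tuple[str, str]] = [
    ("jam2 tong1", "jam2 soeng1"),  # one substituted syllable
    ("nei5 hou2", "nei5 hou2"),     # exact match
]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two syllable sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # syllable missing from hypothesis
                curr[j - 1] + 1,     # extra syllable in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def syllable_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total edit errors divided by total reference syllables."""
    errors = 0
    total = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        hyp = hypothesis.split()
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0


def passing_cases(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the pairs that already match; a candidate regression set."""
    return [(r, h) for r, h in pairs if r.split() == h.split()]


if __name__ == "__main__":
    print(syllable_error_rate(SAMPLE_PAIRS))
    print(len(passing_cases(SAMPLE_PAIRS)))
```

In a real run the hypothesis strings would come from the converter under test and the reference strings from the annotated sentences; the pairs returned by `passing_cases` are the ones a later PR should not break.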
@graphemecluster I added a draft PR #6 which addresses some of the most pressing issues. See the PR for a summary of the improvements.
>>> import ToJyutping
>>> ToJyutping.get_jyutping_text("湯")
'soeng1'
Upstream marks soeng1 as a deprecated reading of 「湯」, so why does ToJyutping output soeng1 by default?
https://github.com/CanCLID/rime-cantonese-upstream/blob/ba155365c8671ca51848224dec933d5b91091d05/char.csv#L17524C1-L17528C1