The standard used for the heteronym database is rather unreasonable #293

Open
FavorMylikes opened this issue Jan 18, 2023 · 1 comment

@FavorMylikes

>>> from pypinyin import pinyin
>>> pinyin("能", strict=False, heteronym=True)
[['néng', 'tái', 'nái', 'nài', 'xióng']]

In the code, pinyin_dict.py:

    0x80FD: 'néng,tái,nái,nài,xióng',
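To make the entry above easier to read, here is a minimal sketch of how such a record maps to a list of readings. It assumes only what the issue shows: the key is the Unicode code point of 能 and the value is a comma-separated string. The helper `readings_for` is a hypothetical name used for illustration, not a pypinyin function.

```python
# Entry copied from the issue; 0x80FD is the Unicode code point of 能.
entry = {0x80FD: 'néng,tái,nái,nài,xióng'}


def readings_for(char, table):
    """Return the readings stored for a single character (hypothetical helper)."""
    raw = table.get(ord(char), '')
    return raw.split(',') if raw else []


readings = readings_for('能', entry)
print(readings)
```

Every reading in the comma-separated string becomes a candidate, which is why all five show up when heteronym=True is passed.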

Xinhua Dictionary database

Contains only
'néng', 'nài'

Chinese character (Hanzi) database

néng(neng2) , tái(tai2) , nái(nai2) , nài(nai4) , xióng(xiong2)

Wiki

néng(neng2) , tái(tai2) , tài(tai4) , tāi(tai1), nái(nai2) , nài(nai4) , xióng(xiong2)

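  • The explanation given there for xiong2 is that 能 is an alternative form of 熊
  • This is most likely a loan character from Classical Chinese, or the two characters may not yet have diverged at the time; I did not look into it further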
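The differences between the sources can be made explicit with a small set comparison. This is only a sketch using the readings of 能 quoted in this issue; the source labels (`pypinyin`, `xinhua`, `hanzi_db`, `wiki`) are names chosen here for illustration.

```python
# Readings of 能 as quoted in this issue, one set per source.
sources = {
    'pypinyin': {'néng', 'tái', 'nái', 'nài', 'xióng'},
    'xinhua': {'néng', 'nài'},
    'hanzi_db': {'néng', 'tái', 'nái', 'nài', 'xióng'},
    'wiki': {'néng', 'tái', 'tài', 'tāi', 'nái', 'nài', 'xióng'},
}

# Treat the Xinhua Dictionary set as the baseline and list what each
# source adds on top of it.
baseline = sources['xinhua']
extra = {name: sorted(found - baseline) for name, found in sources.items()}

for name, readings in extra.items():
    print(name, readings)
```

For this character, pypinyin matches the Hanzi database exactly and carries three readings that the Xinhua Dictionary data does not list.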

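Suggestions

  • Use a pinyin database based on the current Putonghua standard
  • Or accept a parameter at call time to select Mandarin (Putonghua), Cantonese, or other dialects

@mozillazg
Owner

@FavorMylikes Thanks for the suggestions! For the reasoning, see the discussion in issue #263.

If you have the data from these databases in text form, you are welcome to share it.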
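As a rough illustration of the suggestion to limit results to current standard readings, the filter could look like the sketch below. Everything here is an assumption: `heteronyms`, `standard_only`, `ALL_READINGS`, and `STANDARD_READINGS` are hypothetical names and are not part of pypinyin's API, and the tables contain only the readings quoted in this issue.

```python
# Hypothetical tables; only the readings quoted in this issue are filled in.
ALL_READINGS = {'能': ['néng', 'tái', 'nái', 'nài', 'xióng']}
# Standard set taken from the Xinhua Dictionary data cited in the issue.
STANDARD_READINGS = {'能': {'néng', 'nài'}}


def heteronyms(char, standard_only=False):
    """Return readings for char, optionally limited to the standard set."""
    readings = ALL_READINGS.get(char, [])
    if not standard_only:
        return list(readings)
    allowed = STANDARD_READINGS.get(char)
    if allowed is None:
        # No standard data for this character: fall back to all readings.
        return list(readings)
    # Keep the original ordering while dropping non-standard readings.
    return [r for r in readings if r in allowed]


print(heteronyms('能'))
print(heteronyms('能', standard_only=True))
```

A per-dialect option would need additional tables keyed by variety; none are sketched here because the issue does not provide that data.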
