The Mandarin Polyphone Corpus Data is designed for polyphone disambiguation. It covers 603 common Mandarin character-pinyin pairs, and the number of corpus sentences per pair varies with the number of phrases in which the character appears.
For more details, please refer to the link: https://www.nexdata.ai/datasets/1036?source=Github
Corpus for polyphone disambiguation
603 Mandarin character-pinyin pairs and 319,977 sentences
News and colloquial sentences
Each sentence is annotated with the Mandarin pinyin pronunciation of the specific polyphone it contains
Chinese
Speech synthesis
Character accuracy rate of 99%
Commercial License
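
To make the annotation scheme described above more concrete (each sentence carries the pinyin of one specific polyphone), here is a minimal Python sketch of how such records might be loaded and tallied. The tab-separated layout (sentence, character index, pinyin with a trailing tone digit), the names `PolyphoneExample`, `parse_line`, and `reading_counts`, and the three sample sentences are all assumptions made for illustration. They are not taken from the corpus, whose actual file format is not described here.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class PolyphoneExample(NamedTuple):
    """One annotated sentence (hypothetical record layout)."""
    sentence: str
    index: int   # 0-based position of the target polyphone in the sentence
    pinyin: str  # annotated reading, tone written as a trailing digit


def parse_line(line: str) -> PolyphoneExample:
    # Assumed layout: sentence <TAB> character index <TAB> pinyin
    sentence, index, pinyin = line.rstrip("\n").split("\t")
    return PolyphoneExample(sentence, int(index), pinyin)


def reading_counts(examples):
    """Count how often each reading is annotated for each polyphonic character."""
    counts = defaultdict(Counter)
    for ex in examples:
        character = ex.sentence[ex.index]
        counts[character][ex.pinyin] += 1
    return counts


# Made-up sample lines, not drawn from the corpus.
SAMPLE_LINES = [
    "我在银行工作\t3\thang2",
    "他行走在路上\t1\txing2",
    "这家银行很大\t3\thang2",
]

examples = [parse_line(line) for line in SAMPLE_LINES]
counts = reading_counts(examples)
print(dict(counts["行"]))
```

A per-character tally like this is one way to inspect how unevenly the readings of a polyphone are represented before training a disambiguation model for speech synthesis.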