Nexdata-AI/319977-Sentences-Mandarin-Polyphone-Corpus-Data

319977-Sentences-Mandarin-Polyphone-Corpus-Data

Description

The Mandarin Polyphone Corpus Data is designed for polyphone disambiguation. It covers 603 common Mandarin character-pinyin pairs. The number of sentences differs from one polyphone to another, depending on how many phrases the character occurs in.

For more details, please refer to the link: https://www.nexdata.ai/datasets/1036?source=Github

Specifications

Data content

Corpus for polyphone disambiguation

Data size

603 Mandarin character-pinyin pairs and 319,977 sentences

Data source

News and colloquial sentences

Annotation

The Mandarin pinyin pronunciation is annotated for the specific polyphone contained in each sentence
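
To make the annotation scheme concrete, here is a minimal Python sketch of one way such records might be represented and tallied. The record layout (`sentence`, `char_index`, `pinyin`), the tone-number pinyin style, and the sample sentences are all assumptions made for illustration; the actual file format of the corpus is not described here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyphoneExample:
    """Hypothetical record: one sentence with one annotated polyphone."""
    sentence: str    # full sentence text
    char_index: int  # 0-based position of the annotated polyphone
    pinyin: str      # annotated reading, tone-number style (assumed)

    @property
    def char(self) -> str:
        return self.sentence[self.char_index]


# Illustrative sentences only; these are not taken from the corpus.
EXAMPLES = [
    PolyphoneExample("我去银行取钱", 3, "hang2"),
    PolyphoneExample("他在路上行走", 4, "xing2"),
    PolyphoneExample("这家银行很大", 3, "hang2"),
]


def pronunciation_counts(examples):
    """Count how often each reading is annotated for each polyphone."""
    counts = defaultdict(Counter)
    for ex in examples:
        counts[ex.char][ex.pinyin] += 1
    return counts


if __name__ == "__main__":
    for char, readings in pronunciation_counts(EXAMPLES).items():
        print(char, dict(readings))
```

A tally like this shows how unevenly sentences can be distributed across the readings of a single character, which matters when balancing training data for a disambiguation model.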

Language

Chinese

Application scenarios

speech synthesis

Accuracy

Character accuracy rate of 99%
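
How the accuracy figure is measured is not specified. As a sketch under the assumption that it is the fraction of annotated polyphones whose pinyin label matches a trusted reference label, it could be computed as below; the function name `label_accuracy` and the sample labels are hypothetical.

```python
def label_accuracy(reference: list[str], annotated: list[str]) -> float:
    """Fraction of positions where the annotated pinyin equals the reference.

    Assumes both lists are aligned one-to-one (same polyphone occurrences,
    same order).
    """
    if len(reference) != len(annotated):
        raise ValueError("reference and annotated must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty sample")
    matches = sum(r == a for r, a in zip(reference, annotated))
    return matches / len(reference)


if __name__ == "__main__":
    # Illustrative labels only; not drawn from the corpus.
    ref = ["hang2", "xing2", "hang2", "xing2"]
    ann = ["hang2", "xing2", "hang2", "hang2"]
    print(f"{label_accuracy(ref, ann):.2%}")
```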

Licensing Information

Commercial License