319977-Sentences-Mandarin-Polyphone-Corpus-Data

Description

The Mandarin Polyphone Corpus Data is designed for polyphone disambiguation. It covers 603 common Mandarin character-pinyin pairs. The number of sentences collected for each pronunciation varies with the number of phrases in which the character occurs.

For more details, please refer to the link: https://www.nexdata.ai/datasets/1036?source=Github

Specifications

Data content

Corpus for polyphone disambiguation

Data size

603 Mandarin character-pinyin pairs and 319,977 sentences

Data source

News text and colloquial sentences

Annotation

Each sentence is annotated with the pinyin pronunciation of the specific polyphonic character it contains
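
To make the annotation scheme concrete, here is a minimal sketch of how one annotated sentence could be represented in Python. The actual file format of the corpus is not described here, so the `PolyphoneSample` class, its field names, the tone-numbered pinyin strings, and the two example sentences are all hypothetical illustrations rather than the dataset's real schema or contents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyphoneSample:
    """Hypothetical record: one sentence plus the label for one polyphonic character."""
    sentence: str    # the full sentence
    char_index: int  # position of the annotated polyphonic character
    pinyin: str      # annotated reading, written here with a tone number (assumed notation)

    @property
    def target(self) -> str:
        # The polyphonic character that the label refers to.
        return self.sentence[self.char_index]


def readings_by_char(samples):
    """Collect the set of annotated readings observed for each target character."""
    readings = defaultdict(set)
    for sample in samples:
        readings[sample.target].add(sample.pinyin)
    return dict(readings)


# Illustrative sentences (not taken from the corpus): the character 行
# is read "hang2" in 银行 (bank) and "xing2" in 行走 (to walk).
samples = [
    PolyphoneSample("我在银行工作", 3, "hang2"),
    PolyphoneSample("他行走得很快", 1, "xing2"),
]

print(readings_by_char(samples))
```

Storing the character position alongside the label keeps the record unambiguous when the same character appears more than once in a sentence.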

Language

Chinese

Application scenarios

speech synthesis

Accuracy

Character accuracy rate of 99%
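
How the accuracy figure was measured is not specified here. Under the common assumption that character accuracy is the share of annotated characters whose label matches a trusted reference, it could be computed as in the sketch below. The function name `character_accuracy` and the sample labels are illustrative and not part of the dataset.

```python
def character_accuracy(reference, annotated):
    """Fraction of positions where the annotated label equals the reference label.

    Assumes both sequences are aligned, one pinyin label per annotated character.
    """
    if len(reference) != len(annotated):
        raise ValueError("reference and annotated must have the same length")
    if not reference:
        raise ValueError("at least one label is required")
    correct = sum(ref == ann for ref, ann in zip(reference, annotated))
    return correct / len(reference)


# Illustrative labels only: three of the four annotations match the reference.
reference = ["hang2", "xing2", "chang2", "zhang3"]
annotated = ["hang2", "xing2", "chang2", "chang2"]
print(character_accuracy(reference, annotated))
```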

Licensing Information

Commercial License