319977-Sentences-Mandarin-Polyphone-Corpus-Data

Description

The Mandarin Polyphone Corpus Data is designed for polyphone disambiguation. It covers 603 common Mandarin character-pinyin pronunciations. The number of sentences per pronunciation varies with how many words and phrases use that reading of the character.

For more details, please refer to the link: https://www.nexdata.ai/datasets/1036?source=Github

Specifications

Data content

Corpus for polyphone disambiguation

Data size

603 Mandarin character-pinyin pairs and 319,977 sentences

Data source

News and colloquial sentences

Annotation

Each sentence is annotated with the Mandarin pinyin pronunciation of the specific polyphone it contains
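
To illustrate the annotation scheme, here is a minimal Python sketch of one way an annotated sentence could be represented, and how the per-pronunciation sentence counts mentioned in the description could be tallied. The record layout (`PolyphoneSample` with `sentence`, `index`, and `pinyin` fields), the 0-based index, and the tone-number pinyin style are all hypothetical; the README does not document the delivered file format.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout: the README does not specify the file format.
@dataclass(frozen=True)
class PolyphoneSample:
    sentence: str  # full sentence text
    index: int     # 0-based position of the target polyphone (assumed)
    pinyin: str    # annotated reading, tone-number style assumed (e.g. "hang2")

    @property
    def char(self) -> str:
        """The polyphonic character being annotated."""
        return self.sentence[self.index]


def readings_per_char(samples):
    """Count annotated sentences per (character, pinyin) pair."""
    return Counter((s.char, s.pinyin) for s in samples)


# Illustrative samples for the polyphone 行 (not taken from the corpus).
samples = [
    PolyphoneSample("他在银行工作", 3, "hang2"),
    PolyphoneSample("我们一起行走", 4, "xing2"),
    PolyphoneSample("这家银行很大", 3, "hang2"),
]

print(readings_per_char(samples))
```

Grouping by (character, pinyin) rather than by character alone keeps each reading's sentence count separate, which is how the corpus sizes vary per pronunciation.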

Language

Chinese

Application scenarios

Speech synthesis

Accuracy

Character Accuracy Rate of 99%
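
The README does not define how the Character Accuracy Rate is measured. Assuming it is the share of annotated polyphones whose pinyin label matches a reference label (for example, from a quality-check pass), it could be computed as in this sketch; the function name `character_accuracy_rate` and the reference-comparison setup are hypothetical.

```python
def character_accuracy_rate(labels, reference):
    """Fraction of pinyin labels that match the reference labels.

    Assumed definition: one label per annotated polyphone, compared
    position by position against a reference annotation.
    """
    if len(labels) != len(reference):
        raise ValueError("labels and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(1 for got, want in zip(labels, reference) if got == want)
    return correct / len(reference)


print(character_accuracy_rate(["hang2", "xing2"], ["hang2", "hang2"]))
```

Under this assumed definition, 99 matching labels out of every 100 checked would give a rate of 0.99.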

Licensing Information

Commercial License
