319977-Sentences-Mandarin-Polyphone-Corpus-Data

Description

The Mandarin Polyphone Corpus Data is designed for polyphone disambiguation. It covers 603 common Mandarin character-pinyin pairs. The number of sentences collected for each pronunciation varies with the number of phrases in which the character appears.

For more details, please refer to the link: https://www.nexdata.ai/datasets/1036?source=Github

Specifications

Data content

Corpus for polyphone disambiguation

Data size

603 Mandarin character-pinyin pairs and 319,977 sentences

Data source

News and colloquial sentences

Annotation

The Mandarin pinyin pronunciation of the specific polyphone contained in each sentence is annotated
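
As a rough illustration of how such annotations might be consumed, the sketch below models each sample as a sentence, the index of the target polyphone, and its annotated pinyin. The record layout, the field names, and the example sentences are assumptions made for illustration; the actual file format of the corpus is not described here. It builds a most-frequent-pronunciation baseline, a common starting point for polyphone disambiguation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyphoneSample:
    """Hypothetical record layout; the real corpus format may differ."""
    sentence: str   # full sentence containing the polyphone
    position: int   # index of the target polyphone in the sentence
    pinyin: str     # annotated pronunciation, tone written as a digit

    @property
    def character(self) -> str:
        # The polyphonic character that the annotation refers to.
        return self.sentence[self.position]


def train_majority_baseline(samples):
    """Map each polyphonic character to its most frequent annotated pinyin."""
    counts = defaultdict(Counter)
    for s in samples:
        counts[s.character][s.pinyin] += 1
    return {ch: c.most_common(1)[0][0] for ch, c in counts.items()}


# Made-up samples for illustration only, not taken from the corpus.
samples = [
    PolyphoneSample("我在银行工作", 3, "hang2"),
    PolyphoneSample("他去银行取钱", 3, "hang2"),
    PolyphoneSample("这样做不行", 4, "xing2"),
]
baseline = train_majority_baseline(samples)
```

A baseline like this ignores context entirely, so it mainly serves as a floor against which a context-aware disambiguation model can be compared.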

Language

Chinese

Application scenarios

Speech synthesis

Accuracy

Character Accuracy Rate of 99%
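
Character Accuracy Rate is commonly computed as the share of annotated characters whose label matches a verified reference. The sketch below works under that assumption; the function name and the sample labels are illustrative, and the exact procedure used to audit this corpus is not stated.

```python
def character_accuracy_rate(annotated, reference):
    """Fraction of positions where the annotated pinyin equals the reference."""
    if len(annotated) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(a == r for a, r in zip(annotated, reference))
    return correct / len(reference)


# Made-up labels for illustration only, not taken from the corpus.
annotated = ["hang2", "xing2", "hang2", "xing2"]
reference = ["hang2", "xing2", "hang2", "hang2"]
rate = character_accuracy_rate(annotated, reference)
```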

Licensing Information

Commercial License
