Nexdata-AI/1044-Hours-Minnan-Dialect-Speech-Data-by-Mobile-Phone

1044-Hours-Minnan-Dialect-Speech-Data-by-Mobile-Phone

Description

A Hokkien (China) dialect scripted-monologue speech dataset recorded on smartphones. Speakers read monologues based on given prompts, covering short messages and 30+ customer consultation domains. Recordings are transcribed and annotated with text content, gender, age, accent and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (2,496 people from Quanzhou, Zhangzhou, Taiwan, Xiamen and other southern China districts), which is intended to improve model performance on real and complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL. For more details, please refer to the link: https://www.nexdata.ai/datasets/50?source=Github

Specifications

Format

16 kHz, 16-bit, WAV, mono channel

Content category

Customer consultation (covering 30+ domains); short message

Recording condition

Low background noise (indoor)

Recording device

Smartphone; Android:iOS = 3:1

Country

China (CHN)

Language

Hokkien

Speaker

2,496 people; 55% female; 1,049 speakers are aged 21-25; speakers are from Quanzhou, Zhangzhou, Taiwan, Xiamen and other southern China districts

Features of annotation

Transcription text, gender, age, accent, noise
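
The format stated under Specifications (16 kHz, 16-bit, mono WAV) can be checked on a file before it enters a training pipeline. The following is a minimal sketch using only Python's standard `wave` module. The function names `check_wav_format` and `make_silent_wav` are hypothetical helpers written for this illustration and are not part of the dataset or of any Nexdata tooling. It assumes the files are plain PCM WAV, since `wave` raises `wave.Error` on other encodings.

```python
import io
import wave

# Values taken from the "Format" entry: 16 kHz, 16-bit, mono WAV.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample
EXPECTED_CHANNELS = 1


def check_wav_format(source):
    """Return a list of mismatches against the stated format (empty if none).

    `source` is a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def make_silent_wav(rate, width, channels, n_frames=160):
    """Build a tiny in-memory WAV so the check can be tried without the dataset."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (n_frames * width * channels))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # A synthetic file that matches the stated format.
    print(check_wav_format(make_silent_wav(16000, 2, 1)))
    # A synthetic file with the wrong sample rate and channel count.
    print(check_wav_format(make_silent_wav(8000, 2, 2)))
```

On real data, `check_wav_format` would be called with the path of each downloaded recording; the directory layout of the delivered dataset is not described here, so no paths are assumed.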

Licensing Information

Commercial License