1044-Hours-Minnan-Dialect-Speech-Data-by-Mobile-Phone

Description

Hokkien (China) dialect scripted monologue smartphone speech dataset, recorded as monologues based on given prompts and covering short messages and customer consultation across 30+ domains. Recordings are transcribed and annotated with text content, gender, age, accent and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (2,496 people from Quanzhou, Zhangzhou, Taiwan, Xiamen and other southern China districts), which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL. For more details, please refer to the link: https://www.nexdata.ai/datasets/50?source=Github

Specifications

Format

16kHz, 16bit, wav, mono channel
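
A minimal sketch for checking that a downloaded sample matches the format stated above, using only Python's standard `wave` module. It assumes the files are uncompressed PCM WAV, which the description does not state explicitly; the function names `matches_spec` and `duration_seconds` are illustrative and not part of the dataset.

```python
import wave

# Expected values taken from the Format line: 16 kHz, 16 bit, mono.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit = 2 bytes per sample
EXPECTED_CHANNELS = 1            # mono


def matches_spec(path):
    """Return True if the WAV header at `path` matches the stated format."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def duration_seconds(path):
    """Return the clip length in seconds, computed from the WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling `matches_spec` on each `.wav` file before training is a cheap way to catch resampled or stereo files that slipped into a local copy.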

Content category

Customer consultation (covering 30+ domains); short message

Recording condition

Low background noise (indoor)

Recording device

Smartphone; Android:iOS = 3:1

Country

China (CHN)

Language

Hokkien

Speaker

2,496 people; 55% female; 1,049 speakers are aged 21-25; speakers are from Quanzhou, Zhangzhou, Taiwan, Xiamen and other southern China districts

Features of annotation

Transcription text, gender, age, accent, noise

Licensing Information

Commercial License

