Nexdata-AI/189-Hours-Latin-American-Spanish-Children-Spontaneous-Speech-Data

189-Hours-Latin-American-Spanish-Children-Spontaneous-Speech-Data

Description

The 189 Hours - Latin American Spanish Children's Spontaneous Speech Data is a collection of speech clips covering multiple topics. All audio was manually transcribed into text, and speaker identity, gender, and other attributes are also annotated. This dataset can be used for voiceprint recognition model training, corpus construction for machine translation, and algorithm research.

For more details, please refer to the link: https://www.nexdata.ai/datasets/1250?source=Github

Specifications

Format

16 kHz, 16-bit, mono channel

Age

Children aged 12 and under

Content category

Interviews, self-media, variety shows, etc.

Language

Latin American Spanish

Annotation

Transcription text, speaker identification, and gender

Application scenarios

Speech recognition, video caption generation, and video content review

Accuracy

Word Accuracy Rate (SAR) of no less than 98%
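
The format listed above (16 kHz, 16-bit, mono) can be checked on a delivered clip with a few lines of Python. This is a minimal sketch that assumes the clips are uncompressed PCM WAV files, which this README does not state; the function names `read_format` and `matches_spec` are illustrative and not part of any Nexdata tooling.

```python
import wave

# Expected audio format, taken from the Specifications above.
# 16-bit samples correspond to a sample width of 2 bytes.
EXPECTED = {"sample_rate": 16000, "sample_width_bytes": 2, "channels": 1}


def read_format(path):
    """Read sample rate, sample width, and channel count from a WAV header.

    Assumption: the file is PCM WAV. Other containers would need a
    different reader.
    """
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),
        }


def matches_spec(path):
    """Return True if the clip is 16 kHz, 16-bit, mono."""
    return read_format(path) == EXPECTED
```

Calling `matches_spec` on each file before training makes it easy to catch clips that were resampled or converted to stereo somewhere along the way.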

Licensing Information

Commercial License