dataloaders

PyTorch and TFRecords data loaders for several audio datasets

Datasets

  1. ESC - dataset of environmental sounds
  2. LibriSpeech - corpus of read English speech
  3. NSynth - dataset of annotated musical notes
  4. VoxCeleb2 - human speech, extracted from YouTube interview videos
     • PyTorch loader
     • TFRecords loader
  5. GTZAN - audio tracks from a variety of sources annotated with genre class
  6. CallCenter - audio tracks with human and non-human speech

For validation we frequently use the following scheme:

  1. Read 10 random crops from a file;
  2. Predict a class for each crop;
  3. Average the results.
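
The three steps above can be sketched in a framework-agnostic way. This is only an illustration, not the code from this repository: `random_crops`, `predict_file`, `crop_len`, and `predict_fn` are hypothetical names, the waveform is assumed to be an already-decoded sequence of samples, and `predict_fn` is assumed to return one score per class for a single crop. In practice the waveform would likely be a tensor and `predict_fn` a wrapper around a PyTorch model.

```python
import random


def random_crops(waveform, crop_len, n_crops=10, rng=None):
    """Step 1: return n_crops random fixed-length slices of one file."""
    rng = rng or random.Random()
    if len(waveform) < crop_len:
        raise ValueError("waveform is shorter than crop_len")
    max_start = len(waveform) - crop_len
    starts = [rng.randint(0, max_start) for _ in range(n_crops)]
    return [waveform[s:s + crop_len] for s in starts]


def predict_file(waveform, predict_fn, crop_len, n_crops=10, rng=None):
    """Steps 2 and 3: score every crop, then average the scores per class."""
    crops = random_crops(waveform, crop_len, n_crops, rng)
    # predict_fn is a hypothetical callable: crop -> list of class scores.
    scores = [predict_fn(crop) for crop in crops]
    n_classes = len(scores[0])
    mean_scores = [
        sum(s[k] for s in scores) / len(scores) for k in range(n_classes)
    ]
    # The predicted class is the one with the highest averaged score.
    label = max(range(n_classes), key=mean_scores.__getitem__)
    return label, mean_scores
```

Averaging scores rather than taking a majority vote over per-crop labels is one possible reading of step 3; the text does not say which variant is used.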

For this scheme we provide additional DataLoaders for PyTorch: