DagsHub/audio-datasets

Open-source Audio Datasets

What is DagsHub?

DagsHub is a centralized platform to host and manage machine learning projects, including code, data, models, experiments, annotations, a model registry, and more! DagsHub does the MLOps heavy lifting for its users. Every repository comes with configured S3 storage, an experiment tracking server, and an annotation workspace, all built on popular open-source tools like MLflow, DVC, Git, and Label Studio.

What is Hacktoberfest?

Hacktoberfest is a month-long virtual festival of open source! Participants give back to the community by completing pull requests, participating in events, and donating to open-source projects. This project is part of Hacktoberfest 2022, where participants enrich the open-source audio datasets hosted on DagsHub.

Quick Start to Contribution

What does the DagsHub community contribute?

This year we'd like to focus our contributions on the audio domain. For that, we added audio data catalog capabilities to DagsHub! You can now upload audio files to DagsHub, see their spectrogram and waveform, and even listen to them! You can see a vivid example of this (extremely cool) feature in our Librispeech-ASR-corpus project.
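For a rough idea of what a spectrogram view displays, the sketch below computes a magnitude spectrogram with a short-time Fourier transform in NumPy. It is only an illustration under our own assumptions (a mono signal, a 512-sample Hann window, a hop of 128 samples, a 16 kHz sample rate) and is not how DagsHub renders its catalog.

```python
import numpy as np


def spectrogram(signal: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram of a 1-D mono signal, shape (n_fft // 2 + 1, n_frames).

    Assumes len(signal) >= n_fft; trailing samples that don't fill a frame are dropped.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Cut the signal into overlapping frames and taper each one with the window.
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # rfft keeps the non-negative frequencies only: n_fft // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T


if __name__ == "__main__":
    sr = 16_000  # sample rate in Hz (assumed)
    t = np.arange(sr) / sr  # one second of samples
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone
    spec = spectrogram(tone)
    peak_bin = int(spec.mean(axis=1).argmax())
    # Each bin spans sr / n_fft Hz, so the peak should sit near 1000 Hz.
    print(spec.shape, peak_bin * sr / 512)
```

With real recordings you would load the samples from an audio file first (for example with the standard-library `wave` module) and usually plot the result on a decibel scale.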

To help audio practitioners leverage this new feature, we want to enrich open-source audio datasets on DagsHub. This is where you can contribute to the data science community!

How to contribute?

  • Claim the dataset you wish to contribute from the list (KUDOS to jim-schwoebel) by opening a new issue on the GitHub repository, named after the dataset. Please make sure the dataset hasn't already been claimed.
  • Open a new DagsHub repository and upload the data to its DVC storage (e.g., dataset repository).
  • Write information about the dataset in the README file (e.g., Librispeech ASR corpus README).
  • Add relevant tags to the repository and files.
  • Add the following labels to the repository:
    • dataset
    • audio
    • hacktoberfest
  • In the GitHub audio-datasets project:
    • Open a new branch named after the dataset.
    • Add a directory named after the dataset with the README file.
    • Commit and push the changes to GitHub.
    • Create a pull request on GitHub.
  • Optional: Share the project on the DagsHub Hacktoberfest 2022 Discord channel.
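The command-line parts of the steps above (uploading the data to a DagsHub repository's DVC storage, then opening the pull request on GitHub) can be sketched in Python as follows. This is a minimal, unofficial sketch: it assumes `git`, `dvc`, and the GitHub CLI (`gh`) are installed and authenticated, that DagsHub's DVC remote follows the `https://dagshub.com/<owner>/<repo>.dvc` pattern, that `origin` is a copy of the repository you can push to, and that the `slugify` helper and the names `your-user`, `your-dataset-repo`, and `My Audio Dataset` are hypothetical placeholders. By default it only prints the commands.

```python
import re
import shlex
import subprocess
from pathlib import Path


def slugify(name: str) -> str:
    """Hypothetical helper: lowercase the dataset name and join words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def dagshub_upload_commands(owner: str, repo: str, data_dir: str = "data") -> list[list[str]]:
    """DVC steps that push data_dir to a DagsHub repository's DVC storage.

    Assumes a fresh Git clone of the DagsHub repository, data_dir at its root, and
    credentials already set via `dvc remote modify origin --local ...`.
    """
    remote = f"https://dagshub.com/{owner}/{repo}.dvc"
    return [
        ["dvc", "init"],
        ["dvc", "remote", "add", "-d", "origin", remote],  # default DVC remote
        ["dvc", "add", data_dir],  # writes <data_dir>.dvc and git-ignores data_dir
        ["git", "add", ".dvc", f"{data_dir}.dvc", ".gitignore"],
        ["git", "commit", "-m", f"Track {data_dir} with DVC"],
        ["dvc", "push"],  # uploads the audio files to DagsHub storage
        ["git", "push"],  # uploads the small .dvc pointer files
    ]


def pr_commands(dataset: str) -> list[list[str]]:
    """Git and GitHub CLI steps that publish <slug>/README.md as a pull request."""
    slug = slugify(dataset)
    return [
        ["git", "add", slug],
        ["git", "commit", "-m", f"Add {dataset} README"],
        ["git", "push", "-u", "origin", slug],
        ["gh", "pr", "create", "--title", dataset, "--body", f"Adds the {dataset} dataset."],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


def open_dataset_pr(dataset: str, readme_text: str, dry_run: bool = True) -> None:
    """Run inside a clone of the GitHub audio-datasets repository."""
    slug = slugify(dataset)
    run([["git", "checkout", "-b", slug]], dry_run)  # branch named after the dataset
    readme = Path(slug) / "README.md"  # directory named after the dataset
    print(f"# write {readme}")
    if not dry_run:
        readme.parent.mkdir(parents=True, exist_ok=True)
        readme.write_text(readme_text, encoding="utf-8")
    run(pr_commands(dataset), dry_run)


if __name__ == "__main__":
    # Placeholder names; with the default dry_run=True nothing is executed.
    run(dagshub_upload_commands("your-user", "your-dataset-repo"))
    open_dataset_pr("My Audio Dataset", "# My Audio Dataset\n")
```

Printing first and executing only with `dry_run=False` keeps the sketch safe to try, and `check=True` stops the run at the first failing command instead of pushing a half-finished contribution.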