Getting started with a custom dataset #34

Open
OhadCohen97 opened this issue Mar 21, 2023 · 4 comments

Comments

@OhadCohen97

OhadCohen97 commented Mar 21, 2023

Hi,

Thank you for your excellent work!

I want to use HTS-Audio-Transformer on my custom dataset, for a different classification task.

Are there any instructions on how to run the model for a different dataset? From which file should I start?

Thanks

@RetroCirce
Owner

Hi, sorry for the late reply.
To use the model on a different dataset, you need to construct a new dataset loader and dataset class. You can use SEDDataset as a reference.
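A minimal sketch of such a dataset class, assuming mono 16-bit PCM WAV files that are already at the model's sample rate (32 kHz is used here only as an assumed default; match the rate set in config.py). `CustomAudioDataset` and its `items` argument are hypothetical, not part of the repository. The per-item dictionary keys (`audio_name`, `waveform`, `target`) are an assumption modeled loosely on the repository's existing dataset classes, so check which keys its training loop actually reads before relying on them.

```python
import array
import sys
import wave


class CustomAudioDataset:
    """Hypothetical map-style dataset: one (wav_path, class_index) pair per clip."""

    def __init__(self, items, classes_num, sample_rate=32000, clip_seconds=10):
        self.items = items                      # list of (wav_path, class_index)
        self.classes_num = classes_num
        self.sample_rate = sample_rate          # assumed; match your config.py
        self.clip_samples = sample_rate * clip_seconds

    def __len__(self):
        return len(self.items)

    def _load_wav(self, path):
        # Read mono 16-bit PCM and scale samples to [-1.0, 1.0).
        with wave.open(path, "rb") as f:
            if f.getnchannels() != 1 or f.getsampwidth() != 2:
                raise ValueError(f"{path}: expected mono 16-bit PCM")
            if f.getframerate() != self.sample_rate:
                raise ValueError(f"{path}: expected {self.sample_rate} Hz, resample first")
            raw = f.readframes(f.getnframes())
        samples = array.array("h", raw)
        if sys.byteorder == "big":  # WAV sample data is little-endian
            samples.byteswap()
        return [s / 32768.0 for s in samples]

    def __getitem__(self, index):
        path, label = self.items[index]
        waveform = self._load_wav(path)[: self.clip_samples]     # crop long clips
        waveform += [0.0] * (self.clip_samples - len(waveform))  # zero-pad short clips
        target = [0.0] * self.classes_num                         # one-hot label
        target[label] = 1.0
        return {"audio_name": path, "waveform": waveform, "target": target}
```

This uses only the standard library so it stays self-contained. For actual training, convert `waveform` and `target` to NumPy arrays or tensors before returning them, so PyTorch's default collate function stacks them into batches. If your loss expects class indices rather than one-hot vectors, return `label` as the target instead.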

@OhadCohen97
Author

OhadCohen97 commented Apr 3, 2023

Hi, thank you for your response.

Is it OK to use SCV2_Dataset or DESED_Dataset? They look like regular dataset classes, which seem better suited to loading my standard WAV files. How do they differ from SEDDataset?

Can HTS-AT support multi-channel WAV audio?

Thank you.

@RetroCirce
Owner

Hi,

SCV2 is for the Speech Commands V2 dataset, DESED is for the sound event detection dataset, and ESC is for the ESC-50 dataset. I think the dataset class for SCV2 might be the best starting point to adapt to your own dataset.

Yes, it is possible to support multi-channel audio, but you would first need to change the first layer to map more than one channel to the deep features, which means the pretrained model can no longer be used as is. Alternatively, you can merge the multi-channel audio into a single channel, or perform the classification on each channel and average the results.
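The two alternatives that keep the pretrained single-channel model can be sketched as follows, assuming each clip is given as a list of equal-length channels. `classify` is a hypothetical stand-in for the pretrained model's forward pass followed by a sigmoid or softmax; averaging probabilities rather than raw logits is a design choice here, not something the repository prescribes.

```python
def downmix(channels):
    """Average C equal-length channels sample by sample into one mono signal."""
    num_channels = len(channels)
    return [sum(frame) / num_channels for frame in zip(*channels)]


def classify_per_channel(classify, channels):
    """Run a single-channel classifier on every channel and average the class scores.

    `classify` is a hypothetical callable: mono waveform -> list of class scores.
    """
    scores = [classify(ch) for ch in channels]
    return [sum(col) / len(scores) for col in zip(*scores)]


if __name__ == "__main__":
    stereo = [[1.0, 0.5, -1.0], [0.0, 0.5, 1.0]]
    print(downmix(stereo))  # [0.5, 0.5, 0.0]
```

Downmixing costs one forward pass per clip, while per-channel averaging costs one pass per channel but lets each channel contribute its own prediction.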

@OhadCohen97
Author

OhadCohen97 commented May 19, 2023

Hi,
Thank you for getting back to me!

  1. Regarding the multi-channel audio case, I have considered using Patch Embed to process each channel and then summing the results so that they fit within the "forward_features" function. In your experience, is there another approach that can establish a connection between the channels for better classification?

  2. Which hyperparameters in the 'config.py' file do I need to consider in order to properly fine-tune on my dataset (a different classification task)?
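One observation relevant to question 1, assuming the patch embedding is purely linear up to the point where outputs are summed (a strided convolution with no normalization or nonlinearity in between): applying one shared embedding to every channel and summing the outputs gives the same result as embedding the sum of the channels, apart from a repeated bias term. Under that assumption it behaves like a scaled downmix and cannot model differences between channels. Giving each channel its own weights and summing is, by the same argument, equivalent to a single convolution with one input channel per audio channel. The toy 1-D projection below (hypothetical helpers, not the repository's Patch Embed) checks the first claim numerically.

```python
def patch_embed(signal, weights, bias, patch):
    """Toy linear patch embedding: non-overlapping patches, one output feature each.

    Stands in for a strided convolution with kernel size == stride.
    """
    out = []
    for start in range(0, len(signal) - patch + 1, patch):
        window = signal[start:start + patch]
        out.append(sum(w * x for w, x in zip(weights, window)) + bias)
    return out


def shared_embed_then_sum(channels, weights, bias, patch):
    # Same embedding applied to each channel, outputs summed per patch.
    per_channel = [patch_embed(ch, weights, bias, patch) for ch in channels]
    return [sum(vals) for vals in zip(*per_channel)]


def embed_of_channel_sum(channels, weights, bias, patch):
    # Equivalent by linearity: embed the summed signal, then add the
    # bias for the remaining C - 1 channels.
    summed = [sum(frame) for frame in zip(*channels)]
    extra = (len(channels) - 1) * bias
    return [v + extra for v in patch_embed(summed, weights, bias, patch)]


if __name__ == "__main__":
    chans = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
    print(shared_embed_then_sum(chans, [1.0, 2.0], 0.5, 2))  # [8.0, 14.0]
    print(embed_of_channel_sum(chans, [1.0, 2.0], 0.5, 2))   # [8.0, 14.0]
```

If a normalization layer follows the projection before the summation, the equivalence no longer holds exactly.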

Thank you.
