
How was the ground truth in the article set? How to get it? #16

Open

Ko-vey opened this issue May 17, 2022 · 4 comments

Comments

Ko-vey commented May 17, 2022:

How was the ground truth in the article set? How can we get it?

RicherMans (Owner) commented:

Sorry, can you explain exactly what you refer to as the ground truth?

RicherMans (Owner) commented:

It's from the DCASE 2018 and 2019 datasets; they strongly labeled their evaluation sets.

Ko-vey (Author) commented May 17, 2022:

> Sorry, can you explain exactly what you refer to as the ground truth?

[Two screenshots (2022-05-17 205050 and 2022-05-17 205126) showing plots of speech activation with ground-truth labels.]

Thanks for your brilliant work and patience! There are a few questions I would like to ask:

  1. As in the pictures shown above, how was the ground-truth label for the speech activation period set for evaluation and comparison?
  2. How do we know the performance of a student model trained with the help of a teacher model (t1 or t2) on a new dataset without exact frame-level labels?
  3. I am currently working on a birdcall activation detection task based on your model, but after replacing the speech label with a birdcall label in the teacher-student approach on the Audioset balanced subset, the new student model seems to learn nothing from t1 and performs poorly on bird audio files. Could you give some advice on how to train a proper model?

RicherMans (Owner) commented:

Oh hey, yeah no problem with these questions:

> As in the pictures shown above, how was the ground-truth label for the speech activation period set for evaluation and comparison?

It's manually labeled by the DCASE authors, nothing special here; it's not predicted by any of my models. All these datasets are publicly available here.

> How do we know the performance of a student model trained with the help of a teacher model (t1 or t2) on a new dataset without exact frame-level labels?

I mean, you can use some external dataset for cross-validation during training. I forgot what I used for validation, or whether I did any at all. Usually this approach should work.
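For a concrete picture, here is a minimal sketch of such a validation step against an external strongly labeled set (e.g. the DCASE evaluation data). `student_model` and the `(features, frame_labels)` pairs are hypothetical placeholders, not code from this repository:

```python
# Frame-level F1 of a student against external strong labels.
# `student_model` is a hypothetical model exposing a .predict() that
# returns per-frame speech probabilities; adapt to your own interface.
import numpy as np
from sklearn.metrics import f1_score

def validate_student(student_model, eval_items, threshold=0.5):
    """eval_items yields (features, frame_labels): (T, D) features, (T,) 0/1 labels."""
    preds, targets = [], []
    for features, frame_labels in eval_items:
        probs = student_model.predict(features)   # assumed: (T,) speech probabilities
        preds.append(probs >= threshold)
        targets.append(frame_labels.astype(bool))
    return f1_score(np.concatenate(targets), np.concatenate(preds))
```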

> I am currently working on a birdcall activation detection task based on your model, but after replacing the speech label with a birdcall label in the teacher-student approach on the Audioset balanced subset, the new student model seems to learn nothing from t1 and performs poorly on bird audio files. Could you give some advice on how to train a proper model?

Oh yeah, that's an interesting task! So my current teacher model is pretty bad in comparison to some other models on Audioset.
But you need to recall that roughly 40% of all labels in Audioset are speech, and that the labeling procedure for this "Speech" label is rather precise (because, I mean, it's speech, nothing complicated).
Just recall that my model has seen at least ~2 million samples containing speech.
On the other hand, your task using birds is much more complicated. Further, the labels in Audioset might not be "optimal", to say the least, since many different labels describe birds, and "bird"-related classes are pretty rare compared to "Speech". Even though a model might achieve a high mAP on the dataset, that does not mean it can effectively predict these classes.

I recommend you first fine-tune your teacher model on a bird-specific dataset, then re-estimate frame labels on the balanced dataset, and then train a student. It might be worth a try!
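For illustration, a rough sketch of that three-step recipe in PyTorch. `teacher`, `student`, and both loaders are hypothetical stand-ins, and the assumed `(clip_probs, frame_probs)` output signature mirrors typical SED teacher models rather than this repository's exact API:

```python
import torch

def finetune_teacher(teacher, bird_loader, epochs=1, lr=1e-4):
    """Step 1: adapt the Audioset teacher to clip-level bird labels."""
    opt = torch.optim.Adam(teacher.parameters(), lr=lr)
    bce = torch.nn.BCELoss()
    teacher.train()
    for _ in range(epochs):
        for feats, clip_labels in bird_loader:
            clip_probs, _ = teacher(feats)       # assumed (clip, frame) outputs
            loss = bce(clip_probs, clip_labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return teacher

def estimate_frame_labels(teacher, audioset_loader, threshold=0.5):
    """Step 2: re-estimate frame-level pseudo labels on the balanced subset."""
    teacher.eval()
    pseudo = []
    with torch.no_grad():
        for feats, _ in audioset_loader:
            _, frame_probs = teacher(feats)      # (B, T) birdcall probabilities
            pseudo.append((feats, (frame_probs > threshold).float()))
    return pseudo

def train_student(student, pseudo, epochs=1, lr=1e-4):
    """Step 3: train the student directly on the pseudo frame labels."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    bce = torch.nn.BCELoss()
    student.train()
    for _ in range(epochs):
        for feats, frame_targets in pseudo:
            loss = bce(student(feats), frame_targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```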
