Model fails to converge on transfer to audio backtesting problem #19

yangma12 · 2023-05-26T13:11:04Z

Dear Yuan and authors,
First of all, thank you for your paper. Recently, I transferred your pretrained model to a regression task in personality computing. After appending several fully connected layers to your original model, the predicted values stay within a very small, low range during training and do not change meaningfully. Have you run any regression experiments? What could cause this problem?
Sorry to bother you, and thank you very much for reading my question.

yang
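One way to confirm the symptom described above (predictions stuck in a narrow, low range) is to compare the spread of the predictions with that of the targets and with a constant mean predictor. The sketch below is a generic diagnostic, not something from the SSAST repository. `collapse_report` is a hypothetical helper, and it assumes predictions and labels have already been collected as `(num_clips, num_traits)` tensors.

```python
import torch


def collapse_report(preds: torch.Tensor, targets: torch.Tensor) -> dict:
    """Compare model predictions with a constant mean predictor.

    preds, targets: float tensors of shape (num_clips, num_traits).
    """
    # Per-trait spread: a collapsed model has pred_std far below target_std.
    pred_std = preds.std(dim=0)
    target_std = targets.std(dim=0)
    # Baseline: always predict the per-trait mean of the targets.
    mean_pred = targets.mean(dim=0, keepdim=True).expand_as(targets)
    model_mse = torch.mean((preds - targets) ** 2).item()
    baseline_mse = torch.mean((mean_pred - targets) ** 2).item()
    return {
        "pred_std": pred_std,
        "target_std": target_std,
        "model_mse": model_mse,
        "baseline_mse": baseline_mse,
    }
```

If `pred_std` is close to zero and `model_mse` is about the same as `baseline_mse`, the network has most likely collapsed to predicting roughly the average label.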

YuanGongND · 2023-05-26T13:44:48Z

Hi there,

Do you mean you fine-tune our pretrained model for a regression task?

What do you mean by this?

> After splicing several fully connected layers after your original model

-Yuan

yangma12 · 2023-05-26T13:57:30Z

Thank you for your reply! I mainly use this dataset for fine-tuning, with the audio extracted from it (https://chalearnlap.cvc.uab.cat/dataset/24/description/). Each clip is 15 seconds of speech. An MLP appended after the model maps the model's output to shape (batch_size, 5), where the 5 values are the regression targets for the five personality traits of each clip.

yangma12 · 2023-05-26T14:05:41Z

In my experiments, I tried adjusting the learning rate and other parameters, removed the masking and mixup from data preprocessing, set input_tdim to 1530 to fit my audio length and label_dim to 512, and finally performed regression with the following head:

```python
nn.Sequential(
    nn.Linear(in_features=512, out_features=256),
    nn.ReLU(inplace=True),
    nn.Linear(in_features=256, out_features=128),
    nn.ReLU(inplace=True),
    nn.Linear(in_features=128, out_features=6),
    nn.Sigmoid()
)
```

Forgive me, I am still fairly new to deep learning and I'm not sure where the problem might be.
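For reference, the number of filterbank frames a 15-second clip produces can be worked out directly. The sketch below assumes 16 kHz audio and Kaldi-style fbank settings of a 25 ms window and a 10 ms shift with `snip_edges=True`, which are the defaults of `torchaudio.compliance.kaldi.fbank`. `kaldi_num_frames` is a hypothetical helper, not part of SSAST.

```python
def kaldi_num_frames(num_samples: int, sample_rate: int = 16000,
                     frame_length_ms: float = 25.0,
                     frame_shift_ms: float = 10.0) -> int:
    """Frames produced by a Kaldi-style fbank with snip_edges=True (assumed)."""
    window = int(sample_rate * frame_length_ms / 1000)  # 400 samples at 16 kHz
    shift = int(sample_rate * frame_shift_ms / 1000)    # 160 samples at 16 kHz
    if num_samples < window:
        return 0  # too short for a single full window
    # Only full windows are kept, stepping by `shift` samples each time.
    return 1 + (num_samples - window) // shift
```

Under these assumptions `kaldi_num_frames(15 * 16000)` gives 1498, so an input_tdim of 1530 leaves about 32 frames of padding, assuming the data loader zero-pads shorter spectrograms up to the target length.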

YuanGongND · 2023-05-27T05:32:42Z

There are a few things:

First, it seems to be a multimodal, speech-dominated dataset, so you might want to try an audio-visual model or a speech-based model (e.g., HuBERT). In my experience, pure speech models perform better on pure speech tasks; see Table 5 of the SSAST paper. For audio-visual modeling, we have CAV-MAE as a general audio-visual model, but again, you might need a model focusing on faces.
Second, regarding this head:

```python
nn.Sequential(
    nn.Linear(in_features=512, out_features=256),
    nn.ReLU(inplace=True),
    nn.Linear(in_features=256, out_features=128),
    nn.ReLU(inplace=True),
    nn.Linear(in_features=128, out_features=6),
    nn.Sigmoid()
)
```

Is Sigmoid common for regression? Setting label_dim to 512 (which is meant for classification) and then stacking a few dense layers on top seems redundant. You can simply change the last MLP layer into a regression head:

ssast/src/models/ast_models.py, lines 166 to 167 in a1a3eec:

```python
self.mlp_head = nn.Sequential(nn.LayerNorm(self.original_embedding_dim),
                              nn.Linear(self.original_embedding_dim, label_dim))
```
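A minimal sketch of that suggestion, assuming a base-size model with a 768-dimensional embedding (an assumption; check `original_embedding_dim` for your checkpoint): keep the LayerNorm + Linear form of `mlp_head`, set the output size to the five traits, drop the Sigmoid, and train with mean squared error. The random `features` tensor is a hypothetical stand-in for the pooled embeddings the backbone would produce.

```python
import torch
import torch.nn as nn

EMBED_DIM = 768   # assumption: embedding size of a base-size model
NUM_TRAITS = 5    # five personality traits, one regression target each

# Same form as the repo's mlp_head, but with NUM_TRAITS outputs and no
# final activation, so the head produces unbounded real values.
regression_head = nn.Sequential(
    nn.LayerNorm(EMBED_DIM),
    nn.Linear(EMBED_DIM, NUM_TRAITS),
)
loss_fn = nn.MSELoss()

# Hypothetical pooled clip embeddings and trait labels for one batch of 8.
features = torch.randn(8, EMBED_DIM)
targets = torch.rand(8, NUM_TRAITS)

preds = regression_head(features)  # shape: (8, NUM_TRAITS)
loss = loss_fn(preds, targets)
loss.backward()                    # gradients flow into the head parameters
```

In SSAST itself this would presumably amount to constructing the model with label_dim equal to the number of traits rather than 512, without stacking an extra MLP on top.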

But I know very little about your task, so you will need to tune the parameters yourself. For some networks, we use a larger learning rate for the MLP head because it is randomly initialized while the other parameters are pretrained.
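One common way to give the randomly initialized head a larger learning rate is to use optimizer parameter groups. In the sketch below, `TinyModel` is a hypothetical stand-in for a pretrained backbone plus `mlp_head`, and the base rate of 1e-5 and the 10x head multiplier are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in: a 'pretrained' backbone plus a new mlp_head."""

    def __init__(self, embed_dim: int = 768, num_outputs: int = 5):
        super().__init__()
        self.backbone = nn.Linear(128, embed_dim)  # placeholder for pretrained layers
        self.mlp_head = nn.Sequential(nn.LayerNorm(embed_dim),
                                      nn.Linear(embed_dim, num_outputs))

    def forward(self, x):
        return self.mlp_head(self.backbone(x))


def build_optimizer(model: nn.Module, base_lr: float = 1e-5,
                    head_lr_mult: float = 10.0) -> torch.optim.Optimizer:
    # Split parameters by name: anything under mlp_head gets the larger rate.
    head_params = [p for n, p in model.named_parameters() if n.startswith("mlp_head.")]
    other_params = [p for n, p in model.named_parameters() if not n.startswith("mlp_head.")]
    return torch.optim.Adam([
        {"params": other_params, "lr": base_lr},
        {"params": head_params, "lr": base_lr * head_lr_mult},
    ])
```

With the real model, the same name-prefix filter on `named_parameters()` should work as long as the head is registered as `mlp_head`, as in the excerpt from `ast_models.py`.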

I mainly answer questions related to what we presented in the paper; it is hard for me to answer questions about new tasks or new uses of the model.

-Yuan

YuanGongND · 2023-05-27T05:33:51Z

Another minor point: you said there are 5 regression values, but nn.Linear(in_features=128, out_features=6) outputs 6.
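A cheap guard against this kind of mismatch is to derive the head's output size from the number of targets and check the output shape once before training. The sketch keeps the layer sizes from the head posted above but ties the last layer to `NUM_TRAITS` and omits the Sigmoid; `check_output_shape` is a hypothetical helper.

```python
import torch
import torch.nn as nn

NUM_TRAITS = 5  # one output per personality trait

head = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, NUM_TRAITS),  # out_features tied to the number of targets
)


def check_output_shape(model: nn.Module, in_features: int,
                       num_targets: int, batch_size: int = 4) -> tuple:
    """Run a dummy batch and fail loudly if the output width is wrong."""
    out = model(torch.randn(batch_size, in_features))
    expected = (batch_size, num_targets)
    if tuple(out.shape) != expected:
        raise ValueError(f"expected {expected}, got {tuple(out.shape)}")
    return tuple(out.shape)
```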

YuanGongND added the "question" label (Further information is requested) on May 26, 2023.
