About IEMOCAP sentence-level audio features #3

Open
Luyizhe opened this issue Dec 7, 2021 · 3 comments

Comments

@Luyizhe

Luyizhe commented Dec 7, 2021

Hello,
Could you share how you extracted the audio features in the work "Multi-level Multiple Attentions for Contextual Multimodal Sentiment Analysis"? I don't know how to extract the 100-dimensional sentence-level audio features.
Thank you !

@Luyizhe
Author

Luyizhe commented Jan 6, 2022

Hello,
I want to try data augmentation, but I don't have consistent features. From your paper "Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos", I see that you reduce the 6373-dimensional features to 100 dimensions with an FC layer, but I can't obtain suitable weights for that layer. Could you share the weights?
Thank you!

@soujanyaporia
Contributor

We used openSMILE and then fed that to an FC network with 100-dim output. This FC network can be trained using your training dataset's labels. Alternatively you can use other audio features as shown here: https://github.com/soujanyaporia/MUStARD
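
For readers trying to reproduce this, here is a minimal NumPy sketch of the idea described above: train a small classifier on the training labels, then keep the 100-dim FC activations as utterance-level features. It is not the authors' script. The thread only states "openSMILE, then an FC network with 100-dim output, trained on the training labels"; the single ReLU layer, the softmax head, the cross-entropy loss, plain gradient descent, the learning rate, the input standardization, and the random toy data (64-dim inputs standing in for the 6373-dim openSMILE vectors) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data (hypothetical). In the thread, X would hold one
# 6373-dim openSMILE vector per training utterance and y the emotion labels.
n_utt, in_dim, hid_dim, n_cls = 200, 64, 100, 4
X = rng.normal(size=(n_utt, in_dim))
y = rng.integers(0, n_cls, size=n_utt)

# Standardize each input dimension with training-set statistics (assumption).
mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-8
Xn = (X - mu) / sigma

# FC layer (in_dim -> 100) followed by a softmax classifier (100 -> n_cls).
W1 = rng.normal(scale=np.sqrt(2.0 / in_dim), size=(in_dim, hid_dim))
b1 = np.zeros(hid_dim)
W2 = rng.normal(scale=np.sqrt(1.0 / hid_dim), size=(hid_dim, n_cls))
b2 = np.zeros(n_cls)


def forward(Xb):
    """Return the 100-dim hidden activations and the class probabilities."""
    h = np.maximum(0.0, Xb @ W1 + b1)            # ReLU is an assumption
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p = p / p.sum(axis=1, keepdims=True)
    return h, p


lr = 0.01          # assumed learning rate
losses = []
for epoch in range(300):
    h, p = forward(Xn)
    # Mean cross-entropy against the training labels.
    losses.append(-np.mean(np.log(p[np.arange(n_utt), y] + 1e-12)))

    # Backpropagation for softmax + cross-entropy.
    d_logits = p.copy()
    d_logits[np.arange(n_utt), y] -= 1.0
    d_logits /= n_utt
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    dh = d_logits @ W2.T
    dh[h <= 0.0] = 0.0                           # gradient of ReLU
    dW1 = Xn.T @ dh
    db1 = dh.sum(axis=0)

    # Full-batch gradient descent step.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

# After training, the hidden activations serve as the 100-dim features.
features, _ = forward(Xn)
print(features.shape)
```

To get features for validation or test utterances under the same assumptions, standardize them with the training-set `mu` and `sigma` and call `forward` again. Because the projection is learned from labels, the resulting 100-dim vectors depend on the training split and the random initialization, so they will not match the released features numerically.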

@Penglikai

> We used openSMILE and then fed that to an FC network with 100-dim output. This FC network can be trained using your training dataset's labels. Alternatively you can use other audio features as shown here: https://github.com/soujanyaporia/MUStARD

Hi, thanks for the clarification. Could you please share the scripts for the dimension reduction step? I am trying to replicate the feature extraction but am having trouble with the FC network settings for the dimension reduction.

BTW, may I ask why the librosa features have a different size for each audio utterance? Thank you!
