
Hierarchical Attention for Multimodal Fusion

A novel Hierarchical Attention model to classify six classes of emotions from the visual, verbal, and vocal features of a person's speech in the CMU-MOSEI dataset.

File Structure

Let's look at the file structure of the entire project

🗄️ Data [Stores all the paths to the separate datasets and scripts to load & preprocess them]

-🗃️ __init__.py [refers to all the environment variables]

-🗃️ DataLoaderCreator.py [Script to align and preprocess the data and convert it into DataLoaders; see the sketch below]
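To give a feel for what this script does, here is a minimal, illustrative sketch of an alignment-and-DataLoader flow built on the CMU-MultimodalSDK; the recipe paths and keys below are assumptions for illustration, and DataLoaderCreator.py is the authoritative version.

```python
# Illustrative sketch only; see Data/DataLoaderCreator.py for the real pipeline.
# Assumes CMU-MultimodalSDK (mmsdk) is installed and the .csd files are downloaded locally.
from mmsdk import mmdatasdk as md

# Hypothetical local paths; in the repo these come from the constants/paths files.
recipe = {
    "glove_vectors": "Dataset/HighLevelData/high_level/glove_vectors.csd",
    "COVAREP": "Dataset/HighLevelData/high_level/COVAREP.csd",
    "FACET 4.2": "Dataset/HighLevelData/high_level/FACET 4.2.csd",
}
dataset = md.mmdataset(recipe)
dataset.align("glove_vectors")   # word-level alignment of all modalities to the text

labels_recipe = {"All Labels": "Dataset/HighLevelData/labels/All Labels.csd"}
dataset.add_computational_sequences(labels_recipe, destination=None)
dataset.align("All Labels")      # collapse to labelled segments

# From here, pad/stack the aligned features per segment into tensors and wrap them in
# torch.utils.data.DataLoader objects for the train/valid/test folds.
```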

🗄️ Constants

-🗃️ __init__.py [needed to let the interpreter know that this is a package]

-🗃️ paths.py [paths and variables are stored here]
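As a purely hypothetical example of the kind of variables such a constants file holds (the actual names in the repo may differ):

```python
# Hypothetical example only; check the actual paths.py for the real variable names.
import os

PROJECT_ROOT = os.getcwd()                      # scripts are meant to be run from the repo root
DATASET_DIR = os.path.join(PROJECT_ROOT, "Dataset")
HIGH_LEVEL_DIR = os.path.join(DATASET_DIR, "HighLevelData", "high_level")
LABELS_PATH = os.path.join(DATASET_DIR, "HighLevelData", "labels", "All Labels.csd")
ALIGNED_DIR = os.path.join(DATASET_DIR, "AlignedData")
```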

🗄️ Dataset

-🗃️ __init__.py [needed to let the interpreter know that this is a package]

🗄️ AlignedData [You need to run the Data/DataLoaderCreator.py script first to populate this folder with aligned data]

🗄️ 🗄️ high_level [all the high-level features]

  • COVAREP.csd [acoustics]
  • FACET 4.2.csd [visuals]
  • glove_vectors.csd [textual]
  • OpenFace_2.0.csd [visuals]
  • OpenSMILE.csd [acoustics]

🗄️ 🗄️ labels

  • All Labels.csd

🗄️ CMU-MultimodalSDK [Provides the functions to download, format, and align the data, and to split it into train/valid/test sets]

Download this from https://github.com/A2Zadeh/CMU-MultimodalSDK/
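If you prefer to fetch the data programmatically, the SDK ships standard CMU-MOSEI recipes; below is a hedged sketch in which the destination folders are illustrative assumptions.

```python
# Illustrative download sketch using CMU-MultimodalSDK's bundled CMU-MOSEI recipes.
# Destination folders are assumptions; point them at the directories used in your paths file.
from mmsdk import mmdatasdk as md

md.mmdataset(md.cmu_mosei.highlevel, "Dataset/HighLevelData/high_level/")
md.mmdataset(md.cmu_mosei.labels, "Dataset/HighLevelData/labels/")
# The raw timestamped transcripts can be fetched via md.cmu_mosei.raw,
# or downloaded manually from the URL given under RawData below.
```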

🗄️ HighLevelData [Unaligned data with high-level features]

🗄️ 🗄️ high_level [all the high-level features]

  • COVAREP.csd [acoustics]
  • FACET 4.2.csd [visuals]
  • glove_vectors.csd [textual]
  • OpenFace_2.0.csd [visuals]
  • OpenSMILE.csd [acoustics]

🗄️ 🗄️ labels

  • All Labels.csd

🗄️ RawData [We only needed one modality's raw data]

Can be found at http://immortal.multicomp.cs.cmu.edu/CMU-MOSEI/language/

  • CMU_MOSEI_TimestampedWords.csd [textual]

🗄️ Models [all models have both .py and .ipynb versions]

  • __init__.py [needed to let the interpreter know that this is a package]

  • AcousticsEmbedder.py/.ipynb [Extracts embeddings from COVAREP high-level features with the DeltaSelfAttention module]

  • TextEmbedder.py/.ipynb [Extracts embeddings from raw sentence-level words with the DeltaSelfAttention module]

  • VisualsEmbedder.py/.ipynb [Extracts embeddings from FACET 4.2 high-level features with the DeltaSelfAttention module]

  • DeltaSelfAttention.py/.ipynb [Temporally and contextually attends to features within a modality; a generic sketch of this idea appears at the end of this file structure]

  • DCCA.py/.ipynb [Gathers every modality's self-attended embeddings and runs a deep canonical correlation analysis; the CrossAttention module for the canonically fused embeddings is also included in this script; the final results are derived after the CrossAttention]

  • TextClassifierXLNet.py / XLNet_CMU_MOSEI_Text.ipynb / TextEmbedderXLNet.ipynb / TextClassifierXLNetPrototype.ipynb [Misc models used for prototyping and experimenting]

🗄️ 🗄️ model_constants

-🗃️ __init__.py [needed to let the interpreter know that this is a package]

-🗃️ paths.py [paths and variables for the models are stored here]

🗄️ 🗄️ saved_models [models you run and save the weights of will be found here]
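For orientation, here is a generic PyTorch sketch of the within-modality attention idea behind the embedder modules above; the class name, shapes, and dimensions are illustrative assumptions, not the repository's actual DeltaSelfAttention implementation.

```python
# Generic within-modality self-attention sketch (illustrative, not the repo's DeltaSelfAttention).
import torch
import torch.nn as nn

class ModalitySelfAttention(nn.Module):
    """Attends over the time steps of one modality and pools them into a fixed-size embedding."""

    def __init__(self, feat_dim: int, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim), e.g. a segment's COVAREP or FACET feature sequence
        h = self.proj(x).transpose(0, 1)      # MultiheadAttention expects (time, batch, embed_dim)
        attended, _ = self.attn(h, h, h)      # temporal/contextual attention within the modality
        h = self.norm(h + attended)           # residual connection + layer norm
        return h.mean(dim=0)                  # pooled per-segment embedding, (batch, embed_dim)

# Example (dimensions are illustrative):
# acoustic_embedder = ModalitySelfAttention(feat_dim=74)      # e.g. COVAREP features
# embedding = acoustic_embedder(torch.randn(8, 50, 74))       # -> (8, 128)
```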

Pre-requisite Libraries

  • PyTorch 1.7.0
  • Scikit-learn 0.23
  • NumPy 1.19
  • Hugging Face Transformers 3.5.0
  • Ray Tune 1.0.1

How to make it work

  1. Download the dataset as per the instructions from https://github.com/A2Zadeh/CMU-MultimodalSDK/. See the Dataset folder contents for more details.
  2. Change the Data/constants and Models/model_constants path variables as per your local directory setup. Check out those files for more instructions and information on how they are currently set up.
  3. For the modality alignment, run the Data/DataLoaderCreator.py script. Always run all the scripts from the root folder, as the relative paths are set up that way.
  4. Run Models/TextEmbedder.py, Models/AcousticsEmbedder.py, and Models/VisualsEmbedder.py to save the embeddings.
  5. Run Models/DCCA.py (running this also runs the models from step 4; it is a long process, so you can run them one by one to break it into memory-manageable pieces). Note: if you run step 4 first, make sure you save the hidden representations/embeddings somewhere, and have Models/DCCA.py load those instead of running the models again. A sketch of the correlation objective DCCA optimizes follows this list.
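For reference, step 5's deep canonical correlation analysis boils down to maximizing the canonical correlations between pairs of modality embeddings; below is a simplified two-view sketch of that objective. It is illustrative only: the actual DCCA.py, its handling of all three modalities, and the CrossAttention module are more involved.

```python
# Simplified two-view CCA correlation objective (illustrative; not the repo's DCCA.py).
import torch

def _inv_sqrt(m: torch.Tensor, eps: float) -> torch.Tensor:
    # Inverse matrix square root of a symmetric positive-definite matrix via SVD.
    u, s, _ = torch.svd(m)
    return u @ torch.diag(s.clamp(min=eps).rsqrt()) @ u.t()

def cca_correlation(h1: torch.Tensor, h2: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Sum of canonical correlations between two views of shape (batch, dim).
    DCCA trains the embedders to maximize this value (i.e. minimize its negative)."""
    n = h1.size(0)
    h1 = h1 - h1.mean(dim=0, keepdim=True)
    h2 = h2 - h2.mean(dim=0, keepdim=True)

    eye1 = torch.eye(h1.size(1), device=h1.device, dtype=h1.dtype)
    eye2 = torch.eye(h2.size(1), device=h2.device, dtype=h2.dtype)
    sigma12 = h1.t() @ h2 / (n - 1)
    sigma11 = h1.t() @ h1 / (n - 1) + eps * eye1
    sigma22 = h2.t() @ h2 / (n - 1) + eps * eye2

    # Singular values of T = Sigma11^{-1/2} Sigma12 Sigma22^{-1/2} are the canonical correlations.
    t = _inv_sqrt(sigma11, eps) @ sigma12 @ _inv_sqrt(sigma22, eps)
    return torch.svd(t)[1].sum()

# Usage (illustrative): loss = -cca_correlation(text_emb, acoustic_emb); loss.backward()
```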

Each section in the scripts and notebooks is commented with what it does and how to run it.

Model weights are not included in the repo because of size constraints.
