Code and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
vision-language-pretraining
audio-language-pretraining
audiovisual-language-pretraining
multimodal-representation-learning
Updated Jul 4, 2023 - Python