Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing

Code for the CVPR 2021 paper "Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing".

The Audio-Visual Video Parsing task

The goal is to identify the audible and visible events in a video and to localize each of them in time. Note that the audio and visual events may be asynchronous: an event can be heard without being seen, and vice versa.
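To make the task output concrete, here is a minimal sketch of how per-segment predictions for one event could be turned into temporal locations, separately for each modality. The helper `segments_to_events`, the one-second segment length, and the example predictions are assumptions made for illustration; they are not taken from this repository.

```python
from typing import List, Tuple


def segments_to_events(flags: List[int], seg_len: float = 1.0) -> List[Tuple[float, float]]:
    """Merge consecutive active segments into (start, end) intervals in seconds."""
    events = []
    start = None  # index of the first segment of the currently open interval
    for i, active in enumerate(flags):
        if active and start is None:
            start = i
        elif not active and start is not None:
            events.append((start * seg_len, i * seg_len))
            start = None
    if start is not None:  # close an interval that runs to the end of the video
        events.append((start * seg_len, len(flags) * seg_len))
    return events


# Hypothetical per-segment predictions for a single event in each modality.
# The event is heard early in the video and seen later: an asynchronous case.
audio_speech = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
visual_speech = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

audio_events = segments_to_events(audio_speech)
visual_events = segments_to_events(visual_speech)
print(audio_events)   # [(0.0, 3.0)]
print(visual_events)  # [(4.0, 8.0)]
```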

Prepare data

Please refer to https://github.com/YapengTian/AVVP-ECCV20 to download the LLP Dataset and the preprocessed audio and visual features. Put the downloaded r2plus1d_18, res152, and vggish feature folders into the feats folder.
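Before training, it can help to check that the three feature folders are in place. The following is a small sketch, not part of this repository; the helper name `missing_feature_dirs` is hypothetical, and it only checks that the folders exist, not what they contain.

```python
import os
from typing import List

# Folder names taken from the instructions above.
EXPECTED_FEATURE_DIRS = ("vggish", "res152", "r2plus1d_18")


def missing_feature_dirs(feats_root: str = "feats") -> List[str]:
    """Return the expected feature folders that are absent under feats_root."""
    return [
        name
        for name in EXPECTED_FEATURE_DIRS
        if not os.path.isdir(os.path.join(feats_root, name))
    ]


# Run from the repository root; an empty list means all three folders exist.
print(missing_feature_dirs("feats"))
```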

Training pipeline

Training consists of three stages. The commands for each stage are run from the repository root.

Train a base model

We first train a base model using multiple-instance learning (MIL) and our proposed contrastive learning.

cd step1_train_base_model
python main_avvp.py --mode train --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18
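For readers unfamiliar with MIL, the sketch below shows the general idea in plain Python: segment-level predictions are pooled into a video-level prediction, which is then compared with the weak video-level labels. This is a generic illustration under simplifying assumptions (mean pooling, binary cross-entropy, made-up numbers); the pooling and loss used in step1_train_base_model may differ, and the contrastive term is not shown.

```python
import math
from typing import List


def video_level_probs(segment_probs: List[List[float]]) -> List[float]:
    """Mean-pool segment-level probabilities (T x C) into video-level ones (C)."""
    num_segments = len(segment_probs)
    num_classes = len(segment_probs[0])
    return [
        sum(seg[c] for seg in segment_probs) / num_segments
        for c in range(num_classes)
    ]


def bce(probs: List[float], labels: List[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over classes."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical predictions: 4 segments, 2 event classes.
segment_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.1], [0.2, 0.0]]
weak_labels = [1, 0]  # only video-level labels are available during training

pooled = video_level_probs(segment_probs)
loss = bce(pooled, weak_labels)
print(pooled, loss)
```

Because the loss only sees the pooled prediction, the model is never told which segments contain the event; it has to discover that on its own.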

Generate modality-aware labels

We then freeze the trained model and evaluate each video after swapping its audio or visual track with that of another, unrelated video.

cd step2_find_exchange
python main_avvp.py --mode estimate_labels --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18 --model_save_dir ../step1_train_base_model/models/
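The sketch below illustrates one plausible reading of this step; it is not the code in step2_find_exchange. It assumes that an "unrelated" video is one sharing no event labels with the current video, and that an event is kept for a modality only if the frozen model still predicts it for that modality after the other track has been swapped. The function names, the threshold of 0.5, and all probabilities are hypothetical.

```python
from typing import Dict, Optional, Set


def pick_unrelated(video_id: str, labels: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first video whose labels do not overlap with video_id's labels."""
    own = labels[video_id]
    for other_id, other in labels.items():
        if other_id != video_id and own.isdisjoint(other):
            return other_id
    return None


def refine_labels(weak: Set[str], probs_after_swap: Dict[str, float],
                  threshold: float = 0.5) -> Set[str]:
    """Keep a weak label only if the unchanged modality still predicts it."""
    return {e for e in weak if probs_after_swap.get(e, 0.0) >= threshold}


# Hypothetical video-level (weak) labels.
labels = {"v1": {"Speech", "Dog"}, "v2": {"Dog"}, "v3": {"Car"}}
partner = pick_unrelated("v1", labels)

# Hypothetical model outputs for v1 after swapping in the partner's other track.
audio_probs = {"Speech": 0.92, "Dog": 0.08}   # v1 audio + partner visual
visual_probs = {"Speech": 0.15, "Dog": 0.88}  # v1 visual + partner audio

audio_labels = refine_labels(labels["v1"], audio_probs)
visual_labels = refine_labels(labels["v1"], visual_probs)
print(partner, audio_labels, visual_labels)
```

In this made-up example the speech is only heard and the dog is only seen, so the single weak label set is split into two modality-aware label sets.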

Re-train using modality-aware labels

Finally, we re-train the model from scratch using the modality-aware labels.

cd step3_retrain
python main_avvp.py --mode retrain --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18
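The three stages can also be chained from a single script. The following is a sketch rather than a script shipped with this repository: the helper names `build_stages` and `run_pipeline` are hypothetical, the flags are copied from the commands above, and it assumes it is started from the repository root with the features already in place.

```python
import os
import subprocess
import sys
from typing import List, Tuple

# Feature arguments shared by all three stages (paths are relative to each step folder).
FEATURE_ARGS = [
    "--audio_dir", "../feats/vggish/",
    "--video_dir", "../feats/res152/",
    "--st_dir", "../feats/r2plus1d_18",
]


def build_stages() -> List[Tuple[str, List[str]]]:
    """Return (working directory, command) pairs for the three training stages."""
    base = [sys.executable, "main_avvp.py"]
    return [
        ("step1_train_base_model", base + ["--mode", "train"] + FEATURE_ARGS),
        ("step2_find_exchange",
         base + ["--mode", "estimate_labels"] + FEATURE_ARGS
         + ["--model_save_dir", "../step1_train_base_model/models/"]),
        ("step3_retrain", base + ["--mode", "retrain"] + FEATURE_ARGS),
    ]


def run_pipeline(repo_root: str = ".") -> None:
    """Run each stage in its own folder and stop at the first failure."""
    for stage_dir, cmd in build_stages():
        subprocess.run(cmd, cwd=os.path.join(repo_root, stage_dir), check=True)


# Print the commands without running them.
for stage_dir, cmd in build_stages():
    print(stage_dir, " ".join(cmd))
```

Calling `run_pipeline()` would execute the stages in order. Passing `cwd` to `subprocess.run` replaces the `cd` commands, so each stage starts from its own folder regardless of where the previous one ended.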

Citation

Please cite the following paper in your publications if it helps your research:

@inproceedings{wu2021explore,
    title = {Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing},
    author = {Wu, Yu and Yang, Yi},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}
