
INO_VOS

The official code for [ACM MM 2022] 'In-N-Out Generative Learning for Dense Unsupervised Video Segmentation'. [arXiv]

We achieve new state-of-the-art performance among unsupervised learning methods on the VOS task, based on ViT and the idea of generative learning.

Environment

We test with:

  • python==3.7
  • pytorch==1.7.1
  • CUDA==10.2

We train on Charades with 4x 16GB V100 GPUs and on Kinetics-400 with 8x 16GB V100 GPUs; training takes around 12 hours and 1 week, respectively. The codebase builds on DINO, DUL, and VRW.
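
If you want to sanity-check your setup before training, a minimal check (assuming PyTorch is already installed) could look like this; the printed versions are the ones we tested with, not hard requirements:

# Quick check that the environment roughly matches the tested versions above.
import sys
import torch

print("python : " + sys.version.split()[0])   # tested with 3.7
print("pytorch: " + torch.__version__)        # tested with 1.7.1
print("CUDA   : " + str(torch.version.cuda))  # tested with 10.2
assert torch.cuda.is_available(), "a CUDA GPU is required for training"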

Dataset Preparation

Training Datasets

We use charades_480p and Kinetics-400 for training.

After downloading the datasets, run:

git clone git@github.com:pansanity666/INO_VOS.git
cd INO_VOS
mkdir ./data
ln -s /your/path/Charades_v1_480 ./data
ln -s /your/path/Kinetics_400 ./data

Evaluation Datasets

We benchmark on DAVIS-2017 val and YouTube-VOS 2018 val.

Download DAVIS-2017 from here.

Download YouTube-VOS 2018 (valid_all_frames.zip and valid.zip) from here.

Link them under ./data (as with the previous datasets).

The final structure of the data folder should be:

data
  - Charades_v1_480
    - xxxx.mp4
    - ...
  - Kinetics_400
    - xxxx.mp4
    - ...
  - DAVIS
    - Annotations
    - JPEGImages
    - ...
  - YouTube_VOS
    - valid_all_frames
    - valid
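
To catch path mistakes early, an optional check of the layout above could look like this sketch (paths taken from the tree; adjust if you placed datasets elsewhere):

# Optional sketch: verify the expected ./data layout before training/evaluation.
from pathlib import Path

expected = [
    "data/Charades_v1_480",
    "data/Kinetics_400",
    "data/DAVIS/Annotations",
    "data/DAVIS/JPEGImages",
    "data/YouTube_VOS/valid_all_frames",
    "data/YouTube_VOS/valid",
]
for p in expected:
    print(("OK     " if Path(p).is_dir() else "MISSING") + " " + p)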

Training

Set ckpt_output_path in train_charades.sh as needed, then run:

# under INO_VOS dir
sh train_charades.sh

The dataset metadata will be cached under ./cached/charades on the first run (this may take a few minutes).

The same procedure applies to training on Kinetics-400.
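
Conceptually, the cached meta is just an index of the training videos, built once and reused on later runs. A hypothetical illustration of the idea (the file name and format here are assumptions; the training script handles this for you automatically):

# Hypothetical sketch of a dataset-meta cache: index the videos once, reuse later.
import glob
import pickle
from pathlib import Path

cache = Path("./cached/charades/meta.pkl")  # hypothetical cache file name
cache.parent.mkdir(parents=True, exist_ok=True)
if not cache.exists():
    videos = sorted(glob.glob("./data/Charades_v1_480/*.mp4"))
    cache.write_bytes(pickle.dumps(videos))
videos = pickle.loads(cache.read_bytes())
print(len(videos), "videos indexed")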

Evaluation

Inference

The checkpoint used in the paper can be downloaded from here.

For the sake of efficiency, we first pre-generate the neighbor masks used during label propagation and cache them on disk.

python ./scripts/pre_calc_maskNeighborhood.py [davis|ytvos] 

This may take a few minutes; the neighbor masks are cached under ./cached/masks by default.
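
For intuition, a neighbor mask simply marks which pairs of feature-grid positions lie within a spatial radius of each other, so affinities outside that radius can be ignored during propagation. A minimal sketch follows; the grid size, radius, and the use of Chebyshev distance are illustrative assumptions, and the actual masks come from the script above:

# Illustrative sketch of a spatial neighborhood mask for label propagation.
import torch

def neighborhood_mask(h, w, radius):
    # (h*w, h*w) boolean mask: True iff grid cells i and j are within `radius`.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w))
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()
    dist = (coords[:, None, :] - coords[None, :, :]).abs().max(dim=-1).values
    return dist <= radius

mask = neighborhood_mask(32, 32, radius=12)  # e.g., a 32x32 feature grid
print(mask.shape, mask.float().mean().item())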

Then, run label propagation via:

sh infer_vos.sh [davis|ytvos] $CKPT_PATH 

Two folders will be created under ./results: vos contains the segmentation masks, and vis contains the blended visualizations.
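
As background, label propagation of this kind transfers soft masks from a reference frame to the target frame via a top-k softmax over feature affinities, restricted by the neighbor mask. A self-contained sketch of the general technique (all shapes, the temperature, and the top-k value are illustrative assumptions, not the repository's exact settings):

# Illustrative sketch of affinity-based label propagation between two frames.
import torch
import torch.nn.functional as F

def propagate(feat_ref, labels_ref, feat_tgt, mask, topk=10, temp=0.07):
    # feat_ref/feat_tgt: (N, C) L2-normalized features; labels_ref: (N, K)
    # soft masks; mask: (N, N) boolean spatial neighborhood.
    aff = feat_tgt @ feat_ref.t() / temp             # (N, N) affinities
    aff = aff.masked_fill(~mask, float("-inf"))      # keep only nearby positions
    vals, idx = aff.topk(topk, dim=1)                # top-k reference neighbors
    weights = F.softmax(vals, dim=1)                 # (N, topk)
    return (weights[..., None] * labels_ref[idx]).sum(dim=1)  # (N, K)

# Toy usage with random tensors:
N, C, K = 64, 16, 3
f_ref = F.normalize(torch.randn(N, C), dim=1)
f_tgt = F.normalize(torch.randn(N, C), dim=1)
labels = F.softmax(torch.randn(N, K), dim=1)
print(propagate(f_ref, labels, f_tgt, torch.ones(N, N, dtype=torch.bool)).shape)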

Evaluation: DAVIS-2017

Please install the official evaluation code and evaluate the inference results:

# under INO_VOS dir
git clone https://github.com/davisvideochallenge/davis2017-evaluation ./davis2017-evaluation
python ./davis2017-evaluation/evaluation_method.py --task semi-supervised --results_path $OUTPUT_VOS --davis_path ./data/DAVIS/ 

Evaluation: YouTube-VOS 2018

Please use the official CodaLab evaluation server. To create the submission, rename the vos directory to Annotations and compress it into Annotations.zip for upload.
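
A small sketch of this packaging step (the results path is an assumption; point vos_dir at the vos folder produced by infer_vos.sh):

# Sketch: stage the vos folder as "Annotations" and zip it for CodaLab.
import shutil
from pathlib import Path

vos_dir = Path("./results/vos")                 # assumed inference output path
staging = Path("./submission/Annotations")
staging.parent.mkdir(parents=True, exist_ok=True)
shutil.copytree(vos_dir, staging)               # effectively renames vos
shutil.make_archive("./submission/Annotations", "zip",
                    root_dir=staging.parent, base_dir="Annotations")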

Citation

If you find our work useful, please consider citing:

@inproceedings{pan2022n,
  title={In-n-out generative learning for dense unsupervised video segmentation},
  author={Pan, Xiao and Li, Peike and Yang, Zongxin and Zhou, Huiling and Zhou, Chang and Yang, Hongxia and Zhou, Jingren and Yang, Yi},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={1819--1827},
  year={2022}
}
