Video Noise Contrastive Estimation (VINCE)

This repository contains the code used to implement the models in the paper Watching the World Go By: Representation Learning from Unlabeled Videos (https://arxiv.org/abs/2003.07990).

Environment Setup

We recommend using Anaconda to manage your environment setup and run our code. The following commands will create an environment similar to ours with minimal requirements.

Conda

conda create -n video-env python=3.6.8
conda deactivate
conda env update -n video-env -f env.yml
conda activate video-env
pip install git+https://github.com/danielgordon10/dg_util.git -U

Virtualenv

If you instead prefer virtualenv or similar, we have also provided a requirements.txt.

virtualenv --python=python3.6 video-env
source video-env/bin/activate
pip install -r requirements.txt

Download Random Related Video Views (R2V2)

Due to budgetary constraints, I can no longer host the dataset directly; however, I have made a script available to recreate it. Note that many of the original videos have since been deleted from YouTube, so their data cannot be recreated. If you are interested in hosting the dataset for me, please contact me.

Recreate the dataset

Ensure you have set up the conda environment and installed dg_util.git as noted in Conda
Follow instructions to create cookies.txt
Run python download_scripts/recreate_r2v2_dataset.py

Notes

Original Dataset:

	Size (GB)	Number of Files	Number of Images	Number of Folders	Number of Source Videos
Train	110	2,788,424	2,784,328	4096	696,082
Val	8.8	226,620	222,524	4096	55,631

Some folders have many more images than others. This is expected.
The video and frame ids are also provided in datasets/info_files/r2v2_ids_train.txt and datasets/info_files/r2v2_ids_val.txt
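
As a quick consistency check on the table above, the following sketch recomputes two derived quantities from the published counts. The reading that each source video contributes a fixed number of images, and that the files beyond the images are spread one per folder, is inferred from the numbers alone and is not stated by the authors.

```python
# Figures copied from the "Original Dataset" table.
SPLITS = {
    "train": {"files": 2_788_424, "images": 2_784_328, "folders": 4096, "videos": 696_082},
    "val": {"files": 226_620, "images": 222_524, "folders": 4096, "videos": 55_631},
}


def summarize(split):
    """Return (images per source video, number of files that are not images)."""
    images_per_video = split["images"] / split["videos"]
    non_image_files = split["files"] - split["images"]
    return images_per_video, non_image_files


for name, split in SPLITS.items():
    per_video, extra = summarize(split)
    print(f"{name}: {per_video:.1f} images per video, {extra} non-image files")
```

Both splits work out to four images per source video, and the number of non-image files equals the number of folders. A recreated dataset will likely fall short of these totals because some source videos are no longer available.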

Downloading your own set of YouTube videos

If you would like to download a different set of YouTube videos, you may still find our code helpful. Here is a basic workflow for downloading many YouTube videos.

Follow instructions to create cookies.txt
Create a list of many YouTube URLs to download.
1. One option would be to use youtube_scrape/search_youtube_for_urls.py
2. Another would be YouTube-8m URLs (https://github.com/danielgordon10/youtube8m-data)
Run python run_cache_video_dataset.py --title cache --description caching --num-workers 100 after appropriately formatting the files.
- Note - You can often use more workers than your CPU has threads because YouTube downloading tends to be the bottleneck.
youtube_scrape/download_kinetics.py is a convenient file for downloading Kinetics videos.
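
The note about using more workers than CPU threads can be illustrated with a small sketch. This is not the repository's implementation: fake_download is a hypothetical stand-in that sleeps to mimic waiting on the network, and a thread pool is assumed here purely for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for a real download: sleep to mimic network wait."""
    time.sleep(0.05)
    return url, "ok"


def download_all(urls, num_workers, download_fn=fake_download):
    """Apply download_fn to every URL using a pool of num_workers threads."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(download_fn, urls))


# Twenty placeholder URLs; only the count matters for this demonstration.
urls = ["https://www.youtube.com/watch?v=AKQE9RyOIMY"] * 20
for workers in (1, 20):
    start = time.perf_counter()
    results = download_all(urls, workers)
    elapsed = time.perf_counter() - start
    print(f"{workers} workers: {len(results)} downloads in {elapsed:.2f}s")
```

Because each task spends its time waiting rather than computing, the run with 20 workers should finish far sooner than the single-worker run, regardless of how many CPU threads are available.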

Create cookies.txt

Follow instructions at https://apple.stackexchange.com/a/349759
Go to any YouTube video, for example https://www.youtube.com/watch?v=AKQE9RyOIMY
Click the extension icon and save the data into youtube_scrape/cookies.txt.
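
Before starting a long download run, it may help to confirm that the saved file parses. The sketch below assumes the extension exports the Netscape cookies.txt format, which is what Python's standard http.cookiejar.MozillaCookieJar reads; load_cookie_domains is a hypothetical helper name, not part of this repository.

```python
from http.cookiejar import MozillaCookieJar


def load_cookie_domains(path="youtube_scrape/cookies.txt"):
    """Parse a Netscape-format cookies file and return its sorted cookie domains.

    Raises http.cookiejar.LoadError if the file is not in the expected format.
    """
    jar = MozillaCookieJar(path)
    # Keep session and expired cookies so the check reflects the file contents.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted({cookie.domain for cookie in jar})
```

Calling load_cookie_domains() from the repository root should return a list that includes a YouTube domain if the export worked.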

Training

Train VINCE

Download R2V2 training data or create your own dataset to train on.
Read over the arguments list in arg_parser.py.
Train the model. We have provided an example train script as well as a debug script to check everything is working. Edit the paths in the file to point to your data/output locations.

Train baselines

The official MoCo baseline is available at https://github.com/facebookresearch/moco, but for our work, we wrote our own version.
We have provided an example train script to train this model.
We additionally include MoCoV2 baseline scripts for ResNet50 at vince/train_moco_v2.sh.
We also include the Jigsaw method from PIRL with an accompanying script, vince/train_vince_jigsaw.sh. Pretrained weights and results for it are currently not provided.

Train End Task

We include various end tasks and an interface for easily adding more. Training scripts for each task are available at:
New end tasks can be added by creating a new solver which inherits from EndTaskBaseSolver and an accompanying dataset which inherits from BaseDataset.

Evaluation

While training each end task, evaluation is done after every epoch on a val set.
If more evaluation is needed, it can be added by implementing run_eval for that solver. For an example, see solvers/end_task_tracking_solver.py and end_tasks/eval_tracking.sh.

Download Pretrained Weights

Pretrained weights are available for VINCE as well as all baselines mentioned in the paper. We provide the pretrained weights for the backbone only, not for any end task.

ResNet18

To download the weights, from the root directory, run sh download_scripts/download_pretrained_weights_resnet18.sh. Alternatively, download them directly from https://drive.google.com/uc?id=1L2SZvsvpxe-A1gCN9Nxg9LwB_d604aQf

ResNet50

These models were trained using the hyperparameters in https://arxiv.org/abs/2003.04297, except for the batch size, which was 896 (the starting learning rate was scaled proportionally to 0.105). To download the weights, from the root directory, run sh download_scripts/download_pretrained_weights_resnet50.sh. Alternatively, download them directly from https://drive.google.com/uc?id=11TfKfZLLx2FYCATjkll5nUIOxSgSBWGi
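
The 0.105 figure is consistent with scaling the learning rate linearly with batch size. The sketch below assumes a base learning rate of 0.03 at batch size 256, which is the default in the public MoCo training code but is not stated in this README; scaled_lr is a hypothetical helper name.

```python
def scaled_lr(batch_size, base_lr=0.03, base_batch_size=256):
    """Linear scaling rule: learning rate grows in proportion to batch size.

    base_lr and base_batch_size are assumed reference values, not taken
    from this repository.
    """
    return base_lr * batch_size / base_batch_size


print(f"{scaled_lr(896):.3f}")
```

With a batch size of 896, the ratio to the assumed base is 3.5, so the scaled rate is 0.03 × 3.5 = 0.105.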

Benchmark Results

The results you achieve should roughly match the table below, though different learning schedules and other factors may slightly change performance.

Method Name (In Paper)	Dir Name	Backbone	ImageNet	Sun Scenes	Kinetics 400	OTB 2015 Precision	OTB 2015 Success
Sup-IN	N/A	ResNet18	0.696	0.491	0.207	0.557	0.396
MoCo-IN	moco-in	ResNet18	0.447	0.487	0.336	0.583	0.429
MoCo-G	moco-g	ResNet18	0.393	0.444	0.313	0.511	0.413
MoCo-R2V2	moco-r2v2	ResNet18	0.358	0.450	0.318	0.555	0.403
VINCE	vince-r2v2-multi-frame-multi-pair	ResNet18	0.400	0.495	0.362	0.629	0.465
Sup-IN	N/A	ResNet50	0.762	0.593	0.305	0.458	0.320
MoCo-V2-IN	moco-v2-in	ResNet50	0.652	0.608	0.459	0.300	0.260
MoCo-R2V2	moco-v2-r2v2	ResNet50	0.536	0.581	0.456	0.386	0.299
VINCE	vince-r2v2-multi-frame-multi-pair	ResNet50	0.544	0.611	0.491	0.402	0.300

Citation

@misc{gordon2020watching,
    title={Watching the World Go By: Representation Learning from Unlabeled Videos},
    author={Gordon, Daniel and Ehsani, Kiana and Fox, Dieter and Farhadi, Ali},
    year={2020},
    eprint={2003.07990},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
