
NVlabs/PALAVRA

Personalizing Frozen Vision-Language Representations

This repository contains the annotations and scripts for the algorithm described in the paper

"This is my unicorn, Fluffy": Personalizing frozen vision-language representations , Niv Cohen, Rinon Gal, Eli A. Meirom, Gal Chechik, Yuval Atzmon, ECCV 2022 (Oral)

Setup

  1. Make sure you have miniconda or anaconda installed
  2. Install and activate the project conda environment
conda env create --file ./environment.yaml
conda activate palavra
pip install git+https://github.com/openai/CLIP.git
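
To verify the environment, you can run a quick sanity check (this snippet is a suggestion, not part of the original setup instructions; it only assumes the pip-installed CLIP package and the PyTorch that ships with the conda environment):

# Optional sanity check: confirm that PyTorch and CLIP import correctly
# and that a CLIP backbone can be downloaded and loaded.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # any backbone from clip.available_models() works
print("Loaded CLIP on", device)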

Train f_theta

  1. Download and prepare the data folders and captioning files as explained in the PerVL Benchmark project:
    https://github.com/NVlabs/PerVLBenchmark
    We assume the training set was created under: ../PerVLBenchmark/data
  2. Call the get_f_theta.py script to create an f_theta model:
WANDB_MODE=offline python get_f_theta.py --object_dict_path="../PerVLBenchmark/data/inversion_model_train_data/txt_for_training/commom_obj_dict.npy" --visual_features_path="../PerVLBenchmark/data/inversion_model_train_data/visual_features/visual_features_dict_center_crop_300_224.npz" --text_aug_map_path="../PerVLBenchmark/data/inversion_model_train_data/open_images/open_images_to_mscoco_map.npz" --text_obj_path="../PerVLBenchmark/data/inversion_model_train_data/open_images/open_images_obj_names.npz" --batch_size=256 --coeff_gt_object_loss=512 --data_path=data/deep_fashion/experiment_sets/test_2022_02_07-16_42_59/ --deep_set_d_dim=4096 --dropout=0.25 --epochs=300 --is_augment_object=True --is_gt_object_loss=True --is_image_input_test=True --is_image_input_train=True --is_prompt_multi=True --is_save_models=True --is_text_visual_map=True  --lr=0.0001  --no_of_new_tokens=1  --pooling_type=mean  --sentence_wise_split=True --set_size=5

By default, the model is saved under ../sandbox/checkpoints/.
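
For intuition, f_theta is the inversion network that maps a small set of CLIP image embeddings of a personalized concept to the embedding of a new word ("pseudo-token") in CLIP's text space. The following is only a simplified sketch of that idea, assuming a DeepSets-style encoder with mean pooling (loosely matching the --deep_set_d_dim, --pooling_type=mean, and --no_of_new_tokens flags above); it is not the actual model code in get_f_theta.py.

# Hypothetical sketch of the mapping f_theta performs; names and architecture
# are illustrative only and may differ from the repository's implementation.
import torch
import torch.nn as nn

class FThetaSketch(nn.Module):
    def __init__(self, clip_dim=512, hidden_dim=4096, n_new_tokens=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(clip_dim, hidden_dim), nn.ReLU())  # per-image encoder
        self.rho = nn.Linear(hidden_dim, clip_dim * n_new_tokens)             # set-level decoder

    def forward(self, image_embeddings):              # (set_size, clip_dim)
        pooled = self.phi(image_embeddings).mean(0)   # mean pooling over the image set
        return self.rho(pooled)                       # pseudo-token embedding(s)

# A set of 5 CLIP image embeddings of one personal concept -> one pseudo-token embedding
token_embedding = FThetaSketch()(torch.randn(5, 512))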

Retrieval on PerVL Benchmarks

  1. Download and prepare the data folders and captioning files as explained in the PerVL Benchmark project:
    https://github.com/NVlabs/PerVLBenchmark
    We assume the evaluation set was created under: ../PerVLBenchmark/data
  2. Call the caption_retrival_eval.py script using the trained f_theta model and the desired dataset, as shown below. Note that the model can be found under ../sandbox/checkpoints/ and is identified by a timestamp string in the format "YYYY_MM_DD-HH_MM_SS" (e.g. 2022_07_18-04_00_07) describing its creation time.

YouTube-VOS retrieval example

WANDB_MODE=offline python caption_retrival_eval.py --data_path=../PerVLBenchmark/data/youtube_vos/retrival_sets/test --set_captions_path=../PerVLBenchmark/annotations/ytvos/cleaned_captions_ytvos_test.csv --captions_path=../PerVLBenchmark/annotations/ytvos/ytvos_joint_captions.csv --batch_size=256 --coeff_gt_object_loss=512 --is_constant_caption_abl=False --is_optimize_token=True --is_short_captions=True --is_token_as_suffix=True --is_train_loader_no_reps=False --is_class_name_folders=True --latent_ep=30 --no_fsl=False  --random_seed=5  --set_size=5  --token_optimize_mode=1 --model_name  2022_07_18-04_00_07

Note that the flag is_class_name_folders is set to True, since for the ytvos dataset the coarse-grained class names are given by the folder names.
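
At a high level, retrieval inserts the personalized pseudo-token into each query caption, encodes the caption with CLIP's frozen text encoder, and ranks the gallery images by cosine similarity. The snippet below illustrates only the final ranking step, with hypothetical variable names; token prediction and optimization are handled inside caption_retrival_eval.py.

# Illustrative ranking step (not the repository's evaluation code): rank gallery
# images by cosine similarity to a caption embedding that already contains the
# personalized pseudo-token.
import torch
import torch.nn.functional as F

def rank_gallery(text_embedding, image_embeddings):
    # text_embedding: (d,), image_embeddings: (n_images, d)
    text = F.normalize(text_embedding, dim=-1)
    images = F.normalize(image_embeddings, dim=-1)
    scores = images @ text                         # cosine similarity per image
    return torch.argsort(scores, descending=True)  # best-matching images first

ranking = rank_gallery(torch.randn(512), torch.randn(100, 512))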

DeepFashion2 retrieval example

WANDB_MODE=offline python caption_retrival_eval.py --data_path=../PerVLBenchmark/data/deep_fashion2/personalized_test --set_captions_path=../PerVLBenchmark/annotations/deep_fashion/test_captions.csv --captions_path=../PerVLBenchmark/annotations/deep_fashion/shortened_deepfashion2_captions.csv --coarse_gained_class_names_path=../PerVLBenchmark/annotations/deep_fashion/train_coarse_grained_names.csv --batch_size=256 --coeff_gt_object_loss=512 --is_constant_caption_abl=False --is_optimize_token=True --is_short_captions=True --is_token_as_suffix=True --is_train_loader_no_reps=False --is_class_name_folders=False --latent_ep=30 --no_fsl=False  --random_seed=5 --set_size=5  --token_optimize_mode=1 --model_name  2022_07_18-04_00_07 

Note that you can also use --captions_path=../PerVLBenchmark/annotations/deep_fashion/detailed_deepfashion2_captions.csv for the detailed caption evaluation.

Segmentation on PerVL Benchmarks

To evaluate the semantic segmentation results, we used code provided by the authors of Zabari et al., "Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples". Their code has not yet been made publicly available, so we cannot release our corresponding code at this time. We will release it once the Zabari et al. code becomes publicly available.

Cite the paper

If you use the contents of this project, please cite our paper.

@inproceedings{eccv2022_palavra_cohen,
 author = {Cohen, Niv and Gal, Rinon and Meirom, Eli A. and Chechik, Gal and Atzmon, Yuval},
 booktitle = {European Conference on Computer Vision (ECCV)},
 title = {"This is my unicorn, Fluffy": Personalizing frozen vision-language representations},
 year = {2022}
}

For business inquiries, please contact researchinquiries@nvidia.com
For press and other inquiries, please contact Hector Marinez at hmarinez@nvidia.com
