
VioletV2

Technical report coming soon!

Spoilers: the model is similar to the original paper, but it replaces the cumbersome detection network with a CLIP vision encoder (which can be trained end to end without relying on an external model) and uses adapters on the decoder side.
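
A minimal sketch of that design, assuming hypothetical module names and the `openai/clip-vit-base-patch32` checkpoint; the actual VioletV2 code may differ:

```python
# Sketch of a CLIP-encoder + adapter-decoder captioner, NOT the actual
# VioletV2 implementation. Module names and sizes are illustrative.
import torch.nn as nn
from transformers import CLIPVisionModel

class AdapterBlock(nn.Module):
    """Bottleneck adapter applied after each decoder layer."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x):
        # Residual bottleneck: the output stays close to the input, so
        # only the small adapter weights need task-specific training.
        return x + self.up(self.act(self.down(x)))

class CaptionerSketch(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 768, num_layers: int = 12):
        super().__init__()
        # CLIP vision encoder stands in for the detection network, so the
        # whole model can be trained end to end.
        self.encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, nhead=12, batch_first=True)
            for _ in range(num_layers)
        )
        self.adapters = nn.ModuleList(AdapterBlock(d_model) for _ in range(num_layers))
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, pixel_values, token_ids):
        # CLIP patch features serve as cross-attention memory.
        # Causal masking is omitted here for brevity.
        memory = self.encoder(pixel_values=pixel_values).last_hidden_state
        x = self.embed(token_ids)
        for layer, adapter in zip(self.layers, self.adapters):
            x = adapter(layer(x, memory))
        return self.lm_head(x)
```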

Data

COCO images HDF5 file: Download

Annotations: Download
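
The internal layout of the HDF5 file isn't documented here; once downloaded, it can be inspected with h5py (the filename below is taken from the training commands):

```python
# Walk the downloaded COCO HDF5 file and print what it contains.
# The dataset layout is not documented here, so we only list it.
import h5py

with h5py.File("coco_images.h5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    f.visititems(show)
```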

Environment setup

Clone the repository and create the Violet conda environment:

conda env create -f violet.yml

Make the logs and saved_models directories:

mkdir logs
mkdir saved_models

Train the model (refactored code)

A simpler and friendlier implementation (you can ignore the data and evaluation folders when using this):

python train_refactored.py --batch_size 60 --head 12 --tau 0.3 --images_path coco_images.h5 --annotation_folder annotations --lr 1e-4 --random_seed 42 --log_file logs/log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 1 --exp_name violet
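
For reference, `--gradient_accumulation_steps` trades memory for effective batch size. A generic sketch of the pattern, not the repository's exact training loop:

```python
# Generic gradient-accumulation pattern behind the
# --gradient_accumulation_steps flag (a sketch, not VioletV2's loop).
import torch

def train_epoch(model, loader, optimizer, accum_steps: int = 1):
    model.train()
    optimizer.zero_grad()
    for step, (images, token_ids, labels) in enumerate(loader):
        logits = model(images, token_ids)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1)
        )
        # Scale so accumulated gradients average over the virtual batch.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```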

Train the model (legacy code)

Based on the code used in the Meshed-Memory Transformer and VisualGPT, updated from the original Python 2.7 to Python 3:

python train_legacy.py --batch_size 40 --head 12 --tau 0.3 --features_path ./coco_images.h5 --annotation_folder annotations --lr 1e-4 --random_seed 42 --log_file logs/log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 1 --exp_name violet

Acknowledgement

This code uses resources from Meshed-Memory Transformer, Transformers, and VisualGPT.
