
COYO-ALIGN

COYO-ALIGN is an implementation of ALIGN by Kakao Brain that achieves similar performance to Google's ALIGN using the publicly available COYO-700M dataset instead of the ALIGN 1.8B dataset, which has not been released to the public. When both are trained on the same CC3M dataset, COYO-ALIGN matches ALIGN's performance.

| Model | # of parameters | Dataset | ImageNet KNN | Flickr30k I2T R@1 | Flickr30k T2I R@1 | MsCOCO I2T R@1 | MsCOCO T2I R@1 |
|---|---|---|---|---|---|---|---|
| ALIGN-L2-Large (Google) | 307M+117M | ALIGN 1.8B | 76.4 | 88.6 | 75.7 | 58.6 | 45.6 |
| ALIGN-B7-Base (Google) | 66M+110M | ALIGN 1.8B | 69.3 | - | - | 55.4 | 41.7 |
| COYO-ALIGN-B7-Base (KakaoBrain) | 66M+110M | COYO-700M | 68.6 | 88.1 | 73.2 | 61.2 | 43.1 |
| ALIGN-B3-Mini (Google) | 12M+11.3M | CC-3M | 48.9 | - | - | 22.1 | 17.3 |
| COYO-ALIGN-B3-Mini (KakaoBrain) | 12M+11.3M | CC-3M | 46.2 | 42.8 | 35.0 | 21.2 | 17.0 |

Note that only 86% of the CC3M data was available for download as of Sept. 2022.

Installation

pip3 install -r requirements.txt

Dataset

Datasets used

  1. CC3M
  2. COYO-700M
  3. ImageNet for ImageNet KNN evaluation
  4. MS COCO Captions for I2T & T2I retrieval evaluation
  5. Flickr30k for I2T & T2I retrieval evaluation

CC-3M (174.6GiB)

Follow https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc3m.md to download CC3M. To be more specific:

  1. Go to https://ai.google.com/research/ConceptualCaptions/download
  2. Download "Training split" as Train_GCC-training.tsv
  3. sed -i '1s/^/caption\turl\n/' Train_GCC-training.tsv
    
    pip install img2dataset tensorflow tensorflow_io
    # apt-get update && apt-get install -y libgl1
    img2dataset --url_list Train_GCC-training.tsv --input_format "tsv" \
                --url_col "url" --caption_col "caption" --output_format tfrecord \
                --output_folder cc3m --processes_count 32 --thread_count 64 \
                --image_size 346 --resize_mode keep_ratio
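
As a rough sanity check, you can count how many examples actually downloaded; a minimal sketch, assuming the tfrecord shards landed in cc3m/:

import tensorflow as tf

# Rough sanity check: only ~86% of CC3M URLs still resolved as of
# Sept. 2022, so expect noticeably fewer than the ~3.3M source captions.
files = tf.io.gfile.glob("cc3m/*.tfrecord")
count = sum(1 for _ in tf.data.TFRecordDataset(files))
print(f"{count} examples across {len(files)} shards")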
    

COYO-700M

Follow https://github.com/kakaobrain/coyo-dataset#getting-started to download the images and save them into .tfrecord files.
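
The exact record schema is defined by this repo's input pipeline; as a minimal sketch of the general idea, with hypothetical feature keys "image" and "text", packing image/caption pairs into a .tfrecord file looks like this:

import tensorflow as tf

# Minimal sketch of writing image/caption pairs into a TFRecord file.
# The feature keys ("image", "text") are hypothetical; match them to
# whatever the repo's input pipeline actually reads.
def make_example(image_bytes, caption):
    return tf.train.Example(features=tf.train.Features(feature={
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "text": tf.train.Feature(bytes_list=tf.train.BytesList(value=[caption.encode("utf-8")])),
    }))

with tf.io.TFRecordWriter("coyo-00000.tfrecord") as writer:
    with open("example.jpg", "rb") as f:
        writer.write(make_example(f.read(), "an example caption").SerializeToString())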

ImageNet (for evaluation)

Follow https://www.tensorflow.org/datasets/catalog/imagenet2012
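
Note that tensorflow_datasets cannot fetch ImageNet automatically. A minimal sketch of loading it, assuming the ILSVRC2012 tar files were placed in the TFDS manual-download directory:

import tensorflow_datasets as tfds

# imagenet2012 requires a manual download: place ILSVRC2012_img_train.tar
# and ILSVRC2012_img_val.tar under <data_dir>/downloads/manual/ first.
ds = tfds.load("imagenet2012", split="validation",
               data_dir="~/tensorflow_datasets")
for example in ds.take(1):
    print(example["image"].shape, example["label"].numpy())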

MS COCO Captions & Flickr30k (for evaluation)

  1. Download the following:
    1. Validation split configs (Karpathy splits): https://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip
    2. MS COCO validation set: http://images.cocodataset.org/zips/val2014.zip
    3. Flickr30k dataset: https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset
  2. Unzip all of them (the split configs are illustrated in the sketch after this list)
  3. Run python evaluate/create_tfrecords.py
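
The split configs are the Karpathy JSON files (dataset_coco.json, dataset_flickr30k.json, and dataset_flickr8k.json). A minimal sketch of inspecting the Flickr30k test split, assuming the zip was extracted into the current directory:

import json

# Each entry carries a filename, a train/val/test split tag, and its captions.
with open("dataset_flickr30k.json") as f:
    split_config = json.load(f)

test_images = [img for img in split_config["images"] if img["split"] == "test"]
print(len(test_images))                        # 1000 test images for Flickr30k
print(test_images[0]["filename"])              # image file name
print(test_images[0]["sentences"][0]["raw"])   # first of its captions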

Upload to GCS

TPUs can only access files stored on Google Cloud Storage. If you want to train on TPU, follow the steps below to copy your files to GCS.

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-402.0.0-linux-x86_64.tar.gz
tar -xf google-cloud-cli-402.0.0-linux-x86_64.tar.gz
./google-cloud-sdk/install.sh
gcloud init
gsutil cp -R <tfrecord directory> gs://<your bucket>

Train

CC-3M + EfficientNet-B3 + BERT-mini setting

python3 main.py --flagfile conf/cc3m.flags \
                --outdir gs://<outdir> --dataset_dir gs://<directory containing tfrecord files> \
                --tpu <tpu name>

COYO-700M + EfficientNet-B7 + BERT-base setting

python3 main.py --flagfile conf/coyo700.flags --flagfile conf/b7-base.flags \
                --outdir gs://<outdir> --dataset_dir gs://<directory containing tfrecord files> \
                --tpu <tpu name>

Evaluate

Download model weights

You can download the model weights from https://huggingface.co/kakaobrain/coyo-align-b7-base as follows:

git lfs install
git clone https://huggingface.co/kakaobrain/coyo-align-b7-base

On TPU

Note that TPU can only access files on Google Cloud Storage. You have to upload the model weight files to a gs:// location if you want to run evaluation on TPU.

cd evaluate
# pass ../conf/b7-base.flags when evaluating the B7 model
python eval_all.py --flagfile ../conf/b7-base.flags \
                   --checkpoint gs://<checkpoint path> \
                   --workdir gs://<work dir> --tpu <tpu name> \
                   --imagenet_dataset_dir gs://<directory containing imagenet2012> \
                   --flickr_dataset_path gs://<path to flickr30k.tfrecord> \
                   --coco_dataset_path gs://<path to coco.tfrecord>

If you only want to run the ImageNet KNN evaluation, remove the --flickr_dataset_path and --coco_dataset_path arguments.

On CPU

ImageNet KNN requires running inference on all 1.25M images in ImageNet, which takes a very long time on CPU, so TPU or GPU is recommended for it. The Flickr30k and COCO evaluations cover only 1k and 5k images respectively, so they are doable on CPU and take around 30~60 minutes to finish. Remember to reduce --batch_size to fit within your machine's memory.

cd evaluate
python eval_all.py --flagfile ../conf/b7-base.flags \
                   --checkpoint coyo-align-b7-base/model-weights \
                   --workdir ./ --batch_size 32 \
                   --flickr_dataset_path gs://<path to flickr30k.tfrecord> \
                   --coco_dataset_path gs://<path to coco.tfrecord>

Implementation note

Our implementation follows the paper down to nearly every detail. The only notable difference is the use of virtual_batch_size. In our experiments on CC3M, performance on a v3-1024 (as used by the paper) was better than on a v3-128. We presumed this was due to the difference in local batch size between the two, since the same global batch size was used and the batch norm statistics were not reduced across nodes. We therefore used virtual_batch_size to simulate the v3-1024 local batch size on a v3-128. (We preferred the v3-128 for its performance-to-price efficiency, especially for the B3-Mini experiments, where both the model and the batch size were small.)
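
For intuition, here is a minimal sketch of the ghost-batch-norm idea behind virtual_batch_size. The numbers are illustrative assumptions (16384 is the global batch size reported in the ALIGN paper, and the per-core counts are derived from it, not read from this repo's flags):

import tensorflow as tf

# Illustrative numbers, not this repo's actual flag values.
GLOBAL_BATCH = 16384
local_batch_v3_1024 = GLOBAL_BATCH // 1024  # 16 samples per core on v3-1024
local_batch_v3_128 = GLOBAL_BATCH // 128    # 128 samples per core on v3-128

# Batch norm statistics are computed per core and not reduced across nodes,
# so a v3-128 core normalizes over 128 samples where a v3-1024 core sees 16.
# virtual_batch_size splits each local batch into sub-batches of 16 so the
# normalization statistics match the v3-1024 run.
bn = tf.keras.layers.BatchNormalization(
    virtual_batch_size=local_batch_v3_1024)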

Citation

@misc{kakaobrain2022coyo-align,
  title         = {COYO-ALIGN},
  author        = {Yoon, Boogeo and Lee, Youhan and Baek, Woonhyuk},
  year          = {2022},
  howpublished  = {\url{https://github.com/kakaobrain/coyo-align}},
}

People

Boogeo Yoon, Youhan Lee, Woonhyuk Baek

Contact

eric.yoon@kakaobrain.com

License

The source code is licensed under the Apache 2.0 License.