
Leveraging Vision-Language Fusion in Ad Recommendation System

Capstone Project for AdView

A project by Yixuan Xu, Sumin Lee, Jianzhong (Ken) Shi, Yutong Wang, Jiarui Zhang

Special thanks to the AdView team for their continuous support!

First Time Use

Please ensure that you have the following dependencies installed:

pytorch 1.13.0
numpy 1.23.3
sklearn 1.1.3
seaborn 0.12.1
pandas 1.5.1
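The pinned versions above can typically be installed with pip; note that pytorch and sklearn are published on PyPI as torch and scikit-learn:

pip install torch==1.13.0 numpy==1.23.3 scikit-learn==1.1.3 seaborn==0.12.1 pandas==1.5.1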

Model Evaluation

  1. To use the best model, please download the checkpoint from here and place it in the same directory as eval_pipeline.py. Alternatively, you can evaluate any other experiment that you have trained yourself; see Note 1 below on how to do so.
  2. Run eval_pipeline.py

Note 1: The arguments in eval_pipeline.py default to the settings for the best model checkpoint, but feel free to change them. For example, if you performed any training, you may wish to update the hyperparameters to match your training hyperparameter settings.

Model Training

  1. To reproduce the final model, please use the default training settings. If you would like to change the hyperparameter settings, you may do so; see Note 2 below.
  2. Run train_pipeline.py

Note 2: The arguments in train_pipeline.py default to the settings for the best model checkpoint, but feel free to change them. For example, you may set output_dim to 5 to see how the model performs on 5-class classification.
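As a rough illustration only (the exact flag names depend on how the arguments are defined in train_pipeline.py and eval_pipeline.py, so treat --output_dim below as an assumption), training and evaluation could be launched along these lines:

python3 train_pipeline.py --output_dim 5
python3 eval_pipeline.py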

Processing New Data: Generating a Caption for a New Advertisement Image

  1. Download the following checkpoints here and here and place them in the same directory as data_processing.py.
  2. In data_processing.py, run the img_process function. This function requires four inputs: input_image_path is the path to the ad image, output_image_path is the path for the processed ad image, and resnet_weights_path and classifier_weights_path are the paths to the saved pretrained model weights. img_process reads the ad image from input_image_path and writes a processed image to output_image_path (a call sketch follows at the end of this section).
  3. Download the GenerativeImage2Text repo from GitHub and follow its README to perform inference on a single image. Here's an example command line:
AZFUSE_TSV_USE_FUSE=1 python3 -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image', \
      'image_path': './GenerativeImage2Text/aux_data/images', \
      'model_name': 'GIT_LARGE_TEXTCAPS', \
      'prefix': '', \
      'result_file': './GenerativeImage2Text' \
}"

The model name should be GIT_LARGE_TEXTCAPS, and the image should be placed inside aux_data/images under the GenerativeImage2Text folder. A txt file with the generated caption will then be written under the 'result_file' folder.
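As a minimal sketch of step 2 above (the keyword-argument form and all file paths are assumptions; check the function definition in data_processing.py):

from data_processing import img_process

# All paths below are illustrative placeholders.
img_process(
    input_image_path='ads/example_ad.jpg',              # raw advertisement image
    output_image_path='ads/example_ad_processed.jpg',   # where the processed image is written
    resnet_weights_path='resnet_weights.pth',           # pretrained ResNet checkpoint downloaded in step 1
    classifier_weights_path='classifier_weights.pth',   # pretrained classifier checkpoint downloaded in step 1
)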

Processing New Data: Training the Classifier in the Class-aware Image Segmentation Tool from Scratch

  1. In classifier.py, run the train() function. This function requires five inputs: annotation_file is the path to a csv file containing the ground-truth label for each advertisement image, and img_dir is the path to a folder containing all advertisement images (including train, val, and test). The remaining inputs are hyperparameter settings; feel free to change them.
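A minimal sketch of this call (whether train() accepts keyword arguments is an assumption, and the three hyperparameter names below are hypothetical; check the real signature in classifier.py):

from classifier import train

train(
    annotation_file='annotations.csv',  # csv with one ground-truth label per ad image
    img_dir='ads/',                     # folder containing all ad images (train, val, test)
    learning_rate=1e-3,                 # hypothetical hyperparameter names --
    batch_size=32,                      # see classifier.py for the actual argument
    num_epochs=20,                      # names and their defaults
)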

Processing New Data: Performing Data Augmentation and Saving Augmented Data

  1. In data_augmentation.py, run the augmentation_img() function. This function requires an input path to read the image and an output path to write the augmented image. The function applies a blur filter and increases the brightness of the image.
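A minimal sketch of this call (the positional-argument order and file paths are assumptions):

from data_augmentation import augmentation_img

# Reads the ad image, applies a blur filter and a brightness increase,
# and writes the result to the output path. Paths are placeholders.
augmentation_img('ads/example_ad.jpg', 'ads/example_ad_augmented.jpg')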

Processing New Data: Generating BERT Caption Embedding

  1. captions_to_BERT.py: Convert captions to BERT embeddings (of dimension 768);
  2. dim_reduct.py: Add and train a PCA layer for the model to reduce the output dimension to 128;
  3. reduced_BERT.py: Convert captions to reduced BERT embeddings (of dimension 128), using the model trained by dim_reduct.py.
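As a rough illustration of the idea behind these three scripts (using Hugging Face transformers and scikit-learn; this is not the repo's actual implementation):

import torch
from transformers import BertTokenizer, BertModel
from sklearn.decomposition import PCA

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def captions_to_bert(captions):
    # Returns an (N, 768) array of BERT [CLS] embeddings for a list of caption strings.
    inputs = tokenizer(captions, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        return model(**inputs).last_hidden_state[:, 0, :].numpy()

# dim_reduct.py trains the 768 -> 128 reduction; a scikit-learn PCA stands in here.
# Fit it once on the embeddings of the full caption set (at least 128 captions are
# needed for 128 components), then reuse it for new captions, as reduced_BERT.py does.
pca = PCA(n_components=128)
# reduced = pca.fit_transform(captions_to_bert(all_captions))   # shape (N, 128)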

Processing New Data: Generating CLIP Image Embedding

  1. img_emb.py: Convert ad images to CLIP embeddings (of dimension 512);
  2. finetune_CLIP.py: Fine-tune CLIP models using caption-ad pairs in the given dataset and save the model.
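As a rough illustration of the image-embedding step (using the Hugging Face CLIP implementation; this is not the repo's actual code, and the image path is a placeholder):

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained('openai/clip-vit-base-patch32')
processor = CLIPProcessor.from_pretrained('openai/clip-vit-base-patch32')

# Encode one ad image into a 512-dimensional CLIP embedding.
image = Image.open('ads/example_ad.jpg')
inputs = processor(images=image, return_tensors='pt')
with torch.no_grad():
    image_embedding = model.get_image_features(**inputs)   # shape (1, 512)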
