VPEval

The code for VPEval, a novel interpretable/explainable evaluation framework for text-to-image (T2I) generation models based on visual programming, as described in the paper:

Visual Programming for Text-to-Image Generation and Evaluation

Jaemin Cho, Abhay Zala, Mohit Bansal

[Project Page] [Paper] [Code for VPGen]

Change Log

See our change log in CHANGELOG.md.

Code Structure

# Evaluation Source Code
src/

# Data Files
data/

# Data Download and Code Run Scripts
scripts/

Setup Environment

# Create a conda environment
conda create -n vpeval python=3.8
conda activate vpeval

# Install requirements
pip install -r requirements.txt

# Install the second set of requirements (these must be installed after requirements.txt)
pip install -r requirements_2.txt

Then follow the directions for installing GroundingDINO: https://github.com/IDEA-Research/GroundingDINO

You also need to download the GroundingDINO weights and place them in the weights directory. You can do this by running

bash scripts/download_grounding_dino_weights.sh

Then download and extract all the model-generated images by running

bash scripts/download_images.sh

Running Evaluation

Example outputs of our skill-based evaluation.

To run skill-based evaluation, run

bash scripts/evaluate_skill_based.sh

Note: In the paper, we use the first 1000 IDs in data/skill_based/random_ids_{skill}.json, where {skill} is one of object, count, spatial, etc. This selection is already implemented in the code.
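For readers who want to inspect that subset outside the evaluation scripts, here is a minimal sketch of the selection. It assumes each random_ids_{skill}.json holds a flat JSON list of IDs; the load_eval_ids helper is hypothetical and not part of the VPEval codebase, which already performs this step internally.

```python
import json
from pathlib import Path


def load_eval_ids(skill: str, data_dir: str = "data/skill_based", limit: int = 1000) -> list:
    """Return the first `limit` IDs from random_ids_{skill}.json.

    Hypothetical helper: assumes the file contains a top-level JSON list of IDs.
    """
    path = Path(data_dir) / f"random_ids_{skill}.json"
    with path.open() as f:
        ids = json.load(f)
    if not isinstance(ids, list):
        raise ValueError(f"expected a JSON list in {path}, got {type(ids).__name__}")
    # Slicing keeps the file's stored order, so the subset is deterministic.
    return ids[:limit]
```

For example, load_eval_ids("count") would return the first 1000 entries of data/skill_based/random_ids_count.json when run from the repository root. Taking a prefix of an already-shuffled list keeps the subset reproducible without re-seeding a random generator.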

Example outputs of our open-ended evaluation process.

To run open-ended evaluation, run

bash scripts/evaluate_open_ended.sh

Then run the following to compute the scores:

python src/utils/score_open_ended.py

Explanation Outputs

When running a script, pass the --visualization_savepath argument to choose where to save the explanations. The visual explanations (bounding boxes) are saved in the ../images/ directory, and a JSON file is saved in the root path containing the text explanations along with the path to the corresponding image, when one is available.

Running End-to-End Inference

Please see inference.ipynb or

Generating Programs without ChatGPT API

We've released a Llama 2 7B model fine-tuned on ChatGPT outputs. If you do not want to use ChatGPT, you can use this model instead. Please refer to this code file.

Citation

If you find our project useful in your research, please cite the following paper:

@inproceedings{Cho2023VPT2I,
  author    = {Jaemin Cho and Abhay Zala and Mohit Bansal},
  title     = {Visual Programming for Text-to-Image Generation and Evaluation},
  booktitle = {NeurIPS},
  year      = {2023},
}
