DavidK0/SUTS-for-VLMs

SUTS-for-CVLMs

This repository contains a spatial understanding (SU) test suite (TS) for vision-language models (VLMs), in fulfillment of the project option for the CLMS degree from the University of Washington. Read the paper at docs/Final Paper.pdf.

The test suite consists of pairs of sentences, one of which truly describes an image and one of which falsely describes it. The goal of the VLM is to identify the true caption. By varying the sentence structure, I can test which linguistic features affect the performance of the VLM.

The two steps to creating this data set were:

  • Synthetic Image Generation: I used Unity to generate images (not included in this repo).
  • Sentence Generation: From the spatial relation metadata associated with the images made in the previous step, I used Python to generate a set of true and false sentences.
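
The sentence generation step can be illustrated with a minimal sketch. This is not the code in this repository: the metadata format (a subject, relation, object triple), the relation inventory in `OPPOSITE`, the sentence template, and the function names are all assumptions made for illustration. The false sentence is produced here by swapping the relation for its opposite, which is only one possible way to build a false caption.

```python
# Hypothetical relation inventory: each relation maps to the relation
# that makes the sentence false for the same subject and object.
OPPOSITE = {
    "to the left of": "to the right of",
    "to the right of": "to the left of",
    "above": "below",
    "below": "above",
}


def make_sentence(subject, relation, obj):
    """Fill a single fixed template (an assumed sentence structure)."""
    return f"The {subject} is {relation} the {obj}."


def make_pair(subject, relation, obj):
    """Return (true_sentence, false_sentence) for one spatial relation."""
    true_sentence = make_sentence(subject, relation, obj)
    false_sentence = make_sentence(subject, OPPOSITE[relation], obj)
    return true_sentence, false_sentence


# Example with made-up metadata for a single image.
true_caption, false_caption = make_pair("cube", "to the left of", "sphere")
print(true_caption)
print(false_caption)
```

Testing different linguistic features would then amount to adding more templates alongside `make_sentence` while keeping the underlying relation fixed.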

I tested CLIP's performance on this test suite, and it performed very poorly: at or below the accuracy of random guessing.
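
For reference, here is a rough sketch of how accuracy on true/false pairs can be compared against chance. It assumes each image yields one similarity score for the true caption and one for the false caption (for example, from a model such as CLIP, which is not called here). The function name and the scores are placeholders for illustration, not results from this repository.

```python
def pairwise_accuracy(score_pairs):
    """Fraction of pairs where the true caption scores strictly higher.

    score_pairs: iterable of (true_score, false_score) tuples.
    """
    score_pairs = list(score_pairs)
    if not score_pairs:
        raise ValueError("need at least one score pair")
    correct = sum(1 for true_score, false_score in score_pairs
                  if true_score > false_score)
    return correct / len(score_pairs)


# With two candidate captions per image, random guessing is right half the time.
CHANCE = 0.5

# Placeholder scores only, NOT measurements from this test suite.
toy_scores = [(0.31, 0.29), (0.22, 0.27), (0.25, 0.30), (0.28, 0.26)]
accuracy = pairwise_accuracy(toy_scores)
print(f"accuracy={accuracy:.2f}, at or below chance: {accuracy <= CHANCE}")
```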

More information about this test suite can be found at the wiki.
