GitHub - ahwang16/grounded-intuition-gpt-vision: Resources for Grounded Intuition of GPT-Vision's Abilities with Scientific Images

Overview

This is the GitHub repository for my recent article, Grounded Intuition of GPT-Vision's Abilities with Scientific Images.

A Colab notebook for running GPT-Vision through the API is now available: Describing_Scientific_Images_with_GPT_Vision.ipynb.

This paper contributes:

- an in-depth qualitative analysis of the passages GPT-Vision generates for images from scientific papers,
- a formalized procedure for qualitative analysis based on grounded theory and thematic analysis in social science/HCI literature, and
- our images and generated passages for further research and reproducibility.

We used two prompts to generate passages for each image:

Write alt text to describe this <type>.
Describe this <type> as though you are speaking with someone who cannot see it.

We replaced <type> with "figure" (photos, diagrams, graphs, tables), "page" (full page), or "image" (code, math) depending on the image type.
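The substitution can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the paper: the `alt` and `desc` keys are assumed to correspond to the prompt names that appear in the passage file names, and the category labels in `TYPE_WORD` are illustrative.

```python
# The two prompt templates, with <type> written as a format placeholder.
# Assumption: "alt" and "desc" are the prompt names used in passage file names.
PROMPTS = {
    "alt": "Write alt text to describe this {type}.",
    "desc": "Describe this {type} as though you are speaking with someone who cannot see it.",
}

# Word substituted for <type>, keyed by image category.
# The category labels are illustrative, not taken from the repository.
TYPE_WORD = {
    "photo": "figure",
    "diagram": "figure",
    "graph": "figure",
    "table": "figure",
    "page": "page",
    "code": "image",
    "math": "image",
}


def build_prompts(category):
    """Return both prompts for one image category, keyed by prompt name."""
    word = TYPE_WORD[category]
    return {name: template.format(type=word) for name, template in PROMPTS.items()}
```

Note that the word substituted into the prompt ("figure", "page", or "image") is coarser than the image category itself, so a photo and a table receive identical prompts.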

The images can be found in the images directory. Each file is named with the following convention:

<type>_<id>_<short-description>.png

with decimals in image IDs replaced by hyphens. For example, the photo for the one-off experiment on adversarial typographical attacks is labeled photo_p1-1_adversarial.png.
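As a rough sketch, the convention can be expressed as a small helper. The function name `image_filename` is hypothetical, and the example assumes the original image ID is written as `p1.1` before the decimal is replaced.

```python
def image_filename(image_type, image_id, short_description):
    """Build an image file name following <type>_<id>_<short-description>.png."""
    # Decimals in the image ID become hyphens, e.g. "p1.1" -> "p1-1".
    safe_id = image_id.replace(".", "-")
    return f"{image_type}_{safe_id}_{short_description}.png"


# Assumed original ID "p1.1" for the adversarial typography photo.
print(image_filename("photo", "p1.1", "adversarial"))
```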

The generated passages for each prompt and image are located in the generated_passages directory and follow a similar naming convention with the prompt name at the end. The passages for photo_p1-1_adversarial.png can be found in photo_p1-1_adversarial_alt.png and photo_p1-1_adversarial_desc.png.
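A minimal sketch of looking up the passage files for a given image is shown below. The helper `passage_paths` is hypothetical. It copies the file extension from the image name because that is how the example above is written; the files in the repository may use a different extension.

```python
from pathlib import Path

# Prompt names appended to the image name, as in the example above.
PROMPT_NAMES = ("alt", "desc")


def passage_paths(image_path, passages_dir="generated_passages"):
    """Return the expected passage paths for one image, one per prompt."""
    image_path = Path(image_path)
    return [
        # Assumption: the passage file keeps the image's extension.
        Path(passages_dir) / f"{image_path.stem}_{name}{image_path.suffix}"
        for name in PROMPT_NAMES
    ]


for path in passage_paths("images/photo_p1-1_adversarial.png"):
    print(path.name)
```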

We're in the news!

As OpenAI's Multimodal API Launches Broadly, Research Shows It's Still Flawed, TechCrunch
ChatGPT-Maker OpenAI Hosts its First Big Tech Showcase as the AI Startup Faces Growing Competition, Associated Press

Suggested citation

If you would like to cite the paper or repository, you can use

@misc{hwang_grounded_2023,
      title={Grounded Intuition of GPT-Vision's Abilities with Scientific Images}, 
      author={Alyssa Hwang and Andrew Head and Chris Callison-Burch},
      year={2023},
      eprint={2311.02069},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

