Multimodal-Probes

(Figure: an illustration of how the dependency tree of a caption and the structure over visual regions can match.)

In this project, we created a new method for probing the embeddings of multimodal-BERT models. Furthermore, we provide a mapping that creates the novel Scene Tree structure over the image regions: it is built by taking the dependency tree of a caption and imposing it on the regions attached to the caption's phrases.


  • Probing
  • Scene Tree
  • Repository Layout
  • Install
  • How to Run
  • Citation


Probing

This repository is built as a modular probing project. By running the main file, you get access to all the possible datasets and probes. While the project is called Multimodal-Probes, it is still possible to train and use probes on uni-modal data or uni-modal models. Simply run python main.py --help to get a list of options. Initially, this is a general set of options. Once you start setting some options (like --task), more specific options become available that match your current settings, e.g. which probes work with the chosen task.
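
For example, assuming the dynamic --help behaviour described above (DepthTask is one of the tasks used later in this README):

# show the general set of options
python main.py --help

# with a task set, the help also lists the options valid for that task
python main.py --task DepthTask --help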

Scene Tree

The scene tree is a novel hierarchical structure over the regions in an image. For its construction, we assume images with captions in which words/phrases are aligned with regions in the image (e.g. Flickr30K Entities).

A dependency structure is extracted for the caption using the spaCy parser, and we map this structure on top of the regions attached to those words/phrases. This results in the scene tree.

Most of the methods needed for creating the scene tree are contained in the probing_project/scene_tree.py file. The remaining methods are imported from probing_project/utils.py and probing_project/tasks/depth_task.py.
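
As an illustration only (this is not the code in probing_project/scene_tree.py), the sketch below shows the idea: parse a caption with spaCy, and for every word aligned to a region, attach its region to the region of its closest aligned ancestor in the dependency tree. The caption, the phrase-to-region alignment, and the helper names are assumptions made for this example.

import spacy

# assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

caption = "A man in a red shirt rides a brown horse"
# assumed phrase-to-region alignment, as provided by e.g. Flickr30K Entities
phrase_to_region = {"man": "region_1", "shirt": "region_2", "horse": "region_3"}

doc = nlp(caption)

def closest_aligned_ancestor(token):
    # walk up the dependency tree and return the region of the first aligned ancestor
    for ancestor in token.ancestors:
        if ancestor.text.lower() in phrase_to_region:
            return phrase_to_region[ancestor.text.lower()]
    return None  # no aligned ancestor: this region becomes a root of the scene tree

# child region -> parent region (None marks a root of the scene tree)
scene_tree = {}
for token in doc:
    if token.text.lower() in phrase_to_region:
        scene_tree[phrase_to_region[token.text.lower()]] = closest_aligned_ancestor(token)

print(scene_tree)  # e.g. {'region_1': None, 'region_2': 'region_1', 'region_3': None}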

Repository Layout

  • main.py: Handles the entire probing program
  • data:
    • raw: downloads of all needed datasets
    • intermediate: all processed data not needed for main run
    • processed: all finished data needed for main run
    • README.md: Describes the data sources used in the paper and how to prepare the image region features
  • scripts:
    • extract_flickr_30k_images.py: script for extracting image region features
  • probing_project:
    • data:
      • datasets: dataset specific preprocessing and loading methods
      • modules: pytorch-lightning DataModule type classes for preprocessing and loading the data
      • probing_dataset.py: the project's PyTorch Dataset-type class
      • utils.py: utilities needed for the dataset class
    • embedding_mappings: processing of the embeddings before probing (DiskMapping does nothing)
    • losses: additional losses for some tasks
    • probes: the torch.nn.Module classes for the probes
    • reporters: the classes for computing metrics and results
    • tasks: the possible tasks to probe on; each task should have one or more accompanying probes
    • constants.py: general information needed, e.g. Volta configs and optional settings for the help message
    • model.py: the main pytorch-lightning LightningModule-type class
    • scene_tree.py: the specific files for generating the scene tree
    • utils.py: extra utility functions
  • volta: CREATE MANUALLY, a clone of the Volta repository, needed for running with the multimodal-BERT models

Install

First, clone the project and install the dependencies:

# clone project
git clone https://github.com/VSJMilewski/multimodal-probes

# enter the project directory
cd multimodal-probes

Manually install PyTorch following their Get Started page. We used version 1.10.1.
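
For example, one possible command (check the PyTorch Get Started page for the right build for your platform and CUDA version):

# install the PyTorch version used in the paper; pick the build matching your CUDA setup
pip install torch==1.10.1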

Next, install the other requirements.

pip install -r requirements.txt

If you want to use the multimodal models as used in the paper (and currently the only setup in the code), clone the Volta Library into the root directory (or install it somewhere else and use a symbolic link).
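
The commands below are one possible way to do this; the Volta URL (the e-bug/volta repository) and the paths are assumptions, so adjust them to your setup.

# clone Volta into the repository root
git clone https://github.com/e-bug/volta volta

# or clone it somewhere else and link it
# git clone https://github.com/e-bug/volta /path/to/volta
# ln -s /path/to/volta volta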

How to Run

Simply run the main file with python main.py and set the needed options.

The minimum required options to set are:

  • --task
  • --probe
  • --dataset
  • --embeddings_model

An example run:

# run module
python main.py --task DepthTask --dataset Flickr30k --probe OneWordPSDProbe --embeddings_model ViLBERT

Use the --help flag to see a full set of options:

python main.py --help

Depending on which required arguments you have already set, the help output changes to show the options available with those settings.


Credits

Initial code for probes, evaluations, losses, and some of the data processing was taken from the Structural-Probes Project by Hewitt and Manning (2019).


Citation

If you use this repository, please cite:

@inproceedings{milewski-etal-2022-finding,
    title = "Finding Structural Knowledge in Multimodal-{BERT}",
    author = "Milewski, Victor  and
      de Lhoneux, Miryam  and
      Moens, Marie-Francine",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.388",
    pages = "5658--5671",
    abstract = "In this work, we investigate the knowledge learned in the embeddings of multimodal-BERT models. More specifically, we probe their capabilities of storing the grammatical structure of linguistic data and the structure learned over objects in visual data. To reach that goal, we first make the inherent structure of language and visuals explicit by a dependency parse of the sentences that describe the image and by the dependencies between the object regions in the image, respectively. We call this explicit visual structure the scene tree, that is based on the dependency tree of the language description. Extensive probing experiments show that the multimodal-BERT models do not encode these scene trees.",
}
