Multimodal-Probes

(Figure: an illustration of how the dependency tree of a caption and the structure over visual regions can match.)

In this project, we created a new method for probing the embeddings of multimodal-BERT models. Furthermore, we provide a mapping that creates the novel Scene Tree structure over the image regions: it is built by taking the dependency tree of a caption and imposing it on the regions attached to the caption's phrases.


  • Probing
  • Scene Tree
  • Repository Layout
  • Install
  • How to Run
  • Citation


Probing

This repository is built as a modular probing project. By running the main file, you get access to all the possible datasets and probes. While the project is called Multimodal-Probes, it is still possible to train and use probes on uni-modal data or uni-modal models. Simply run python main.py --help to get a list of options. Initially, this is a general set of options. Once you start setting some options (like --task), more specific options become available that match your current settings, e.g. which probes work with the chosen task.
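
For example, assuming the dynamic --help behaviour described above (DepthTask is one of the tasks used later in this README):

# show the general set of options
python main.py --help

# with a task set, the help also lists the options valid for that task
python main.py --task DepthTask --help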

Scene Tree

The scene tree is a novel hierarchical structure over the regions in an image. For its construction, we assume images with captions in which words/phrases are aligned with regions in the image (e.g. Flickr30K Entities).

A dependency structure is extracted for the caption using the spaCy parser, and we map this structure on top of the regions attached to those words/phrases. This results in the scene tree.

Most of the methods needed for creating the scene tree are contained in the probing_project/scene_tree.py file. The remaining methods are imported from probing_project/utils.py and probing_project/tasks/depth_task.py.
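
As an illustration only (this is not the code in probing_project/scene_tree.py), the sketch below shows the idea: parse a caption with spaCy, and for every word aligned to a region, attach its region to the region of its closest aligned ancestor in the dependency tree. The caption, the phrase-to-region alignment, and the helper names are assumptions made for this example.

import spacy

# assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

caption = "A man in a red shirt rides a brown horse"
# assumed phrase-to-region alignment, as provided by e.g. Flickr30K Entities
phrase_to_region = {"man": "region_1", "shirt": "region_2", "horse": "region_3"}

doc = nlp(caption)

def closest_aligned_ancestor(token):
    # walk up the dependency tree and return the region of the first aligned ancestor
    for ancestor in token.ancestors:
        if ancestor.text.lower() in phrase_to_region:
            return phrase_to_region[ancestor.text.lower()]
    return None  # no aligned ancestor: this region becomes a root of the scene tree

# child region -> parent region (None marks a root of the scene tree)
scene_tree = {}
for token in doc:
    if token.text.lower() in phrase_to_region:
        scene_tree[phrase_to_region[token.text.lower()]] = closest_aligned_ancestor(token)

print(scene_tree)  # e.g. {'region_1': None, 'region_2': 'region_1', 'region_3': None}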

Repository Layout

  • main.py: Handles the entire probing program
  • data:
    • raw: downloads of all needed datasets
    • intermediate: all processed data not needed for main run
    • processed: all finished data needed for main run
    • README.md: Describes the data sources used in the paper and how to prepare the image region features
  • scripts:
    • extract_flickr_30k_images.py: script for extracting image region features
  • probing_project:
    • data:
      • datasets: dataset specific preprocessing and loading methods
      • modules: pytorch-lightning DataModule type classes for preprocessing and loading the data
      • probing_dataset.py: the project's PyTorch Dataset-type class
      • utils.py: utilities needed for the dataset class
    • embedding_mappings: processing of the embeddings before probing (DiskMapping does nothing)
    • losses: additional losses for some tasks
    • probes: the torch.nn.Module classes for the probes
    • reporters: the classes for computing metrics and results
    • tasks: the possible tasks to probe on; each task should have one or more accompanying probes
    • constants.py: general information needed, e.g. Volta configs and optional settings for the help message
    • model.py: the main pytorch-lightning LightningModule-type class
    • scene_tree.py: the specific files for generating the scene tree
    • utils.py: extra utility functions
  • volta: CREATE MANUALLY, a clone of the Volta repository, needed for running with the multimodal-BERT models

Install

First, clone the project and install the dependencies:

# clone project
git clone https://github.com/VSJMilewski/multimodal-probes

# enter the project directory
cd multimodal-probes

Manually install PyTorch following their Get Started page. We used version 1.10.1.
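
For example, one possible command (check the PyTorch Get Started page for the right build for your platform and CUDA version):

# install the PyTorch version used in the paper; pick the build matching your CUDA setup
pip install torch==1.10.1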

Next, install the other requirements.

pip install -r requirements.txt

If you want to use the multimodal models as used in the paper (and currently the only setup in the code), clone the Volta Library into the root directory (or install it somewhere else and use a symbolic link).
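
The commands below are one possible way to do this; the Volta URL (the e-bug/volta repository) and the paths are assumptions, so adjust them to your setup.

# clone Volta into the repository root
git clone https://github.com/e-bug/volta volta

# or clone it somewhere else and link it
# git clone https://github.com/e-bug/volta /path/to/volta
# ln -s /path/to/volta volta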

How to Run

Simply run the main file with python main.py and set the needed options.

The minimum required options to set are:

  • --task
  • --probe
  • --dataset
  • --embeddings_model

An example run:

# run module
python main.py --task DepthTask --dataset Flickr30k --probe OneWordPSDProbe --embeddings_model ViLBERT

Use the --help flag to see a full set of options:

python main.py --help

Depending on which required arguments you have already set, the help output changes to show the options available with those settings.


Credits

Initial code for probes, evaluations, losses, and some of the data processing was taken from the Structural-Probes Project by Hewitt and Manning (2019).


Citation

If you use this repository, please cite:

@inproceedings{milewski-etal-2022-finding,
    title = "Finding Structural Knowledge in Multimodal-{BERT}",
    author = "Milewski, Victor  and
      de Lhoneux, Miryam  and
      Moens, Marie-Francine",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.388",
    pages = "5658--5671",
    abstract = "In this work, we investigate the knowledge learned in the embeddings of multimodal-BERT models. More specifically, we probe their capabilities of storing the grammatical structure of linguistic data and the structure learned over objects in visual data. To reach that goal, we first make the inherent structure of language and visuals explicit by a dependency parse of the sentences that describe the image and by the dependencies between the object regions in the image, respectively. We call this explicit visual structure the scene tree, that is based on the dependency tree of the language description. Extensive probing experiments show that the multimodal-BERT models do not encode these scene trees.",
}
