Spatial Commonsense

Source code and data for Things not Written in Text: Exploring Spatial Commonsense from Visual Signals (ACL 2022 main conference paper).


Dependencies

  • Python>=3.7

For pre-trained language model probing:

  • Transformers
  • PyTorch
  • scikit-learn

For image synthesis:

  • Torchvision
  • Kornia
  • CLIP
  • Taming-transformers

For object detection and vision-language model:

  • Scene_graph_benchmark
  • Oscar

Data

Our datasets are in the data/ folder.

Size/Height: The objects, text prompts, questions, and labels are in data.json. An additional pickle file contains the objects grouped by level.

PosRel: The objects, text prompts, and labels for probing are in data.json. The questions and answers are in data_qa.json.
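As a quick sanity check, the JSON files can be loaded with a few lines of Python. This is a minimal sketch; the exact top-level structure and the field names inside each entry are defined by the dataset files themselves and are not guaranteed here.

import json

# Load the probing data for one subtask (the size subtask is used here as an example).
with open("data/size/data.json") as f:
    data = json.load(f)

# Print the number of entries and one sample entry to see the available fields.
print(len(data))
sample = data[0] if isinstance(data, list) else next(iter(data.values()))
print(sample)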

Code

The code is in the code/ folder.

Image Synthesis

The image synthesis code is adapted from code by Ryan Murdoch (@advadnoun on Twitter).

python image_synthesis.py

The variables clip_path and taming_path in image_synthesis.py need to be modified before execution.

Images are generated in data/{size, height, posrel}/images, where {size, height, posrel} denotes whichever of the three subtasks is being run.
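For instance, the generated images for a subtask can be listed to verify the output (directory layout as described above; the file names are whatever image_synthesis.py produces):

import os

# Count the synthesized images for one subtask (posrel is used here as an example).
image_dir = "data/posrel/images"
images = [f for f in os.listdir(image_dir) if f.lower().endswith((".png", ".jpg", ".jpeg"))]
print(f"{len(images)} images found in {image_dir}")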

Object Detection

Scene_graph_benchmark (VinVL) does not directly provide code for object detection on custom images.

We first modify scene_graph_benchmark/tools/mini_tsv/tsv_demo.py to generate TSV files for our image directory, and then run:

python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 2 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "tools/mini_tsv/{size, height, posrel}" TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True

The object detection results are written to predictions.tsv, and the bounding-box features are written to feature.tsv.
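The detection output can be inspected with standard Python. This is a minimal sketch assuming each row of predictions.tsv is an image key followed by a JSON string, in line with the scene_graph_benchmark output; the exact keys inside the JSON (e.g. objects, class, rect, conf) may need adjusting.

import csv
import json

# Each row of predictions.tsv is assumed to hold (image_key, JSON string of detections).
with open("predictions.tsv") as f:
    for image_key, pred_json in csv.reader(f, delimiter="\t"):
        pred = json.loads(pred_json)
        # Print the class name, bounding box, and confidence of each detected object.
        for obj in pred.get("objects", []):
            print(image_key, obj.get("class"), obj.get("rect"), obj.get("conf"))
        break  # only inspect the first image here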

Probing Spatial Commonsense

  1. (For Size/Height) Generate a depth prediction for each image:
python depth_prediction.py
  2. Probe the image synthesis model using the bounding boxes detected in the images (a rough sketch of how depth and boxes can be combined is given after this list):
python image_probing_box.py
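The sketch below illustrates one way a depth map and two detected boxes could be compared for the Size/Height subtasks, scaling each box height by its median depth. It is only a hypothetical illustration of the idea; the actual probing computation lives in image_probing_box.py, and the box and depth formats shown here are assumptions.

import numpy as np

def relative_height(depth_map, box):
    """Rough height proxy: pixel height of the box scaled by the median depth inside it.

    depth_map is an HxW array from the depth prediction step; box is (x1, y1, x2, y2)
    in pixel coordinates. Both formats are assumptions made for this illustration.
    """
    x1, y1, x2, y2 = [int(v) for v in box]
    region = depth_map[y1:y2, x1:x2]
    return (y2 - y1) * float(np.median(region))

# Dummy example: compare two boxes on a flat depth map.
depth = np.ones((256, 256))
print(relative_height(depth, (40, 30, 120, 220)) > relative_height(depth, (150, 100, 200, 180)))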

Solving Natural Language Questions

Reasoning based on the generated images:

  1. Generate the files required by Oscar+:
python build_oscar_data.py

Create the directories {size, height, posrel} under Oscar/vinvl/datasets, and place oscar_data.json and feats.pt under the corresponding directory.
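To check that the two files line up before training, something like the following can be used; the layout of feats.pt and the structure of oscar_data.json are determined by build_oscar_data.py, so the counts printed here are only an assumed consistency check.

import json
import torch

# Load the question entries and the visual features for one subtask (posrel as an example).
with open("Oscar/vinvl/datasets/posrel/oscar_data.json") as f:
    entries = json.load(f)
feats = torch.load("Oscar/vinvl/datasets/posrel/feats.pt")

# The number of entries and the number of feature items are expected to match.
print(len(entries), len(feats))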

  2. Place run_vqa.py in Oscar/oscar, and run:
python oscar/run_vqa.py -j 4 --img_feature_dim 2054 --max_img_seq_length 50 --data_label_type mask --img_feature_type faster_r-cnn --data_dir vinvl/datasets/{size, height, posrel}/  --model_type bert --model_name_or_path best/best  --task_name vqa_text --do_train --do_lower_case --max_seq_length 128 --per_gpu_eval_batch_size 256 --per_gpu_train_batch_size 32 --learning_rate 5e-05 --num_train_epochs 25 --output_dir results --label_file vinvl/datasets/vqa/vqa/trainval_ans2label.pkl --save_epoch 1 --seed 88 --evaluate_during_training --logging_steps 4000 --drop_out 0.3 --weight_decay 0.05 --warmup_steps 0 --loss_type bce --img_feat_format pt --classifier linear --cls_hidden_scale 3 --txt_data_dir vinvl/datasets/{size, height, posrel}

Citation

Please cite our paper if this repository inspires your work.

@inproceedings{liu2022things,
  title={Things not Written in Text: Exploring Spatial Commonsense from Visual Signals},
  author={Liu, Xiao and Yin, Da and Feng, Yansong and Zhao, Dongyan},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={2365--2376},
  year={2022}
}

Contact

If you have any questions regarding the code, please create an issue or contact the owner of this repository.
