ZhengxiangShi/StepGame

StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts


  • [News - 4 March 2024] Please use the latest version of the dataset, available on HuggingFace as michaelszx/StepGame.
  • [News - 4 Feb 2024] We corrected some typos in Code/template.py, which is used for dataset generation, then regenerated the dataset and updated the Dataset folder. Please download the latest version of the dataset.

Description

This repository contains the dataset and code for the AAAI 2022 paper "StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts".

Our Contributions

  • A new benchmark for multi-hop spatial reasoning in texts.
  • A memory augmented neural network for multi-hop spatial reasoning.

Usage from HuggingFace Dataset

The StepGame dataset is available on HuggingFace. The following example loads it with the datasets library:

from datasets import load_dataset
dataset = load_dataset("michaelszx/StepGame")
print(dataset)

This prints the following:

DatasetDict({
    train: Dataset({
        features: ['story', 'question', 'label', 'k_hop'],
        num_rows: 50000
    })
    validation: Dataset({
        features: ['story', 'question', 'label', 'k_hop'],
        num_rows: 5000
    })
    test: Dataset({
        features: ['story', 'question', 'label', 'k_hop'],
        num_rows: 100000
    })
})
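Each row carries a k_hop field, so a natural first check is how many examples there are per value of k. The sketch below is one way to do that. The helper name count_by_hop is hypothetical, and the toy rows are invented placeholders rather than real StepGame samples; the real field types may differ.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def count_by_hop(rows: Iterable[Mapping[str, Any]], field: str = "k_hop") -> Dict[Any, int]:
    """Count how many rows there are for each value of `field`, sorted by value."""
    counts = Counter(row[field] for row in rows)
    return dict(sorted(counts.items()))


# Toy rows, invented for illustration only; they are NOT taken from StepGame.
toy_rows = [
    {"story": "toy story 1", "question": "toy question 1", "label": "toy label", "k_hop": 1},
    {"story": "toy story 2", "question": "toy question 2", "label": "toy label", "k_hop": 2},
    {"story": "toy story 3", "question": "toy question 3", "label": "toy label", "k_hop": 1},
]

print(count_by_hop(toy_rows))
```

With the real dataset loaded as above, count_by_hop(dataset["train"]) should work the same way, since iterating over a split yields one dictionary per row.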

StepGame Dataset

Version

We offer two versions of the StepGame dataset. They are identical in content except for the number of samples.

  • TrainVersion: This version is used for the baseline models in our paper. The models are trained on clean samples with k=1,2,3,4,5, where k is the number of reasoning hops. Each value of k has 10,000 training and 1,000 validation samples, giving 50,000 training and 5,000 validation samples in total. For testing, noise samples with k=1,2,3,4,5,6,7,8,9,10 are used. All of these samples are in the "TrainVersion" folder.

  • CompleteVersion: The "CompleteVersion" is an extension of the "TrainVersion" that we have provided to enable future research on spatial reasoning over texts. This extended version includes more samples and contains both clean and noise data. In this version, each part (clean and noise) consists of 30,000, 1,000, and 30,000 samples for training, validation, and testing, respectively, for values of k ranging from 1 to 10. All of these samples are consolidated in the "CompleteVersion" folder.
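The TrainVersion sample counts can be checked with a little arithmetic against the num_rows values printed for the HuggingFace dataset. The sketch below does that. It assumes the HuggingFace splits correspond to the TrainVersion (the sizes appear to match) and that the test split is divided evenly across the ten values of k; neither is stated explicitly in this README.

```python
# Values of k used in each phase, as described for the TrainVersion.
TRAIN_HOPS = range(1, 6)    # k = 1..5 for training and validation
TEST_HOPS = range(1, 11)    # k = 1..10 for testing

# Per-k sample counts stated for the TrainVersion.
TRAIN_PER_HOP = 10_000
VALID_PER_HOP = 1_000

# num_rows as printed for the HuggingFace dataset.
HF_NUM_ROWS = {"train": 50_000, "validation": 5_000, "test": 100_000}

expected = {
    "train": len(TRAIN_HOPS) * TRAIN_PER_HOP,
    "validation": len(TRAIN_HOPS) * VALID_PER_HOP,
}

# Assumption: the test split is spread evenly over the ten values of k.
test_per_hop = HF_NUM_ROWS["test"] // len(TEST_HOPS)

print(expected)
print(test_per_hop)
```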

Generate more samples

You can produce additional samples by executing the following command:

cd Code
python parameterized_step_game_8relation.py --seed 123
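To generate several sample sets with different seeds, the same command can be built programmatically. This is a minimal sketch: the helper name seed_commands is hypothetical, only the --seed flag shown above is used, and whether runs with different seeds write to separate output files or overwrite each other is not covered here, so check the script before running a batch.

```python
from typing import Iterable, List

# Script name taken from the command shown above.
SCRIPT = "parameterized_step_game_8relation.py"


def seed_commands(seeds: Iterable[int]) -> List[List[str]]:
    """Build one argument list per seed, mirroring the documented command."""
    return [["python", SCRIPT, "--seed", str(seed)] for seed in seeds]


# Print the commands instead of running them, so nothing is generated by accident.
for cmd in seed_commands([123, 124, 125]):
    print(" ".join(cmd))
```

Each argument list can then be passed to subprocess.run(cmd, cwd="Code", check=True), which runs the script from the Code folder as in the command above.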

Bugs or questions?

If you have any questions about the code or the paper, please contact Zhengxiang (zhengxiang.shi.19@ucl.ac.uk). If you run into a problem while using the code or want to report a bug, please open an issue and include specific details about the problem so that we can help quickly.

Citation

@inproceedings{stepGame2022shi,
title={StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts},
author={Shi, Zhengxiang and Zhang, Qiang and Lipani, Aldo},
volume={36},
url={https://ojs.aaai.org/index.php/AAAI/article/view/21383},
DOI={10.1609/aaai.v36i10.21383}, 
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2022},
month={Jun.},
pages={11321-11329}
}

About

[AAAI 2022] Dataset and PyTorch code for the paper "StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts", AAAI 2022 (Oral)
