physical-commonsense


This is the code, data, and website repository accompanying the paper:

Do Neural Language Representations Learn Physical Commonsense?
Maxwell Forbes, Ari Holtzman, Yejin Choi
CogSci 2019

For an overview of the project and academic publication, please see the project webpage. The rest of this README is focused on the code and data behind the project.

Setup

# (1) Create a fresh virtualenv. Use Python 3.7+

# (2) Install PyTorch. (This code was written using PyTorch 1.1.) Follow the directions
# at https://pytorch.org/.

# (3) Install other Python dependencies using pip:
pip install -r requirements.txt

# (4) Retrieve external data. (Our data is already in subfolders of data/; this is for
# larger blobs like GloVe.) This script also makes some directories we'll need.
./scripts/get_data.sh
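
If you want to confirm the environment before running anything, a minimal sanity check like the following works; this is a hypothetical snippet, not part of the repository. The paper's code targets Python 3.7+ and PyTorch 1.1.

# Hypothetical sanity check (not part of this repository): confirm the core
# dependencies installed above import cleanly before running any experiments.
import sys

import torch

print("Python:", sys.version.split()[0])            # expect 3.7+
print("PyTorch:", torch.__version__)                # this code was written against 1.1
print("CUDA available:", torch.cuda.is_available())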

Run

# Note that per-datum results for the programs below are written to data/results/

# Run the baselines: random and majority.
python -m pc.baselines

# Run GloVe, Dependency Embeddings, and ELMo. For detailed info on hyperparameters and
# cross validation, see the `main()` function in pc/models.py
python -m pc.experiments

# Run BERT. NOTE: running only 1 epoch for "situated-AP" is not meant to handicap the
# model; with 2+ epochs it overfits and drops to an F1 score of 0.0.
python -m pc.bert --task "abstract-OP"
python -m pc.bert --task "situated-OP"
python -m pc.bert --task "situated-OA"
python -m pc.bert --task "situated-AP" --epochs 1

# Display human results.
python -m pc.human

# Compute statistical significance. (Requires all baselines and model output.)
python -m pc.significance

# Convert BERT's output on the situated-AP task to per-category output (for making
# graphs).
python -m scripts.perdatum_to_category

# Produce the graphs from the paper analyzing BERT's per-category output on the
# situated-AP task, as well as comparing performance vs. word occurrence in natural
# language (found in data/nl/). Writes graphs to data/results/graphs.
python -m pc.graph
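
The commands above write per-datum results under data/results/ and graphs under data/results/graphs. A short listing script like the following sketch (hypothetical, not part of the repository) shows what has been produced so far:

# Hypothetical helper (not part of this repository): list the per-datum result
# files and graphs that the commands above write under data/results/.
from pathlib import Path

results_dir = Path("data/results")
for path in sorted(results_dir.rglob("*")):
    if path.is_file():
        print(path.relative_to(results_dir))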

Data

In this repository, we provide the abstract and situated datasets collected for this project, as well as some auxiliary data we used (sentence constructions, statistics of natural language).

Note that the scripts/get_data.sh script will download additional data (GloVe embeddings, Dependency Embeddings, and a cache of ELMo embeddings), which we don't describe here.

Here is an overview of what's provided:

data/
├── dep-embs  (retrieve with `scripts/get_data.sh`)
├── elmo      (retrieve with `scripts/get_data.sh`)
├── glove     (retrieve with `scripts/get_data.sh`)
├── human     Expert annotations establishing human performance on the task.
├── nl        Natural language statistics: frequency of our words in a large corpus.
├── pc        Our abstract and situated datasets are here.
└── sentences Sentences automatically constructed for contextual models (ELMo, BERT).
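
To get a quick feel for what ships with the repository and what scripts/get_data.sh downloaded, a format-agnostic sketch like the following (hypothetical, not part of the repository) summarizes each subdirectory of data/:

# Hypothetical overview script (not part of this repository): count the files in
# each data/ subdirectory without assuming anything about their formats.
from pathlib import Path

for subdir in sorted(p for p in Path("data").iterdir() if p.is_dir()):
    files = [f for f in subdir.rglob("*") if f.is_file()]
    total_kb = sum(f.stat().st_size for f in files) // 1024
    print(f"{subdir.name:10s} {len(files):5d} files  {total_kb:8d} KB")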
