Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP

This repository is the official implementation of Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP. Please cite the arXiv or NeurIPS 2021 version.

The dataset is also available at https://doi.org/10.5061/dryad.n02v6wwzp

Requirements

Installing the requirements below will enable you to download and replicate the data splits, but requirements.txt has not been updated to include everything needed to run the baselines and experiments notebooks.

pip install -r requirements.txt

Preparing data

git clone <anonymized>  # if using code supplement, just unzip
cd decrypt
pushd ./data && unzip "*.json.zip" && popd
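
If the unzip utility is not available (for example on Windows), the same step can be approximated in Python. This is a minimal sketch and not part of the repository: the function name unzip_json_archives is hypothetical, and it assumes the archives sit directly in ./data and match *.json.zip, as in the shell command above.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_json_archives(data_dir: str = "./data") -> List[Path]:
    """Extract every *.json.zip archive found in data_dir into data_dir.

    Hypothetical helper mirroring `unzip "*.json.zip"`; returns the paths
    of the extracted archive members.
    """
    data_path = Path(data_dir)
    extracted: List[Path] = []
    for archive in sorted(data_path.glob("*.json.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Extract next to the archive, as the shell command does.
            zf.extractall(data_path)
            extracted.extend(data_path / name for name in zf.namelist())
    return extracted
```

Calling unzip_json_archives("./data") from the top level decrypt directory should leave the extracted JSON files alongside the archives.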

Download data (can safely be ignored)

If you want to download the data yourself from the web (you probably don't want to), run:

git clone <anonymized>  # if using code supplement, just unzip
cd decrypt
mkdir -p './data/puzzles'
python decrypt/scrape_parse/guardian_scrape.py --save_directory="./data/puzzles"

Then, when you load the splits, call load_guardian_splits("./data/puzzles", load_from_files=True, use_premade_json=False) so that the scraped files are used instead of the premade JSON.

Reproducing our splits

from decrypt.scrape_parse import (
  load_guardian_splits,               # naive random split
  load_guardian_splits_disjoint,      # answer-disjoint split
  load_guardian_splits_disjoint_hash  # word-initial disjoint split
)
from decrypt.scrape_parse.guardian_load import SplitReturn
"""
Each of these methods returns a `SplitReturn` tuple containing:
- soln to clue map (string to list of clues mapping to that soln): Dict[str, List[BaseClue]];
  this enables seeing all clues associated with a given answer word
- list of all clues: List[BaseClue]
- tuple of three lists (the train, val, test splits), each a List[BaseClue]

Note that load_guardian_splits() will verify that
- total glob length matches the one in the paper (i.e. number of puzzles downloaded matches)
- total clue set length matches the one in the paper (i.e. filtering is the same)
- one of the clues in the loaded train set matches our train set (i.e. a single-clue
  spot check for randomness)
If you get an assertion error or an exception during load, please file an
issue, since the splits should be identical.
Alternatively, if you don't care, you can pass `verify=False` to
`load_guardian_splits`.
"""

soln_to_clue_map, all_clues_list, (train, val, test) = load_guardian_splits()
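
As an illustration of what distinguishes the three splits, the sketch below runs on toy data rather than on the repository's classes. ToyClue is a hypothetical stand-in for BaseClue (its field names are assumptions), answers_disjoint checks the property that an answer-disjoint split is meant to guarantee, and prefix_bucket shows one way a word-initial split could be built by hashing the first letters of each answer. The prefix length of two, the choice of hash, and the bucket-to-split assignment are all assumptions for illustration and are not taken from the repository's implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ToyClue:
    """Hypothetical stand-in for BaseClue; field names are assumptions."""
    clue: str
    soln: str


def answers_disjoint(split_a: Iterable[ToyClue], split_b: Iterable[ToyClue]) -> bool:
    """Return True if no answer string occurs in both splits."""
    return {c.soln for c in split_a}.isdisjoint({c.soln for c in split_b})


def prefix_bucket(soln: str, prefix_len: int = 2, n_buckets: int = 5) -> int:
    """Deterministically map an answer to a bucket via its first letters.

    Answers that share a prefix always land in the same bucket, so assigning
    whole buckets to train/val/test keeps word-initial groups together.
    """
    prefix = soln.lower()[:prefix_len]
    digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def split_by_prefix(clues: List[ToyClue]) -> Tuple[List[ToyClue], List[ToyClue], List[ToyClue]]:
    """Assumed assignment: buckets 0-2 go to train, 3 to val, 4 to test."""
    train: List[ToyClue] = []
    val: List[ToyClue] = []
    test: List[ToyClue] = []
    for c in clues:
        bucket = prefix_bucket(c.soln)
        if bucket < 3:
            train.append(c)
        elif bucket == 3:
            val.append(c)
        else:
            test.append(c)
    return train, val, test
```

A naive random split can place the same answer in both train and test, so answers_disjoint would return False for it. A split built with split_by_prefix is answer-disjoint by construction, and it additionally keeps answers with the same opening letters (for example inflections of one word) in a single split.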

Replicating our work

We make code available to replicate the entire paper.

Note that the directory structure is specified in decrypt/config.py. You can change it if you would like. Most references use this file, but run commands (i.e. python ...) assume that the directories are unchanged from the original config.py.

Datasets and task (Section 3)

The splits are replicated as above using the load methods
The task is replicated in the following sections
We provide code to replicate the metric analysis. See the implementation in the Jupyter notebooks below

To run the notebooks, start your Jupyter server from the top level decrypt directory. The notebooks have been run using PyCharm opened from the top level decrypt directory. If you experience import errors, it is likely because you are not running from the top level.
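
If a notebook has to be launched from somewhere else, one workaround is to locate the top level directory and put it on sys.path before importing. The helper below is a hypothetical sketch and not part of the repository; it assumes the top level can be recognized by the presence of decrypt/config.py, the file mentioned above.

```python
import sys
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = "decrypt/config.py") -> Optional[Path]:
    """Walk upward from start until a directory containing marker is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def add_repo_root_to_path(start: Optional[Path] = None) -> Optional[Path]:
    """Prepend the detected top level directory to sys.path, if one is found."""
    root = find_repo_root(start or Path.cwd())
    if root is not None and str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

Running add_repo_root_to_path() in the first notebook cell addresses only the imports. Relative paths defined in config.py may still assume that the working directory is the top level, so starting the server from there remains the simpler option.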

Baselines (Section 4)

Notebooks to replicate the four baselines are in the baselines directory. Note that a patch will need to be applied to work with the Deits solver.

Curriculum Learning (Section 5)

See experiments/curricular.ipynb

Model Analysis

See experiments/model_analysis

Misc

Details of training and evaluating the models are available in the relevant Jupyter notebooks.
