
jsrozner/decrypt


Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP

This repository is the official implementation of Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP. Please cite the arXiv or NeurIPS 2021 version.

The dataset is also available at https://doi.org/10.5061/dryad.n02v6wwzp

Requirements

The requirements file below is enough to download the data and replicate the data splits, but it has not been updated to include everything needed to run the baselines and experiments notebooks.

pip install -r requirements.txt

Preparing data

git clone <anonymized>  # if using code supplement, just unzip
cd decrypt
pushd ./data && unzip "*.json.zip" && popd
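On systems without `unzip` or `pushd`, the extraction step above can be approximated in Python. This is a minimal sketch, not part of the repository: the helper name `unzip_json_archives` is hypothetical, and it assumes the archives sit directly in `./data` and match `*.json.zip`, as in the shell command.

```python
import zipfile
from pathlib import Path


def unzip_json_archives(data_dir="./data"):
    """Extract every *.json.zip archive found in data_dir into data_dir.

    Hypothetical helper mirroring `unzip "*.json.zip"`; returns the names
    of all extracted archive members.
    """
    data_path = Path(data_dir)
    extracted = []
    for archive in sorted(data_path.glob("*.json.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_path)          # unpack next to the archive
            extracted.extend(zf.namelist())   # record what was unpacked
    return extracted
```

Calling `unzip_json_archives("./data")` from the top-level decrypt directory should leave the JSON files in the same place the shell command would.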

Download data (can safely be ignored)

If you want to download the data yourself from the web (you probably don't want to):

git clone <anonymized>  # if using code supplement, just unzip
cd decrypt
mkdir -p './data/puzzles'
python decrypt/scrape_parse/guardian_scrape.py --save_directory="./data/puzzles"

Then, when loading the splits, call load_guardian_splits("./data/puzzles", load_from_files=True, use_premade_json=False).

Reproducing our splits

from decrypt.scrape_parse import (
  load_guardian_splits,               # naive random split
  load_guardian_splits_disjoint,      # answer-disjoint split
  load_guardian_splits_disjoint_hash  # word-initial disjoint split
)
from decrypt.scrape_parse.guardian_load import SplitReturn
"""
Each of these methods returns a `SplitReturn` tuple:
- soln-to-clue map (each solution string maps to the list of clues with that
  solution): Dict[str, List[BaseClue]]
  (this enables seeing all clues associated with a given answer word)
- list of all clues: List[BaseClue]
- tuple of three lists (the train, val, test splits), each a List[BaseClue]

Note that load_guardian_splits() will verify that
- total glob length matches the one in the paper (i.e. number of puzzles downloaded matches)
- total clue set length matches the one in the paper (i.e. filtering is the same)
- one of the clues in the loaded train set matches our train set (i.e. a
  single-clue spot check for randomness)
If you get an assertion error or an exception during load, please file an
issue, since the splits should be identical.
Alternatively, if you don't care, you can pass `verify=False` to
`load_guardian_splits`.
"""

soln_to_clue_map, all_clues_list, (train, val, test) = load_guardian_splits()
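To make the solution-to-clue map and the idea of an answer-disjoint split concrete, the sketch below groups clues by answer and checks that no answer appears in more than one split. It is illustrative only: `ToyClue` and its `clue` and `soln` fields are stand-ins invented for this example, not the repository's `BaseClue`, so the attribute access would need adapting to the real class.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Set


class ToyClue(NamedTuple):
    """Stand-in for the repository's clue class (field names are assumptions)."""
    clue: str
    soln: str


def build_soln_to_clue_map(clues: Sequence[ToyClue]) -> Dict[str, List[ToyClue]]:
    """Group clues by answer, in the spirit of the first element of the returned tuple."""
    soln_map: Dict[str, List[ToyClue]] = defaultdict(list)
    for c in clues:
        soln_map[c.soln].append(c)
    return dict(soln_map)


def shared_answers(*splits: Sequence[ToyClue]) -> Set[str]:
    """Return answers that occur in more than one split (empty if answer-disjoint)."""
    first_seen: Dict[str, int] = {}   # answer -> index of the split it first appeared in
    shared: Set[str] = set()
    for idx, split in enumerate(splits):
        for c in split:
            if first_seen.setdefault(c.soln, idx) != idx:
                shared.add(c.soln)
    return shared


# Toy data: two clues share the answer "apple", but both live in the train split.
toy_train = [ToyClue("clue text A", "apple"), ToyClue("clue text B", "apple")]
toy_val = [ToyClue("clue text C", "pear")]
toy_test = [ToyClue("clue text D", "plum")]

toy_map = build_soln_to_clue_map(toy_train + toy_val + toy_test)
print(len(toy_map["apple"]))                         # 2
print(shared_answers(toy_train, toy_val, toy_test))  # set()
```

A naive random split gives no such guarantee, so the same check on its output would typically return a non-empty set.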

Replicating our work

We make code available to replicate the entire paper.

Note that the directory structure is specified in decrypt/config.py. You can change it if you would like. Most references use this file, but run commands (i.e. python ...) assume that the directories are unchanged from the original config.py.

Datasets and task (Section 3)

  • The splits are replicated as above using the load methods
  • The task is replicated in the following sections
  • We provide code to replicate the metric analysis. See the implementation in the Jupyter notebooks below.

To run the notebooks, start your Jupyter server from the top-level decrypt directory. The notebooks have been run using PyCharm opened from the top-level decrypt directory. If you experience import errors, it is likely because you are not running from the top level.
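If a notebook does end up running from a subdirectory, one workaround is to locate the top-level directory and put it on `sys.path` before importing. A minimal sketch, assuming the top level can be recognised by the presence of `decrypt/config.py`; `find_repo_root` and `add_repo_root_to_path` are hypothetical helpers, not part of the repository.

```python
import sys
from pathlib import Path
from typing import Optional


def find_repo_root(start: Optional[Path] = None,
                   marker: str = "decrypt/config.py") -> Path:
    """Walk upward from `start` (default: cwd) to the first directory containing `marker`."""
    here = (start or Path.cwd()).resolve()
    for candidate in (here, *here.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no directory containing {marker} found at or above {here}")


def add_repo_root_to_path(start: Optional[Path] = None) -> Path:
    """Put the detected top-level directory first on sys.path (assumed fix for import errors)."""
    root = find_repo_root(start)
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

Running `add_repo_root_to_path()` in the first notebook cell is intended to make `import decrypt` resolve as it would from the top level. Relative data paths may still assume the top-level working directory, so starting the server from there remains the simpler option.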

Baselines (Section 4)

Notebooks to replicate the four baselines are in the baselines directory. Note that a patch will need to be applied to work with the deits solver.

Curriculum Learning (Section 5)

See experiments/curricular.ipynb

Model Analysis

See experiments/model_analysis

Misc

Note that details of training and evaluating the models are available in the relevant jupyter notebooks.
