elmo4irony: Deep contextualized word representations for detecting sarcasm and irony

This repo contains the official implementation of the paper "Deep contextualized word representations for detecting sarcasm and irony" (WASSA 2018).

Main Requirements

Python 3
PyTorch 0.4.0
conda

Installation

Clone this repo in your home directory

git clone https://github.com/epochx/elmo4irony
cd elmo4irony

Create a conda environment. If you don't have conda installed, we recommend using miniconda. You can then easily create and activate a new conda environment with Python 3.6 by executing:
```
conda create -n elmo4irony python=3.6
conda activate elmo4irony
```
Run the installation script
```
install/install.sh
```

If you clone the repo somewhere other than your home directory, check that the paths to the required directories are set correctly in config.py.

Data preparation

Download the data (by default, to ~/data/elmo4irony/corpus) by executing:
```
download/download.sh
```
Some data need to be downloaded through the Twitter API. To do so, you need to apply for a Twitter developer account at the following link:

https://developer.twitter.com/en/apply/user

Once you have done so and created an app, fill in the .twitter_credentials.conf file with your consumer_key, consumer_secret, access_token_key, and access_token_secret details.
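
Since the download takes a long time, it can help to confirm beforehand that all four fields are filled in. The sketch below is not part of this repository. It assumes the credentials file consists of simple `key = value` (or `key: value`) lines, which may not match the actual format of .twitter_credentials.conf, and `parse_key_values` and `missing_credentials` are hypothetical helper names.

```python
import re
from pathlib import Path

# The four fields named in this README.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token_key",
    "access_token_secret",
)

# ASSUMPTION: one `key = value` or `key: value` pair per line.
LINE_RE = re.compile(r"^([A-Za-z_]\w*)\s*[=:]\s*(.*)$")


def parse_key_values(text):
    """Collect key/value pairs; comments and section headers do not match."""
    values = {}
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            key, value = match.groups()
            values[key] = value.strip().strip("\"'")
    return values


def missing_credentials(text):
    """Return the required keys that are absent or empty, in a fixed order."""
    values = parse_key_values(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    path = Path(".twitter_credentials.conf")
    if path.exists():
        missing = missing_credentials(path.read_text())
        print("Missing fields:", ", ".join(missing) if missing else "none")
```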

Note 1: Downloading all the Twitter data will take around 24 hours.

Note 2: During the download process, the script will sleep periodically because of the API's restrictions.

Prepare the data
```
prepare/prepare.sh
```
Preprocess the data
```
./preprocess.sh
```

Finally, to test if you installed everything correctly, run:

python run.py --help

Training

Run:

python run.py --corpus <corpus> --write_mode BOTH

to train a model with the default hyperparameters on the given <corpus> and store the results on disk. Checkpoints and other output files are saved in a directory named after the hash of the current run in ~/data/elmo4irony/results/.

The hash depends only on hyperparameters that affect performance. For example, changing learning_rate, lstm_hidden_size, or dropout produces a different hash, whereas changing write_mode, save_model, or similar options does not.
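
The idea of a hash that ignores non-performance options can be sketched as follows. This is an illustration under stated assumptions, not the code used in run.py: the use of SHA-1, the JSON serialization, the `NON_PERFORMANCE_KEYS` set, and the example values are all hypothetical. Only the option names come from the text above.

```python
import hashlib
import json

# ASSUMPTION: options treated as irrelevant to model performance.
NON_PERFORMANCE_KEYS = {"write_mode", "save_model"}


def run_hash(hparams):
    """Hash only the hyperparameters that are assumed to affect performance."""
    relevant = {k: v for k, v in hparams.items() if k not in NON_PERFORMANCE_KEYS}
    # sort_keys makes the serialization independent of dictionary order.
    serialized = json.dumps(relevant, sort_keys=True)
    return hashlib.sha1(serialized.encode("utf-8")).hexdigest()


# Hypothetical values, for illustration only.
base = {
    "learning_rate": 0.001,
    "lstm_hidden_size": 1024,
    "dropout": 0.5,
    "write_mode": "BOTH",
    "save_model": True,
}

same = run_hash({**base, "save_model": False})  # non-performance change
different = run_hash({**base, "dropout": 0.1})  # performance change

print(run_hash(base) == same)       # True
print(run_hash(base) == different)  # False
```

With a scheme like this, re-running with a different save_model setting would reuse the same results directory, while any change to a performance-relevant hyperparameter would create a new one.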

Testing

To evaluate a trained model on the test set, run:

python run.py --corpus <corpus> --model_hash=<partial_model_hash> --test

Replace <partial_model_hash> with the hash of the model you wish to test, which corresponds to the name of its directory in ~/data/elmo4irony/results/. A classification report will be printed on screen, and files containing the test prediction labels and probabilities (predictions.txt and test_probs.csv, respectively) will be created in the model directory. Once you have run this, you can obtain more detailed output with:

python evaluate.py --corpus <corpus> --predictions /path/to/predictions.txt
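
The <partial_model_hash> placeholder suggests that a prefix of the hash is enough to identify a model directory. A minimal sketch of such prefix matching follows. `resolve_model_dir` is a hypothetical helper, and the actual lookup in run.py may behave differently, for example in how it handles ambiguous prefixes.

```python
from pathlib import Path


def resolve_model_dir(results_dir, partial_hash):
    """Return the single subdirectory whose name starts with partial_hash."""
    results_dir = Path(results_dir).expanduser()
    matches = sorted(
        p for p in results_dir.iterdir()
        if p.is_dir() and p.name.startswith(partial_hash)
    )
    if not matches:
        raise FileNotFoundError(
            f"No model directory starting with {partial_hash!r} in {results_dir}"
        )
    if len(matches) > 1:
        names = ", ".join(p.name for p in matches)
        raise ValueError(f"Ambiguous hash {partial_hash!r}; candidates: {names}")
    return matches[0]
```

The returned directory can then be used to locate predictions.txt for the evaluate.py command above.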
