Linguistic Adversity

Introduction

This repository is an implementation of the following work:

Yitong Li, Trevor Cohn and Timothy Baldwin (2017) Robust Training under Linguistic Adversity, in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain.

Repository Structure

In this repository, files are organised into three folders:

  • data/ : the original datasets
  • noise_generator/ : the linguistic noise generators
  • text_cnn/ : the convolutional neural network model for sentence-level classification tasks

Details are as follows.

data/

This folder contains the original datasets:

  • movie review dataset (Pang and Lee, 2005)
  • customer review dataset (Hu and Liu, 2004)
  • subjectivity dataset (Pang and Lee, 2004)
  • Stanford Sentiment Treebank (Socher et al., 2013)

More sentiment analysis datasets can be found at HarvardNLP.

Please cite the original papers when you use these datasets.

noise_generator/

This folder contains four different linguistic noise generators, one per sub-folder.

Please refer to the README file in each sub-folder for instructions on running the corresponding noise generator.

./WN/

This folder contains the code for the semantic noise generator based on WordNet.

To run the WordNet noise generator, you need the following dependencies (a toy sketch of the substitution step follows the list):

  • NLTK, with the following packages downloaded: averaged_perceptron_tagger, punkt, stopwords, universal_tagset, wordnet
  • NumPy
  • KenLM, with a pre-built n-gram language model
  • Stanford NER
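
To illustrate the kind of noise this generator produces, here is a minimal sketch of WordNet synonym substitution using NLTK. It is an illustration only: the function name and replacement probability are invented for this sketch, and the actual generator additionally uses the KenLM language model and Stanford NER listed above to filter candidate substitutions.

    import random
    from nltk import pos_tag, word_tokenize
    from nltk.corpus import stopwords, wordnet

    # map universal POS tags to WordNet POS constants
    POS_MAP = {"NOUN": wordnet.NOUN, "VERB": wordnet.VERB,
               "ADJ": wordnet.ADJ, "ADV": wordnet.ADV}
    STOP = set(stopwords.words("english"))

    def wordnet_noise(sentence, replace_prob=0.5):
        # randomly swap content words for one of their WordNet synonyms
        out = []
        for word, tag in pos_tag(word_tokenize(sentence), tagset="universal"):
            if (tag in POS_MAP and word.lower() not in STOP
                    and random.random() < replace_prob):
                synonyms = set()
                for synset in wordnet.synsets(word, pos=POS_MAP[tag]):
                    for lemma in synset.lemmas():
                        synonyms.add(lemma.name().replace("_", " "))
                synonyms.discard(word)
                if synonyms:
                    word = random.choice(sorted(synonyms))
            out.append(word)
        return " ".join(out)

    print(wordnet_noise("the movie was surprisingly good"))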

./CFit/

Based on the idea of counter-fitting, which adjusts pre-trained word vectors so that nearest neighbours in the vector space are synonyms rather than mere distributional neighbours (a sketch of the substitution step follows the dependency list).

Dependencies:

  • NLTK
  • NumPy
  • KenLM
  • Stanford NER
  • a pre-trained counter-fitting dictionary
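
As a rough illustration, nearest-neighbour substitution over counter-fitted vectors might look like the sketch below. The vector file name and function names are assumptions for this sketch, and the real generator also applies the NLTK/KenLM/NER filtering listed above.

    import numpy as np

    def load_vectors(path):
        # read vectors, one "word v1 v2 ..." entry per line, L2-normalised
        vectors = {}
        with open(path) as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vec = np.asarray(parts[1:], dtype=float)
                vectors[parts[0]] = vec / np.linalg.norm(vec)
        return vectors

    def nearest_substitutes(word, vectors, topn=3):
        # rank candidate replacements by cosine similarity to `word`
        if word not in vectors:
            return []
        scores = [(np.dot(vectors[word], vec), cand)
                  for cand, vec in vectors.items() if cand != word]
        scores.sort(reverse=True)
        return [cand for _, cand in scores[:topn]]

    vectors = load_vectors("counter-fitted-vectors.txt")  # assumed file name
    print(nearest_substitutes("good", vectors))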

./ERG/

Based on the English Resource Grammar (ERG) and the ACE processor, which parse a sentence into its underlying semantics and generate paraphrases from that representation (see the example after the list). Dependencies:

  • ERG
  • ACE
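
For reference, a common ACE recipe parses each sentence and pipes the resulting semantics back into ACE in generation mode to produce paraphrases. This is a sketch only; the exact flags and file names depend on your ACE and ERG versions:

    ace -g erg.dat sentences.txt | ace -g erg.dat -e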

./Comp/

Based on a sentence compression method.

text_cnn/

A convolutional neural network model for text classification tasks. The model is based on Yoon Kim's Convolutional Neural Networks for Sentence Classification and Denny Britz's implementation (https://github.com/dennybritz/cnn-text-classification-tf). Note that the code was implemented and tested with TensorFlow r1.0 and Python 2.7, and may not run under other versions.
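
For orientation, the core of a Kim-style CNN in TensorFlow r1.0 looks roughly like the sketch below. The shapes and hyper-parameters are placeholders, not the repository's actual settings; see the code in this folder for the real model.

    import tensorflow as tf

    # assumed toy hyper-parameters
    seq_len, vocab_size, emb_dim, n_classes = 56, 20000, 128, 2
    filter_sizes, n_filters = [3, 4, 5], 100

    x = tf.placeholder(tf.int32, [None, seq_len], name="input_x")
    y = tf.placeholder(tf.float32, [None, n_classes], name="input_y")
    keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

    # embedding lookup, expanded to a single channel for conv2d
    E = tf.Variable(tf.random_uniform([vocab_size, emb_dim], -1.0, 1.0))
    emb = tf.expand_dims(tf.nn.embedding_lookup(E, x), -1)

    # one convolution + max-over-time pool per filter width, then concatenate
    pooled = []
    for fs in filter_sizes:
        W = tf.Variable(tf.truncated_normal([fs, emb_dim, 1, n_filters], stddev=0.1))
        b = tf.Variable(tf.constant(0.1, shape=[n_filters]))
        conv = tf.nn.relu(tf.nn.conv2d(emb, W, [1, 1, 1, 1], "VALID") + b)
        pooled.append(tf.nn.max_pool(conv, [1, seq_len - fs + 1, 1, 1],
                                     [1, 1, 1, 1], "VALID"))
    features = tf.reshape(tf.concat(pooled, 3), [-1, n_filters * len(filter_sizes)])

    # dropout, linear projection and softmax cross-entropy loss
    logits = tf.layers.dense(tf.nn.dropout(features, keep_prob), n_classes)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))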

Requirements

To run the CNN code, you need the following dependencies:

  • Python 2.7
  • NumPy
  • TensorFlow r1.0

Running the code

python train.py [parameters]
parameters:
    --dataset
        The training dataset (default: "mr")
    --noise_type
        Type of noise (default: "raw")
    --is_noise_train
        Whether to train on the noisy data (default: False)
    --is_noise_test
        Whether to test on the noisy data (default: False)
    --l2_reg_lambda
        L2 regularization lambda (default: 0)
    --dropout_keep_prob
        Dropout keep probability (default: 0.5)

For example, to train the model with the following settings:

nice python train.py --dataset="mr" --is_noise_train=False --is_noise_test=False --noise_type="cf=0.5" --dropout_keep_prob=1.0

Also refer to run_script_sample_on_subj.sh to get a better sense of training with different noise types.

Contact us

Please email us with any questions. All comments are welcome.
