This repository has been archived by the owner on Feb 14, 2023. It is now read-only.

Unofficial code with the paper "On the Role of Text Preprocessing in Neural Network Architectures" for IMDb dataset.


NatLee/On-the-Role-of-Text-Preprocessing-in-Neural-Network-Architectures-For-IMDB


On the Role of Text Preprocessing in Neural Network Architectures

An Evaluation Study on Text Categorization and Sentiment Analysis

Jose Camacho Collados and Mohammad Taher Pilehvar

Information

This is UNOFFICIAL code for this paper, and it runs only on the IMDb dataset. The files are modified from the original sensecnn code for Python 3 compatibility and for research use.

Pre-trained word embeddings

You can download them from the original authors HERE.
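The embeddings are passed to __main__.py through the --emb option described under Usage. As a minimal sketch, assuming the file uses the plain-text word2vec format (a "<vocab_size> <dim>" header followed by one "token v1 ... v_dim" line per word), the code below parses such a file and builds an embedding matrix for a vocabulary. The helpers load_word2vec_text and build_embedding_matrix are hypothetical and not part of this repository; the real loader may expect the binary word2vec format or a different layout, and the convention that index 0 is reserved for padding is also an assumption.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_word2vec_text(stream: TextIO) -> Tuple[int, Dict[str, List[float]]]:
    """Parse word2vec text format (hypothetical helper).

    Expects a "<vocab_size> <dim>" header, then one "token v1 ... v_dim"
    line per word. Returns (dim, {token: vector}).
    """
    header = stream.readline().split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def build_embedding_matrix(word_index: Dict[str, int],
                           vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Row i holds the vector of the word with index i (hypothetical helper).

    Assumes indices start at 1 so that row 0 can serve as padding; words
    without a pre-trained vector keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix


# Tiny in-memory example instead of a real embeddings file.
sample = io.StringIO("2 3\nmovie 0.1 0.2 0.3\ngreat 0.4 0.5 0.6\n")
dim, vecs = load_word2vec_text(sample)
matrix = build_embedding_matrix({"movie": 1, "boring": 2}, vecs, dim)
```

In this example "movie" receives its pre-trained vector, while "boring" has no entry in the file and keeps a zero row, which a model would then learn from scratch if the embedding layer is trainable.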

Usage

This repository already includes the IMDb dataset. If you use it, please cite the original source.

Run the following steps from the command line. First, prepare the dataset:

python prepare_dataset.py

If you have not downloaded the pre-trained word embeddings, just run:

python __main__.py IMDb data

To run with pre-trained embeddings:

python __main__.py IMDb data --emb=<YOUR_PRE_TRAIN_WORD2VEC_PATH>
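The two commands above imply a small interface: a dataset name, a second positional argument ("data" in the examples), and an optional --emb path. A minimal argparse sketch of that interface is shown below; the argument name data_dir and the help texts are assumptions, and the real __main__.py may parse its arguments differently.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical mirror of the command line used in this README.
    parser = argparse.ArgumentParser(description="Train on a prepared dataset")
    parser.add_argument("dataset", help="dataset name, e.g. IMDb")
    parser.add_argument("data_dir",
                        help="second positional argument ('data' in the examples)")
    parser.add_argument("--emb", default=None,
                        help="path to pre-trained word2vec embeddings (optional)")
    return parser.parse_args(argv)


plain = parse_args(["IMDb", "data"])
with_emb = parse_args(["IMDb", "data", "--emb=vectors.txt"])
```

When --emb is omitted, args.emb is None, which presumably corresponds to training without pre-trained embeddings; argparse accepts both "--emb=path" and "--emb path".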

The default training settings in __main__.py are as follows:

settings = {'dict': 'data/' + self.dataset + '/' + self.dataset_id + '.dict.pkl',
            'data': 'data/' + self.dataset + '/' + self.dataset_id + '.pkl',
            'filter_length': 2,
            'pool_length': 2,
            'nb_filter': 50,
            'lstm_output_size': 25,
            'batch_size': 100,
            'nb_epoch': 100,
            'folds': 10
            }
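To see what these numbers mean in practice, the sketch below computes the sequence length left after the convolution and pooling layers, and splits a dataset into the configured number of folds. It assumes the model stacks a 1-D convolution with "valid" padding, a max-pooling layer whose stride equals pool_length, and an LSTM, as the parameter names suggest. The contiguous, unshuffled folds are also an assumption; the repository may shuffle or stratify its folds. conv_pool_output_length and k_fold_indices are hypothetical helpers.

```python
from typing import Iterator, List, Tuple

# Subset of the default settings above.
SETTINGS = {"filter_length": 2, "pool_length": 2, "nb_filter": 50,
            "lstm_output_size": 25, "batch_size": 100, "folds": 10}


def conv_pool_output_length(seq_len: int, filter_length: int, pool_length: int) -> int:
    # A 'valid' 1-D convolution yields seq_len - filter_length + 1 positions.
    conv_len = seq_len - filter_length + 1
    # Non-overlapping max pooling (stride == pool_length) drops the remainder.
    return conv_len // pool_length


def k_fold_indices(n_samples: int, folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    # Contiguous folds; the first n_samples % folds folds get one extra sample.
    base, extra = divmod(n_samples, folds)
    start = 0
    for k in range(folds):
        size = base + (1 if k < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


steps = conv_pool_output_length(100, SETTINGS["filter_length"], SETTINGS["pool_length"])
splits = list(k_fold_indices(25, SETTINGS["folds"]))
```

Under these assumptions, a 100-token review becomes 99 convolution positions and then 49 pooled steps, each a vector of nb_filter = 50 features, which the LSTM summarizes into a single 25-dimensional vector. With folds = 10, every sample is used for testing exactly once.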

Reference paper

If you use any of these resources, please cite the original papers:

@InProceedings{camacho:preprocessing2018,
  author    = {Camacho-Collados, Jose and Pilehvar, Mohammad Taher},
  title     = {On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis},
  booktitle = {Proceedings of the EMNLP Workshop on Analyzing and interpreting neural networks for NLP},
  year      = {2018},
  publisher = {Association for Computational Linguistics},
  location  = {Brussels, Belgium}
}


@InProceedings{pilehvar-EtAl:2017:Long,
  author    = {Pilehvar, Mohammad Taher  and  Camacho-Collados, Jose  and  Navigli, Roberto  and  Collier, Nigel},
  title     = {Towards a Seamless Integration of Word Senses into Downstream NLP Applications},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {July},
  year      = {2017},
  address   = {Vancouver, Canada},
  publisher = {Association for Computational Linguistics},
  pages     = {1857--1869},
  url       = {http://aclweb.org/anthology/P17-1170}
}
