seopbo/nlp_classification

NLP paper implementations relevant to classification, with PyTorch

The papers are implemented using Korean corpora.

Preliminary & Usage

  • Preliminary
pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
  • Usage
python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
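The directory listings below include a vocab.pkl next to each dataset's train.txt, validation.txt and test.txt, which suggests that build_vocab.py builds a token-to-index mapping from the training split and pickles it. The following is a minimal sketch of such a step under stated assumptions, not the repository's code: the `<pad>`/`<unk>` tokens, the `min_freq` cut-off, the frequency-based ordering, the `build_vocab`/`encode`/`save_vocab` names and the whitespace tokenization are all hypothetical.

```python
import pickle
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical special tokens; the repository may use different ones.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(tokenized: Iterable[List[str]], min_freq: int = 1) -> Dict[str, int]:
    """Map every token seen at least `min_freq` times to an integer index."""
    counter = Counter(token for tokens in tokenized for token in tokens)
    # Reserve index 0 for padding and index 1 for out-of-vocabulary tokens.
    token_to_idx = {PAD: 0, UNK: 1}
    # Sort by descending frequency, then by token, so indices are deterministic.
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            token_to_idx[token] = len(token_to_idx)
    return token_to_idx


def encode(tokens: List[str], token_to_idx: Dict[str, int]) -> List[int]:
    """Convert tokens to indices, falling back to the <unk> index."""
    return [token_to_idx.get(token, token_to_idx[UNK]) for token in tokens]


def save_vocab(token_to_idx: Dict[str, int], path: str) -> None:
    """Pickle the mapping, analogous to the vocab.pkl files in the listings."""
    with open(path, "wb") as f:
        pickle.dump(token_to_idx, f)


# Whitespace splitting stands in for the repository's real tokenizer.
corpus = ["정말 재밌다", "정말 별로다", "재밌다"]
vocab = build_vocab((line.split() for line in corpus), min_freq=2)
print(vocab)  # {'<pad>': 0, '<unk>': 1, '재밌다': 2, '정말': 3}
print(encode("정말 별로다".split(), vocab))  # [3, 1]
```

Reserving fixed indices for padding and unknown tokens lets a batch loader pad with 0 and map unseen words to 1 without further lookups.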

Single sentence classification (sentiment classification task)

  • Using the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
  • Configuration
    • conf/model/{type}.json (e.g. type = ["sencnn", "charcnn",...])
    • conf/dataset/nsmc.json
  • Structure
# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── nsmc.json
│   └── model
│       └── sencnn.json
├── evaluate.py
├── experiments
│   └── sencnn
│       └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── nsmc
│   ├── ratings_test.txt
│   ├── ratings_train.txt
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
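The experiments/sencnn/epochs_5_batch_size_256_learning_rate_0.001 entry indicates that each run is stored under a directory named after its training hyperparameters. The sketch below is an assumption rather than the repository's code: it shows one way to merge the two JSON files listed under Configuration and derive such a path. The helper names `load_config` and `experiment_dir`, and the rule that model keys override dataset keys, are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config(model_conf: Path, dataset_conf: Path) -> Dict[str, Any]:
    """Merge a dataset config and a model config into one dict (hypothetical helper).

    Assumption: when both files define the same key, the model config wins.
    """
    with open(dataset_conf, encoding="utf-8") as f:
        config = json.load(f)
    with open(model_conf, encoding="utf-8") as f:
        config.update(json.load(f))
    return config


def experiment_dir(model_type: str, epochs: int, batch_size: int, learning_rate: float) -> Path:
    """Build a run directory whose name encodes the training hyperparameters."""
    name = f"epochs_{epochs}_batch_size_{batch_size}_learning_rate_{learning_rate}"
    return Path("experiments") / model_type / name


# Usage sketch with the file paths from the structure listing:
# config = load_config(Path("conf/model/sencnn.json"), Path("conf/dataset/nsmc.json"))
print(experiment_dir("sencnn", epochs=5, batch_size=256, learning_rate=0.001).as_posix())
# experiments/sencnn/epochs_5_batch_size_256_learning_rate_0.001
```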
Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date
-----------------|-----------------|---------------------|---------------|---------
SenCNN           | 91.95%          | 86.54%              | 85.84%        | 20/05/30
CharCNN          | 86.29%          | 81.69%              | 81.38%        | 20/05/30
ConvRec          | 86.23%          | 82.93%              | 82.43%        | 20/05/30
VDCNN            | 86.59%          | 84.29%              | 84.10%        | 20/05/30
SAN              | 90.71%          | 86.70%              | 86.37%        | 20/05/30
ETRIBERT         | 91.12%          | 89.24%              | 88.98%        | 20/05/30
SKTBERT          | 92.20%          | 89.08%              | 88.96%        | 20/05/30
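NSMC ships 150,000 training and 50,000 test reviews as tab-separated files with an `id`, `document`, `label` header, so the split sizes above are consistent with holding out 20% of ratings_train.txt for validation and using ratings_test.txt as the test set. A minimal sketch of that step, assuming a seeded shuffle followed by a cut; `parse_nsmc`, `train_validation_split` and the seed value are hypothetical, and the repository's build_dataset.py may split differently.

```python
import random
from typing import Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def parse_nsmc(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse NSMC rows `id<TAB>document<TAB>label` into (document, label) pairs."""
    rows = iter(lines)
    next(rows)  # skip the "id\tdocument\tlabel" header row
    examples = []
    for line in rows:
        _id, document, label = line.rstrip("\n").split("\t")
        examples.append((document, int(label)))
    return examples


def train_validation_split(
    examples: Sequence[T], validation_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then hold out `validation_ratio` of the data."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n_validation = round(len(shuffled) * validation_ratio)
    return shuffled[n_validation:], shuffled[:n_validation]


# Usage sketch with the path from the structure listing:
# with open("nsmc/ratings_train.txt", encoding="utf-8") as f:
#     train, validation = train_validation_split(parse_nsmc(f))
train, validation = train_validation_split(list(range(150_000)))
print(len(train), len(validation))  # 120000 30000
```

With `validation_ratio=0.1`, 6,818 items split into 6,136 and 682, which matches the Train and Validation sizes of the paraphrase detection table.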

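Pairwise text classification (paraphrase detection task)

  • Configuration
    • conf/model/{type}.json (e.g. type = ["siam",...])
    • conf/dataset/qpair.json
  • Structure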
# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── qpair.json
│   └── model
│       └── siam.json
├── evaluate.py
├── experiments
│   └── siam
│       └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── qpair
│   ├── kor_pair_test.csv
│   ├── kor_pair_train.csv
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
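The Siam model is named after the paper in the example comment, Siamese recurrent architectures for learning sentence similarity, in which both sentences are encoded by the same LSTM and the pair is scored with exp(-||h_a - h_b||_1) on the final hidden states. The sketch below shows only that scoring function on plain Python lists, assuming the encodings are already computed; whether this repository's Siam uses the same score is not stated here, and the `is_paraphrase` helper with its 0.5 threshold is a hypothetical illustration.

```python
import math
from typing import Sequence


def manhattan_similarity(h_a: Sequence[float], h_b: Sequence[float]) -> float:
    """exp(-||h_a - h_b||_1): 1.0 for identical encodings, tending to 0 as they diverge."""
    if len(h_a) != len(h_b):
        raise ValueError("encodings must have the same dimension")
    l1_distance = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1_distance)


def is_paraphrase(h_a: Sequence[float], h_b: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: call the pair a paraphrase at or above a fixed threshold."""
    return manhattan_similarity(h_a, h_b) >= threshold


# Toy 3-dimensional "encodings"; real ones would come from the shared encoder.
q1, q2, q3 = [0.2, -0.1, 0.4], [0.25, -0.1, 0.3], [0.9, 0.8, -0.5]
print(is_paraphrase(q1, q2), is_paraphrase(q1, q3))  # True False
```

Because the score lies in (0, 1] and equals 1 only for identical encodings, it can be compared directly against 0/1 paraphrase labels or a threshold.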
Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date
-----------------|---------------|------------------|------------|---------
Siam             | 93.00%        | 83.13%           | 83.64%     | 20/05/30
SAN              | 89.47%        | 82.11%           | 81.53%     | 20/05/30
Stochastic       | 89.26%        | 82.69%           | 80.07%     | 20/05/30
ETRIBERT         | 95.07%        | 94.42%           | 94.06%     | 20/05/30
SKTBERT          | 95.43%        | 92.52%           | 93.93%     | 20/05/30
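Both tables report accuracy: the fraction of examples whose predicted label matches the gold label. A minimal sketch of that metric, assuming integer class labels; the `accuracy` function here is hypothetical and not necessarily what model/metric.py contains.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("accuracy is undefined for an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


print(f"{accuracy([1, 0, 1, 1], [1, 0, 0, 1]):.2%}")  # 75.00%
```

Read this way, 83.64% for Siam on the 758 test pairs corresponds to 634 correct predictions, since 634 / 758 ≈ 0.8364.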