Thai Text Classification Benchmarks

We provide 4 datasets for Thai text classification with different styles, objectives, and numbers of labels. We also provide preliminary benchmarks using fastText, linear models (LinearSVC and logistic regression), and thai2fit's implementation of ULMFit.
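
As a rough illustration of the linear baselines, the sketch below builds a TF-IDF + LinearSVC pipeline on top of PyThaiNLP word tokenization. It is a minimal sketch, not the exact benchmark code; the tokenizer engine, n-gram range, and toy data are assumptions.

```python
# Minimal sketch of a linear baseline (TF-IDF + LinearSVC) for Thai text.
# Not the exact benchmark code: the tokenizer engine and vectorizer
# settings are illustrative assumptions.
from pythainlp.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def thai_tokenizer(text):
    # Thai has no spaces between words, so tokenize with a dictionary-based engine.
    return word_tokenize(text, engine="newmm")

pipeline = make_pipeline(
    TfidfVectorizer(tokenizer=thai_tokenizer, ngram_range=(1, 2)),
    LinearSVC(),
)

# Toy usage: texts is a list of Thai strings, labels their classes.
texts = ["อาหารอร่อยมาก", "บริการแย่"]
labels = ["pos", "neg"]
pipeline.fit(texts, labels)
print(pipeline.predict(["อร่อย"]))
```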

prachathai-67k, truevoice-intent, and all code in this repository are released under the Apache License 2.0 by PyThaiNLP. wisesight-sentiment is released to the public domain under the Creative Commons Zero v1.0 Universal license by Wisesight. wongnai-corpus is released under the GNU Lesser General Public License v3.0 by Wongnai.

Dataset Description

| Dataset | Style | Objective | Labels | Size |
|---|---|---|---|---|
| prachathai-67k: body_text | Formal (online newspapers) | News topic | 12 | 67k |
| truevoice-intent: destination | Informal (call center transcription) | Customer service intent | 7 | 16k |
| wisesight-sentiment | Informal (social media) | Conversation/opinion sentiment | 4 | 28k |
| wongnai-corpus | Informal (review site) | Restaurant review sentiment | 5 | 40k |

prachathai-67k: body_text

We benchmark prachathai-67k using body_text as the text feature in a 12-label multi-label classification task. Performance is measured by macro-averaged accuracy and F1 score. The results, including per-class performance metrics, can be reproduced with this notebook.

| Model | Macro-accuracy | Macro-F1 |
|---|---|---|
| fastText | 0.9302 | 0.5529 |
| LinearSVC | 0.513277 | 0.552801 |
| ULMFit | 0.948737 | 0.744875 |
| USE | 0.856091 | 0.696172 |
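
For clarity on how these multi-label metrics aggregate, here is a minimal sketch, using scikit-learn rather than the repository's evaluation code, that computes macro-averaged accuracy and F1 over 12 binary label columns; the array shapes and toy data are assumptions.

```python
# Minimal sketch (not the repository's evaluation code) of macro-averaged
# accuracy and F1 for a 12-label multi-label task. y_true and y_pred are
# assumed to be binary indicator arrays of shape (n_samples, 12).
import numpy as np
from sklearn.metrics import f1_score

def macro_metrics(y_true, y_pred):
    # Macro accuracy: accuracy of each label column, averaged over labels.
    macro_acc = (y_true == y_pred).mean(axis=0).mean()
    # Macro F1: per-label F1, averaged over labels.
    macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
    return macro_acc, macro_f1

# Toy usage with random indicator matrices.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 12))
y_pred = rng.integers(0, 2, size=(100, 12))
print(macro_metrics(y_true, y_pred))
```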

truevoice-intent: destination

We benchmark truevoice-intent using destination as the target in a 7-class multi-class classification task. Performance is measured by micro-averaged and macro-averaged accuracy and F1 score. The results, including per-class performance metrics, can be reproduced with this notebook.

| Model | Macro-accuracy | Micro-accuracy | Macro-F1 | Micro-F1 |
|---|---|---|---|---|
| LinearSVC | 0.957806 | 0.95747712 | 0.869411 | 0.85116993 |
| ULMFit | 0.955066 | 0.84273111 | 0.852149 | 0.84273111 |
| BERT | 0.8921 | 0.85 | 0.87 | 0.85 |
| USE | 0.943559 | 0.94355855 | 0.787686 | 0.802455 |
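
Micro- and macro-averaged F1 differ in how per-class scores are aggregated. The sketch below, a minimal illustration with scikit-learn and toy data rather than the repository's evaluation code, shows both on a 7-class prediction.

```python
# Minimal illustration (not the repository's evaluation code) of
# micro- vs macro-averaged F1 for a multi-class task with scikit-learn.
from sklearn.metrics import f1_score

# Toy ground truth and predictions over 7 intent classes (0-6).
y_true = [0, 1, 2, 3, 4, 5, 6, 0, 1, 2]
y_pred = [0, 1, 2, 3, 4, 5, 5, 0, 2, 2]

# Micro-F1 pools all decisions; for single-label multi-class
# classification it equals plain accuracy.
print(f1_score(y_true, y_pred, average="micro"))

# Macro-F1 averages per-class F1, weighting rare classes equally.
print(f1_score(y_true, y_pred, average="macro"))
```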

wisesight-sentiment

Performance of wisesight-sentiment is based on the test set of the WISESIGHT Sentiment Analysis competition. The results can be reproduced with this notebook.

Disclaimer: The labels were obtained manually and are prone to errors. If you plan to apply the models in this benchmark to real-world applications, be sure to benchmark them with your own dataset.

| Model | Public Accuracy | Private Accuracy |
|---|---|---|
| Logistic Regression | 0.72781 | 0.7499 |
| fastText | 0.63144 | 0.6131 |
| ULMFit | 0.71259 | 0.74194 |
| ULMFit Semi-supervised | 0.73119 | 0.75859 |
| ULMFit Semi-supervised Repeated One Time | 0.73372 | 0.75968 |
| USE | 0.63987* | |

\* Evaluated after the competition, on a test set cleaned from 3,946 rows to 2,674 rows.
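
As an illustration of the fastText baseline, the sketch below trains a supervised fastText classifier on label-prefixed, pre-tokenized text. The file name and hyperparameters are assumptions, not the benchmark's actual configuration.

```python
# Minimal sketch of a supervised fastText sentiment baseline.
# The training file path and hyperparameters are illustrative
# assumptions, not the benchmark's actual configuration.
import fasttext
from pythainlp.tokenize import word_tokenize

# fastText expects one example per line: "__label__<class> <tokens>".
# Thai has no spaces between words, so tokenize first.
def to_fasttext_line(text, label):
    return f"__label__{label} " + " ".join(word_tokenize(text))

# train.txt (hypothetical path) is assumed to contain lines in the
# format produced by to_fasttext_line above.
model = fasttext.train_supervised(
    input="train.txt",
    epoch=5,
    wordNgrams=2,
    dim=100,
)

# Predict the sentiment of a new, tokenized sentence.
tokens = " ".join(word_tokenize("อาหารอร่อยมาก"))
print(model.predict(tokens))
```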

wongnai-corpus

Performance of wongnai-corpus is based on the test set of the Wongnai Challenge: Review Rating Prediction competition. The results can be reproduced with this notebook.

| Model | Public Micro-F1 | Private Micro-F1 |
|---|---|---|
| ULMFit Knight | 0.61109 | 0.62580 |
| ULMFit | 0.59313 | 0.60322 |
| fastText | 0.5145 | 0.5109 |
| LinearSVC | 0.5022 | 0.4976 |
| Kaggle Score | 0.59139 | 0.58139 |
| BERT | 0.56612 | 0.57057 |
| USE | 0.42688 | 0.41031 |
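
If you want to rerun these baselines, the corpus is also mirrored on the Hugging Face Hub; the sketch below loads it with the datasets library. The dataset id wongnai_reviews and its column names are assumptions based on the community mirror, not something this repository guarantees.

```python
# Sketch of loading wongnai-corpus from the Hugging Face Hub.
# The dataset id "wongnai_reviews" and its column names are
# assumptions based on the community mirror.
from datasets import load_dataset

dataset = load_dataset("wongnai_reviews")
print(dataset)  # train/test splits with review texts and star ratings

sample = dataset["train"][0]
print(sample["review_body"][:100], sample["star_rating"])
```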

BibTeX

@software{cstorm125_2020_3852912,
  author       = {cstorm125 and
                  lukkiddd},
  title        = {PyThaiNLP/classification-benchmarks: v0.1-alpha},
  month        = may,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {v0.1-alpha},
  doi          = {10.5281/zenodo.3852912},
  url          = {https://doi.org/10.5281/zenodo.3852912}
}

Acknowledgements