


Classification of Japanese news with BERT_multi

BERT, or Bidirectional Encoder Representations from Transformers by Google, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.

The academic paper by Google which describes BERT in detail and provides full results on a number of tasks can be found here: https://arxiv.org/abs/1810.04805.

I use the “livedoor news corpus” (1) for this experiment. The details of the experiment are explained in this blog post: https://toshistats.wordpress.com/2019/04/30/bert-performs-very-well-in-japanese-in-our-experiment/
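The sketch below shows, in outline, how multilingual BERT can be fine-tuned for this kind of news-title classification. It is a minimal illustration, not the repository's actual training script: it assumes the Hugging Face transformers library (rather than Google's original BERT code), and the placeholder titles, labels, and hyperparameters are illustrative.

```python
# Minimal fine-tuning sketch for Japanese news-title classification with
# multilingual BERT ("BERT_multi"). Assumes Hugging Face transformers and
# TensorFlow; not the repository's exact pipeline.
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

MODEL = "bert-base-multilingual-cased"  # multilingual BERT checkpoint
NUM_LABELS = 9                          # livedoor news corpus has 9 categories

tokenizer = BertTokenizer.from_pretrained(MODEL)
model = TFBertForSequenceClassification.from_pretrained(MODEL, num_labels=NUM_LABELS)

def encode(titles, labels, max_len=64):
    """Tokenize news titles into fixed-length input IDs and attention masks."""
    enc = tokenizer(titles, padding="max_length", truncation=True,
                    max_length=max_len, return_tensors="tf")
    return dict(enc), tf.constant(labels)

# Placeholder data: in the experiment these would be the 3,153 training
# titles and their category labels prepared from the livedoor news corpus.
train_titles = ["サンプルのニュースタイトル"]
train_labels = [0]

train_x, train_y = encode(train_titles, train_labels)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(train_x, train_y, batch_size=32, epochs=3)
```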

Evaluation results

test_accuracy = 0.8744,

fine-tuned on the livedoor news corpus (3,153 training samples, 826 test samples)
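Continuing the sketch above, the test accuracy would be obtained by evaluating the fine-tuned model on the held-out split (here `test_titles` and `test_labels` stand in for the 826 test samples):

```python
# Evaluate on the held-out test split (continuation of the sketch above).
test_titles = ["別のサンプルタイトル"]  # placeholder for the 826 test titles
test_labels = [0]

test_x, test_y = encode(test_titles, test_labels)
loss, acc = model.evaluate(test_x, test_y, batch_size=32)
print(f"test_accuracy = {acc:.4f}")  # the repository reports 0.8744
```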

(1) livedoor news corpus, CC BY-ND 2.1 JP: https://creativecommons.org/licenses/by-nd/2.1/jp/

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the code and the software.

About

Classification of Japanese news titles with BERT & TensorFlow
