Summary

This project addresses Vietnamese text summarization with a sequence-to-sequence (seq2seq) model. Building on PhoBERT, the state-of-the-art RoBERTa-based language model for Vietnamese (reference 3), I built a summarization architecture and trained it on the Vietnews dataset (reference 1).

Demo

Step 1: Build the Docker image

docker build -f Dockerfile -t nlp-text-summarization:latest .

Step 2: Run the Docker container

docker run -p 8501:8501 nlp-text-summarization:latest
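
The `-p 8501:8501` flag publishes the container's port 8501 on the same port of the host, so once the container is running the demo should be reachable at http://localhost:8501. As a minimal sketch, the following checks that something is listening on that port. The helper name `is_port_open` is hypothetical and not part of this repository, and the host `localhost` is an assumption about where the container runs.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8501 is the host port published by the docker run command above.
    print(is_port_open("localhost", 8501))
```

The check only confirms that the port accepts TCP connections; it does not verify that the model has finished loading inside the container.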

Results

On the Vietnews dataset, the model achieves higher ROUGE F1 scores than the Fast-Abs results reported in reference 1.

Metric	Precision	Recall	F1-Score	F1-Score of Fast-Abs (Ref 1)
ROUGE-1	0.64	0.61	0.61	0.55
ROUGE-2	0.31	0.30	0.30	0.23
ROUGE-L	0.42	0.41	0.40	0.38
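
To make the columns concrete, here is a minimal pure-Python sketch of how ROUGE-N and ROUGE-L precision, recall, and F1 can be computed for a single candidate/reference pair. It is not the evaluation script used in this repository: the function names are hypothetical, tokenization is assumed to be a plain whitespace split (real Vietnamese evaluation would normally apply word segmentation first), and ROUGE-L is computed at the sequence level from the longest common subsequence.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, candidate_total, reference_total):
    """Turn an overlap count into (precision, recall, F1)."""
    precision = overlap / candidate_total if candidate_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rouge_n(candidate, reference, n):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: longest common subsequence relative to each text's length."""
    return prf(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    candidate = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()
    print("ROUGE-1:", rouge_n(candidate, reference, 1))
    print("ROUGE-2:", rouge_n(candidate, reference, 2))
    print("ROUGE-L:", rouge_l(candidate, reference))
```

Corpus-level scores such as those in the table are presumably averages of per-document values, which would explain why a reported F1 need not equal the harmonic mean of the reported precision and recall.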

Reference

1. Nguyen, Van-Hau & Nguyen, Thanh-Chinh & Nguyen, Minh-Tien & Hoai, Nguyen. (2019). VNDS: A Vietnamese Dataset for Summarization. pp. 375-380. doi:10.1109/NICS48868.2019.9023886.
2. Rothe, Sascha & Narayan, Shashi & Severyn, Aliaksei. (2020). Leveraging Pre-trained Checkpoints for Sequence Generation Tasks. Transactions of the Association for Computational Linguistics, 8, pp. 264-280. doi:10.1162/tacl_a_00313.
3. Nguyen, Dat Quoc & Nguyen, Anh. (2020). PhoBERT: Pre-trained language models for Vietnamese. pp. 1037-1042. doi:10.18653/v1/2020.findings-emnlp.92.


ngockhanh5110/nlp-vietnamese-text-summarization