
Text Classification using Pre-trained BERT Vectors

The program performs text classification over the following 10 language classes using pre-trained BERT vectors with Logistic Regression and Neural Network models:

  1. Arabic
  2. Cantonese
  3. Japanese
  4. Korean
  5. Mandarin
  6. Polish
  7. Russian
  8. Spanish
  9. Thai
  10. Vietnamese

The program is implemented in Python using pre-trained BERT vectors. Refer to the report for further implementation details and instructions to run the code: View Report
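How the BERT vectors are produced is covered in the report; purely as an illustration, the sketch below derives a fixed-size sentence vector by mean-pooling the last hidden layer of a pre-trained BERT model via the Hugging Face transformers library. The model name, pooling strategy, and max_length here are assumptions, not necessarily what the program uses.

```python
# Illustrative sketch: fixed-size BERT vectors via mean-pooling of the last
# hidden layer. Model name and pooling choice are assumptions, not the
# report's documented setup.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bert_vector(text: str) -> np.ndarray:
    """Encode `text` and average BERT's last hidden layer over tokens."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state has shape (1, seq_len, 768); average over seq_len
    return outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy()

X = np.stack([bert_vector(t) for t in ["first sample text", "second sample text"]])
print(X.shape)  # (2, 768)
```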

Results:

  1. Logistic Regression Model Predictions: Among all languages, the highest precision, recall, and F1-score are for Thai, and the lowest are for Mandarin. Misclassification is highest for Mandarin and Cantonese and lowest for Thai. (A training sketch for both models follows this list.)


  2. Neural Network Model Predictions: Using the MLP Classifier, the highest precision, recall, and F1-score are again for Thai, and the lowest are for Mandarin. Misclassification is highest for Mandarin and lowest for Thai.
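As a hedged sketch of how both sets of predictions could be produced, the snippet below trains scikit-learn's LogisticRegression and MLPClassifier on BERT feature vectors and prints per-class precision, recall, and F1. The random stand-in data and the hyperparameter values are placeholders, not the report's settings.

```python
# Sketch: both classifiers on BERT vectors, with per-class metrics.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: a real run would use BERT vectors (e.g. from the
# extraction sketch above) and the 10 language labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 768))                          # placeholder features
y = rng.choice(["Thai", "Mandarin", "Cantonese"], 300)   # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

for name, clf in [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("MLP Classifier", MLPClassifier(hidden_layer_sizes=(100,), max_iter=500,
                                     random_state=42)),
]:
    clf.fit(X_train, y_train)
    print(name)
    # classification_report prints per-class precision, recall, and F1,
    # the metrics summarized in the results above.
    print(classification_report(y_test, clf.predict(X_test)))
```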


Improvements:

The logistic regression model could be improved by tuning its hyperparameters with grid search. The neural network model could be improved by applying hyperparameter optimization tools to parameters such as hidden_layer_sizes, activation, solver, alpha, learning_rate, and max_iter. Training both models on more data with BERT vectors should also yield improvements.
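A minimal sketch of the suggested grid search, using scikit-learn's GridSearchCV over the parameters named above. The value grids are illustrative assumptions; the report does not specify which values to search.

```python
# Sketch: hyperparameter tuning by grid search for both models.
# Parameter grids are illustrative, not the report's choices.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

lr_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,
)

mlp_search = GridSearchCV(
    MLPClassifier(max_iter=500),
    param_grid={
        "hidden_layer_sizes": [(50,), (100,), (100, 50)],
        "activation": ["relu", "tanh"],
        "solver": ["adam", "lbfgs"],
        "alpha": [1e-4, 1e-3],
        "learning_rate": ["constant", "adaptive"],
    },
    cv=5,
)

# After fitting (e.g. lr_search.fit(X_train, y_train)), the tuned settings
# are available as lr_search.best_params_ and lr_search.best_estimator_.
```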