Bayesian Deep Learning for Text Classification

Source code repository for my Master's thesis at the University of Zagreb, Faculty of Electrical Engineering and Computing.

Bayesian Deep Learning

Bayesian deep learning merges Bayesian probability theory with deep learning, allowing principled uncertainty estimates from deep learning architectures. For an excellent quick introduction to Bayesian deep learning, check out Demystifying Bayesian Deep Learning. One of the most elegant practical Bayesian deep learning approaches is the Bayes-by-Backprop algorithm, first introduced in the paper Weight Uncertainty in Neural Networks. The main idea is to replace point-estimate weights with weight distributions and to learn the parameters of those distributions instead of the weights directly. The approach was later extended from fully connected networks to both RNNs and CNNs.
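To make the idea concrete, below is a minimal PyTorch-style sketch of a Bayes-by-Backprop linear layer. It is illustrative rather than the thesis implementation; names such as BayesLinear and prior_sigma are assumptions. Each weight gets a Gaussian variational posterior with its own mean and scale, weights are sampled with the reparameterisation trick, and a KL term against the prior is accumulated for the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Linear layer with Gaussian weight distributions (Bayes-by-Backprop sketch)."""

    def __init__(self, in_features, out_features, prior_sigma=1.0):
        super().__init__()
        # Variational posterior parameters: a mean and a scale per weight,
        # so the layer holds roughly twice the parameters of a plain nn.Linear.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).normal_(0, 0.1))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_rho = nn.Parameter(torch.full((out_features,), -5.0))
        # A simple Normal prior; the paper also uses a scale-mixture prior,
        # and a Laplace prior can be swapped in the same way.
        self.prior = torch.distributions.Normal(0.0, prior_sigma)

    def forward(self, x):
        # sigma = softplus(rho) keeps the posterior scale positive.
        weight_sigma = F.softplus(self.weight_rho)
        bias_sigma = F.softplus(self.bias_rho)
        # Reparameterisation trick: sample weights while keeping gradients
        # with respect to the posterior parameters mu and rho.
        weight = self.weight_mu + weight_sigma * torch.randn_like(weight_sigma)
        bias = self.bias_mu + bias_sigma * torch.randn_like(bias_sigma)
        # KL(q(w) || p(w)) is stored so the training loop can add it to the loss.
        posterior_w = torch.distributions.Normal(self.weight_mu, weight_sigma)
        posterior_b = torch.distributions.Normal(self.bias_mu, bias_sigma)
        self.kl = (torch.distributions.kl_divergence(posterior_w, self.prior).sum()
                   + torch.distributions.kl_divergence(posterior_b, self.prior).sum())
        return F.linear(x, weight, bias)
```

During training, the accumulated KL terms (summed over all Bayesian layers and scaled down per minibatch) are added to the data likelihood loss, giving the variational free-energy objective described in the paper.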

Relevant papers:

Text Classification

Text classification architectures can be expressed as a simple four-step procedure: embed, encode, attend, predict. The classifiers implemented in this repository omit the attend step, use GloVe embeddings and a softmax layer for predictions, and use either an LSTM (long short-term memory) or a TCN (temporal convolutional network) encoder.
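A minimal PyTorch sketch of this embed-encode-predict pipeline is shown below. It is illustrative rather than the repository's actual model; the class name LSTMClassifier and hyperparameters such as hidden_dim are assumptions.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Embed -> encode -> predict text classifier (the attend step is omitted)."""

    def __init__(self, vocab_size, num_classes, embed_dim=300, hidden_dim=256,
                 pretrained_embeddings=None):
        super().__init__()
        # Embed: pretrained GloVe vectors can be copied into this layer.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        if pretrained_embeddings is not None:
            self.embedding.weight.data.copy_(pretrained_embeddings)
        # Encode: an LSTM over the token embeddings (a TCN could be used instead).
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Predict: a linear layer producing class logits; the softmax is applied
        # by the cross-entropy loss during training or at inference time.
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.encoder(embedded)   # final hidden state of the LSTM
        return self.classifier(hidden[-1])        # (batch, num_classes) logits

# Example usage with random token ids:
# logits = LSTMClassifier(vocab_size=20000, num_classes=2)(torch.randint(0, 20000, (4, 50)))
```

A Bayes-by-Backprop variant of this classifier would replace nn.LSTM and nn.Linear with Bayesian counterparts along the lines of the BayesLinear sketch above.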

Relevant papers:

Usage

Create the conda environment from environment.yml. Run train.py for training and test.py for testing, with the appropriate command-line arguments (see the corresponding .py files for details).

Results Summary

The goal was to compare Bayes-by-Backprop text classifiers, using either a Normal or a Laplace weight prior, against plain deep learning text classifiers trained with or without Dropout. The tables below contain the test set accuracies on the binary version of the Stanford Sentiment Treebank dataset (SST-2), the IMDb dataset, and the fine-grained version of the Yelp 2015 dataset (Yelp-f). Bayes-by-Backprop text classifiers achieve accuracy comparable to the non-Bayesian Dropout variants, while roughly doubling the number of parameters, since each weight is replaced by a mean and a scale parameter.

TCN classifier accuracies:

Dataset   Plain   Dropout   BBB+Normal   BBB+Laplace
SST-2      .83      .82        .81           .81
IMDb       .89      .89        .88           .88
Yelp-f     .62      .62        .62           .62

LSTM classifier accuracies:

Dataset   Plain   Dropout   BBB+Normal   BBB+Laplace
SST-2      .81      .81        .82           .82
IMDb       .83      .83        .81           .81
Yelp-f     .63      .62        .63           .63

If you are interested in state-of-the-art performance on these datasets, check out NLP-Progress.

Selective Classification

Another set of experiments, involving selective classification (classification with a reject option), yielded the same outcome as the experiments in Selective Classification for Deep Neural Networks: the baseline maximum softmax activation value vastly outperforms Bayesian model uncertainty as a proxy for prediction confidence. A sketch of that baseline follows below.
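The sketch below illustrates the softmax-response baseline used in such comparisons: accept a prediction only when the maximum softmax probability exceeds a threshold, and report risk (error on accepted examples) against coverage (fraction of examples accepted). The function names and the threshold value are illustrative, not taken from the repository.

```python
import torch

def selective_predict(logits, threshold=0.9):
    """Softmax-response baseline: answer only when the top softmax
    probability exceeds a confidence threshold, otherwise reject."""
    probs = torch.softmax(logits, dim=-1)
    confidence, predictions = probs.max(dim=-1)
    accept = confidence >= threshold          # boolean mask of accepted examples
    return predictions, accept

def risk_coverage(predictions, accept, labels):
    """Coverage = fraction of examples answered; risk = error rate on those."""
    coverage = accept.float().mean().item()
    if accept.any():
        risk = (predictions[accept] != labels[accept]).float().mean().item()
    else:
        risk = 0.0
    return risk, coverage
```

Sweeping the threshold traces a risk-coverage curve; the same procedure can be run with a Bayesian uncertainty estimate (for example, predictive entropy over several weight samples) in place of the softmax confidence to compare the two proxies.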

Credits

  • MXNet Bayes-by-backprop tutorial link
  • source code for Bayesian Recurrent Neural Networks link
  • source code for Weight Uncertainty in Neural Networks link
  • source code for PyTorch layers link
  • source code for selective deep learning link

Project status

No longer actively developed!

  • Note: My goal was to code the Bayesian layers from scratch. For up-to-date Bayesian deep learning layer implementations, check out the awesome TensorFlow Probability.

License

MIT © Antonio Šajatović