This is a PyTorch implementation of "A Structured Self-Attentive Sentence Embedding" (Lin et al., 2017), applied to the PAN 2015 and 2016 author profiling tasks. The implementation handles gender and age group classification; the data can be obtained from the links above.
The word embedding layer is initialized with 100-dimensional GloVe word embeddings.
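A minimal sketch of such an initialization in PyTorch, assuming a hypothetical GloVe file path and a toy vocabulary (neither taken from this repository):

```python
# Sketch: build an embedding matrix from a GloVe text file.
# The file name, vocabulary, and helper are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

def load_glove(path, vocab, dim=100):
    """Fill an embedding matrix for `vocab` from a GloVe text file."""
    # Words missing from GloVe keep a small random initialization.
    weights = np.random.uniform(-0.25, 0.25, (len(vocab), dim)).astype(np.float32)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] in vocab:
                weights[vocab[parts[0]]] = np.asarray(parts[1:], dtype=np.float32)
    return torch.from_numpy(weights)

vocab = {"<pad>": 0, "the": 1, "she": 2}           # toy vocabulary
weights = load_glove("glove.6B.100d.txt", vocab)   # hypothetical path
embedding = nn.Embedding.from_pretrained(weights, freeze=False, padding_idx=0)
```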
The program can be executed with:
python main.py --input ./data --expt self-attn-gender --attr gender
Parameters:
--input - Path to the input data directory
--results - Directory to store models and results
--expt - Experiment name
--wordemb - Word embeddings (100-dim GloVe embeddings)
--batchsz - Batch size
--nepoch - Number of epochs
--embedsz - Word embedding size
--hiddensz - Hidden layer size
--nlayers - Number of hidden layers
--attnsz - Number of attention units (d_a)
--attnhops - Number of attention hops (r); see the attention-layer sketch after this list
--fcsize - Fully connected layer size
--attr - Attribute to profile (gender or age group)
--lr - Learning rate
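The following argparse sketch mirrors the flags listed above; defaults and types beyond the obvious ones are omitted, since the repository's actual defaults are not shown here:

```python
# Sketch of an argument parser matching the flags above (assumed, not the repo's code).
import argparse

parser = argparse.ArgumentParser(description="Structured self-attentive author profiling")
parser.add_argument("--input", help="path to the input data directory")
parser.add_argument("--results", help="directory to store models and results")
parser.add_argument("--expt", help="experiment name")
parser.add_argument("--wordemb", help="100-dim GloVe word embeddings")
parser.add_argument("--batchsz", type=int, help="batch size")
parser.add_argument("--nepoch", type=int, help="number of epochs")
parser.add_argument("--embedsz", type=int, help="word embedding size")
parser.add_argument("--hiddensz", type=int, help="hidden layer size")
parser.add_argument("--nlayers", type=int, help="number of hidden layers")
parser.add_argument("--attnsz", type=int, help="number of attention units (d_a)")
parser.add_argument("--attnhops", type=int, help="number of attention hops (r)")
parser.add_argument("--fcsize", type=int, help="fully connected layer size")
parser.add_argument("--attr", help="attribute to profile (gender or age group)")
parser.add_argument("--lr", type=float, help="learning rate")
args = parser.parse_args()
```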
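Concretely, `--attnsz` corresponds to d_a and `--attnhops` to r in Lin et al. (2017), where the annotation matrix is A = softmax(W_s2 tanh(W_s1 H^T)) and the sentence embedding is M = AH. A minimal PyTorch sketch of that layer, with assumed module and variable names (not necessarily this repository's):

```python
# Sketch of the structured self-attention layer from Lin et al. (2017).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredSelfAttention(nn.Module):
    def __init__(self, hidden_size, d_a, r):
        super().__init__()
        self.ws1 = nn.Linear(hidden_size, d_a, bias=False)  # W_s1: d_a x 2u
        self.ws2 = nn.Linear(d_a, r, bias=False)            # W_s2: r x d_a

    def forward(self, H):
        # H: (batch, seq_len, hidden_size) BiLSTM outputs.
        # Softmax over seq_len so each hop is a distribution over positions.
        A = F.softmax(self.ws2(torch.tanh(self.ws1(H))), dim=1)  # (batch, seq_len, r)
        A = A.transpose(1, 2)             # (batch, r, seq_len)
        M = torch.bmm(A, H)               # (batch, r, hidden_size) sentence embedding
        return M, A
```

Each of the r hops produces one weighted sum of the hidden states, so M has shape (batch, r, hidden_size); it is typically flattened before the fully connected layer sized by --fcsize.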
The attention layer identifies features (tokens) that are salient for different social groups.
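One way such salient tokens could be read off the annotation matrix A is to rank positions by their attention weight within a hop; the helper and dummy data below are illustrative assumptions, not this repository's analysis code:

```python
# Sketch: rank tokens by attention weight in one hop of A (batch, r, seq_len).
import torch

def salient_tokens(tokens, A, hop=0, k=3):
    """Return the k tokens with the highest weight in the given attention hop."""
    weights = A[0, hop, :len(tokens)]               # one sentence, one hop
    top = weights.topk(min(k, len(tokens))).indices.tolist()
    return [tokens[i] for i in top]

tokens = ["i", "love", "shopping", "with", "friends"]
A = torch.rand(1, 4, len(tokens)).softmax(dim=-1)   # dummy attention for the demo
print(salient_tokens(tokens, A))
```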
Reference:
- Lin, Z., Feng, M., dos Santos, C. N., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130.