EmoSense at SemEval-2019 Task 3: Bidirectional LSTM Network for Contextual Emotion Detection in Textual Conversations

Overview

This repository contains the source code of the models used for EmoSense submissions for SemEval-2019 Task 3 “EmoContext: Contextual Emotion Detection in Text”. The model is described in the paper "EmoSense at SemEval-2019 Task 3: Bidirectional LSTM Network for Contextual Emotion Detection in Textual Conversations".

The proposed approach achieved a 72.59% micro-averaged F1 score for the emotion classes on the test dataset, significantly outperforming the officially released baseline by approximately 14%.

We designed a specific LSTM-based architecture that allows the model not only to learn semantic and sentiment feature representations, but also to capture user-specific conversational features. In this work, we did not use any traditional NLP features such as sentiment lexicons or hand-crafted linguistic features, substituting them with word embeddings computed automatically from text corpora with an advanced pre-processing stage.

Citation:

@inproceedings{smetanin-2019-emosense,
    title = "{E}mo{S}ense at {S}em{E}val-2019 Task 3: Bidirectional {LSTM} Network for Contextual Emotion Detection in Textual Conversations",
    author = "Smetanin, Sergey",
    booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
    year = "2019",
    address = "Minneapolis, Minnesota, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/S19-2034",
    pages = "210--214",
}

Figure: The architecture of a smaller version of the proposed model. The LSTM units for the first and third turns have shared weights.
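For illustration, a minimal Keras sketch of such a smaller model is shown below. The layer sizes, input length, and vocabulary size are hypothetical placeholders, not the exact configuration from the paper; the key point is the shared BiLSTM encoder for the first and third turns and a separate encoder for the second turn.

import numpy as np
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     Concatenate, Dense)
from tensorflow.keras.models import Model

MAX_LEN = 30          # assumed maximum turn length (placeholder)
VOCAB_SIZE = 20000    # assumed vocabulary size (placeholder)
EMB_DIM = 300         # matches the 300-dimensional embeddings
NUM_CLASSES = 4       # happy, sad, angry, others

# In practice, embedding_matrix would be built from emosense.300d.txt;
# random initialization is used here only to keep the sketch self-contained.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMB_DIM))

embedding = Embedding(VOCAB_SIZE, EMB_DIM, weights=[embedding_matrix],
                      input_length=MAX_LEN, trainable=False)

turn1 = Input(shape=(MAX_LEN,), name="turn1")
turn2 = Input(shape=(MAX_LEN,), name="turn2")
turn3 = Input(shape=(MAX_LEN,), name="turn3")

# Shared BiLSTM for the first and third turns (same speaker),
# separate BiLSTM for the second turn.
shared_lstm = Bidirectional(LSTM(64))
other_lstm = Bidirectional(LSTM(64))

enc1 = shared_lstm(embedding(turn1))
enc2 = other_lstm(embedding(turn2))
enc3 = shared_lstm(embedding(turn3))

merged = Concatenate()([enc1, enc2, enc3])
output = Dense(NUM_CLASSES, activation="softmax")(merged)

model = Model(inputs=[turn1, turn2, turn3], outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])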

Source Code of the Model

Pre-trained Word Embeddings

The emotion detection models were trained on top of pre-trained DataStories word embeddings, which were additionally fine-tuned on an automatically collected emotional dataset.

Texts were pre-processed with Ekphrasis. This tool performs spell correction, word normalization, and segmentation, and allows specifying which tokens should be omitted, normalized, or annotated with special tags.
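A minimal sketch of how Ekphrasis can be configured for such pre-processing is shown below (following the library's standard usage; the exact settings used for EmoSense may differ):

from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons

text_processor = TextPreProcessor(
    # tokens that will be normalized to special tags, e.g. <url>, <user>
    normalize=['url', 'email', 'percent', 'money', 'phone', 'user',
               'time', 'date', 'number'],
    # tokens that will be annotated with extra tags, e.g. <allcaps>
    annotate={"hashtag", "allcaps", "elongated", "repeated",
              "emphasis", "censored"},
    fix_html=True,              # fix HTML tokens
    segmenter="twitter",        # word-segmentation corpus
    corrector="twitter",        # spell-correction corpus
    unpack_hashtags=True,       # segment hashtags into words
    unpack_contractions=True,   # can't -> can not
    spell_correct_elong=True,   # spell-correct elongated words
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
    dicts=[emoticons]           # replace emoticons with word expressions
)

tokens = text_processor.pre_process_doc("I CANT WAIT for the new season! :D")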

The pre-trained 300-dimensional embeddings can be downloaded from the following link: emosense.300d.txt. Place the embeddings file in the root directory so the program can find it.
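As a rough sketch, the embeddings file could be loaded into an embedding matrix as follows, assuming a plain-text, space-separated format (word followed by 300 floats); the helper function names here are hypothetical, not part of the repository:

import numpy as np

def load_embeddings(path, dim=300):
    """Read word vectors from a text file into a {word: vector} dict."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip malformed lines or a possible header line
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

def build_embedding_matrix(word_index, word_vectors, dim=300):
    """Map a tokenizer's word index to rows of an embedding matrix."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vector = word_vectors.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix

word_vectors = load_embeddings("emosense.300d.txt")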

Documentation and How to report bugs