Outstanding challenges in sentiment analysis

Jeremy Barnes [jeremycb@ifi.uio.no]

This repo contains a challenging dataset for sentiment analysis, as well as a Python script to calculate the per-class results presented at BlackboxNLP 2019.

Jeremy Barnes, Lilja Øvrelid, and Erik Velldal. 2019. Sentiment Analysis Is Not Solved! Assessing and Probing Sentiment Classification. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 12–23, Florence, Italy. Association for Computational Linguistics.

If you use the code for academic research, please cite the paper in question:

@inproceedings{barnes-etal-2019-sentiment,
    title = "Sentiment Analysis Is Not Solved! Assessing and Probing Sentiment Classification",
    author = "Barnes, Jeremy  and
      {\O}vrelid, Lilja  and
      Velldal, Erik",
    booktitle = "Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W19-4802",
    pages = "12--23"
}

The dataset was created by finding the subset of instances from six sentence-level sentiment datasets (MPQA polarity, OpeNER, SemEval 2013 Task 2, Stanford Sentiment Treebank, Täckström dataset, Thelwall datasets) that an oracle ensemble of models (bag-of-words + L2-regularized logistic regression, BiLSTM, ELMo, BERT) predicted incorrectly.

We then performed a thorough error analysis of the data by annotating each sentence for 19 linguistic and paralinguistic categories. A sentence may belong to more than one category.

Dataset

The dataset contains 836 sentences in a tab-separated format:

sentence index    dataset it comes from    index within that dataset    gold label    text    error annotations

as in the following example:

247    sst    202    2    It wo n't bust your gut -- and it 's not intended to -- it 's merely a blandly cinematic surgical examination of what makes a joke a joke .    positive::idioms::negated::sarcasm/irony

The gold labels range from 0 (Strong Negative) to 5 (Strong Positive).
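
For reference, here is a minimal sketch of how the file can be read in Python. The field layout follows the description above; the helper name read_dataset, the dictionary keys, and the default path annotated.txt are illustrative, not part of the repository.

def read_dataset(path="annotated.txt"):
    """Read the tab-separated challenge data into a list of dicts."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            idx, source, local_idx, gold, text, annotations = line.rstrip("\n").split("\t")
            examples.append({
                "idx": int(idx),
                "source": source,             # e.g. "sst"
                "local_idx": int(local_idx),  # index within the source dataset
                "gold": int(gold),            # 0 (strong negative) .. 5 (strong positive)
                "text": text,
                "annotations": annotations.split("::"),  # e.g. ["positive", "idioms", ...]
            })
    return examples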

Use

First, use your favorite models to produce predictions for the test.txt data. Each prediction file should contain one integer prediction (0-5) per line, one line for each sentence in test.txt. See example_pred_file.txt to check that your file has the same format; a sketch of writing such a file is shown below.
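
The following sketch shows the expected file format only; the hard-coded labels and the file name my_preds.txt are placeholders for your model's actual output.

# One integer label (0-5) per line, in the same order as the test sentences.
predictions = [3, 1, 4, 0, 2]  # placeholder values

with open("my_preds.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(str(p) for p in predictions) + "\n")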

python3 analyze_predictions.py [prediction_files]

The script prints the accuracy for each of the 19 categories.
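
Conceptually, the per-category breakdown can be computed as in the following sketch (an illustration assuming the '::'-separated annotation field and the read_dataset helper sketched above, not the script's actual code):

from collections import defaultdict

def per_category_accuracy(examples, predictions):
    """Accuracy per annotation category; a sentence counts toward every category it carries."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex, pred in zip(examples, predictions):
        for cat in ex["annotations"]:
            total[cat] += 1
            if pred == ex["gold"]:
                correct[cat] += 1
    return {cat: 100.0 * correct[cat] / total[cat] for cat in total}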

Example

python3 analyze_predictions.py example_pred.txt

Challenge dataset file: annotated.txt
Testing predictions from example_pred.txt/
model               pos    neg    mixed    no-sent    spelling    desirable    idioms    strong    negated    w-know    amp.    comp.    irony    shift    emoji    modal    morph.    red.    vocab
----------------  -----  -----  -------  ---------  ----------  -----------  --------  --------  ---------  --------  ------  -------  -------  -------  -------  -------  --------  ------  -------
example_pred.txt   16.0   55.4     14.6        1.0        53.1         44.9      18.8      18.6       30.7      32.4    33.3     33.3     45.8     62.2     72.2     45.7       7.4    15.4     12.7
