rafalposwiata/structured-sentiment-analysis

OPI at SemEval-2022 Task 10: Structured Sentiment Analysis

This is the source code of my solution for SemEval-2022 Shared Task 10: Structured Sentiment Analysis, described in the paper OPI at SemEval-2022 Task 10: Transformer-based Sequence Tagging with Relation Classification for Structured Sentiment Analysis.

If you use the code from this repository, please cite:

@inproceedings{poswiata-2022-opi,
    title = "{OPI} at {S}em{E}val-2022 Task 10: Transformer-based Sequence Tagging with Relation Classification for Structured Sentiment Analysis",
    author = "Po{\'s}wiata, Rafa{\l}",
    booktitle = "Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.semeval-1.190",
    doi = "10.18653/v1/2022.semeval-1.190",
    pages = "1366--1372",
}

Problem description

Structured Sentiment Analysis (SSA) can be formulated as an information extraction task in which one attempts to find all of the opinion tuples O = O_1, ..., O_n in a text. Each opinion O_i is a tuple (h, t, e, p), where h is a holder who expresses a polarity p towards a target t through a sentiment expression e, implicitly defining pairwise relationships between elements of the same tuple (Barnes et al., 2021). An example of such tuples as a structured sentiment graph is shown in Figure 1.

Figure 1: SSA example as a structured sentiment graph.
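To make the tuple definition concrete, here is a minimal sketch of how an opinion (h, t, e, p) might be represented. The `Opinion` class, the token-span convention and the example sentence are illustrative assumptions, not the data format used in this repository or in the shared task.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed convention: a span is (start, end) over token indices, end exclusive.
Span = Tuple[int, int]


@dataclass(frozen=True)
class Opinion:
    """Hypothetical container for one opinion tuple (h, t, e, p)."""
    holder: Optional[Span]   # h; None is used here for an unstated holder
    target: Optional[Span]   # t
    expression: Span         # e
    polarity: str            # p


# Made-up example sentence, not taken from any of the datasets.
tokens = "The staff were friendly but the room was small".split()

opinions = [
    Opinion(holder=None, target=(0, 2), expression=(3, 4), polarity="Positive"),
    Opinion(holder=None, target=(5, 7), expression=(8, 9), polarity="Negative"),
]


def text(span: Span) -> str:
    """Return the surface string covered by a token span."""
    return " ".join(tokens[span[0]:span[1]])


for o in opinions:
    print(text(o.target), "<-", text(o.expression), f"({o.polarity})")
```

Each tuple links its expression to its target (and holder, when present), which is what gives the set of tuples its graph structure.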

Subtasks

Monolingual

In the monolingual sub-task, systems were trained and tested on datasets in the same language. The seven structured sentiment datasets in five languages selected for the competition are shown in Table 1.

| Dataset | Language | Type of data | # sents | # holders | # targets | # expr. |
| --- | --- | --- | --- | --- | --- | --- |
| MPQA | English | News | 10048 | 2279 | 2452 | 2814 |
| DSunis | English | Reviews of online universities and e-commerce | 2803 | 86 | 1119 | 1119 |
| OpeNERen | English | Hotel reviews | 2494 | 413 | 3850 | 4150 |
| OpeNERes | Spanish | Hotel reviews | 2057 | 255 | 3980 | 4388 |
| MultiBookedca | Catalan | Hotel reviews | 1678 | 235 | 2336 | 2756 |
| MultiBookedeu | Basque | Hotel reviews | 1521 | 296 | 1775 | 2328 |
| NoReCfine | Norwegian | Multi-domain reviews | 11437 | 1128 | 8923 | 11115 |

Table 1: Datasets used during competition.

Cross-lingual

In the cross-lingual sub-task, systems were evaluated on the Catalan, Basque and Spanish datasets, while data in these languages could not be used for training. This setup is often known as zero-shot cross-lingual transfer (Hu et al., 2020).
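The split implied by this setup can be sketched as follows. The dataset-to-language mapping comes from Table 1; the variable names are hypothetical, and the code takes the strictest reading of the rule above (no training data in any of the three evaluation languages), which is an assumption about how the restriction was applied.

```python
# Dataset -> language, as listed in Table 1.
DATASET_LANGUAGE = {
    "MPQA": "English",
    "DSunis": "English",
    "OpeNERen": "English",
    "OpeNERes": "Spanish",
    "MultiBookedca": "Catalan",
    "MultiBookedeu": "Basque",
    "NoReCfine": "Norwegian",
}

# Languages evaluated in the cross-lingual sub-task.
ZERO_SHOT_LANGUAGES = {"Spanish", "Catalan", "Basque"}

# Assumed strict reading: exclude every dataset in an evaluation language.
train_sets = [d for d, lang in DATASET_LANGUAGE.items()
              if lang not in ZERO_SHOT_LANGUAGES]
test_sets = [d for d, lang in DATASET_LANGUAGE.items()
             if lang in ZERO_SHOT_LANGUAGES]

print("train:", train_sets)
print("test: ", test_sets)
```

Under this reading, only the English and Norwegian datasets remain available for training.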

Solution

The architecture of the solution is shown in Figure 2. It consists of two main components: the Extraction Module and the Relation Classification Module. The first module is based on sequence tagging and extracts targets, holders and expressions with polarity. Given the extracted entities, the second module classifies whether there is a relationship between them.

Figure 2: Architecture of the proposed solution.
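The two-stage flow described above can be sketched roughly as below. This is not the repository's implementation: the BIO tag names, the pairing of expressions with holders and targets, and the toy `is_related` rule are all assumptions made for illustration. In the actual system both stages are transformer-based models.

```python
from typing import Callable, List, Tuple

# (label, start, end) over token indices, end exclusive (assumed convention).
Entity = Tuple[str, int, int]


def decode_bio(tags: List[str]) -> List[Entity]:
    """Stage 1 post-processing: turn per-token BIO tags into typed spans."""
    entities: List[Entity] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            entities.append((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return entities


def link(entities: List[Entity],
         is_related: Callable[[Entity, Entity], bool]) -> List[Tuple[Entity, Entity]]:
    """Stage 2: keep the (expression, holder/target) pairs judged related."""
    expressions = [e for e in entities if e[0].startswith("exp-")]
    others = [e for e in entities if not e[0].startswith("exp-")]
    return [(x, o) for x in expressions for o in others if is_related(x, o)]


def toy_is_related(expression: Entity, other: Entity) -> bool:
    """Stand-in for a trained relation classifier: a simple adjacency rule."""
    return abs(expression[1] - other[2]) <= 1


# Hypothetical tagger output for "The staff were friendly but the room was small".
tags = ["B-target", "I-target", "O", "B-exp-Positive", "O",
        "B-target", "I-target", "O", "B-exp-Negative"]

entities = decode_bio(tags)
links = link(entities, toy_is_related)
print(entities)
print(links)
```

Splitting extraction from linking keeps the tagger simple; the cost is that every expression must be compared against every candidate holder and target.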

Results

Evaluation of the systems was based on the official competition metric, Sentiment Graph F1 (SF1). My system achieved average SF1 scores of 54.5% and 53.1% for the monolingual and cross-lingual sub-tasks, respectively, placing 11th and 9th out of 32 teams in these sub-tasks.

| Dataset | Monolingual | Cross-lingual |
| --- | --- | --- |
| MPQA | 32.6 | - |
| DSunis | 39.5 | - |
| OpeNERen | 67.0 | - |
| OpeNERes | 66.3 | 56.4 |
| MultiBookedca | 65.0 | 58.6 |
| MultiBookedeu | 65.3 | 44.4 |
| NoReCfine | 45.9 | - |
| Average score | 54.5 | 53.1 |

Table 2: Official competition results of the proposed solution.
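The averages in Table 2 are consistent with an unweighted mean over the datasets of each sub-task. A quick check, assuming that is how they were computed:

```python
# Per-dataset SF1 scores copied from Table 2.
monolingual = {
    "MPQA": 32.6, "DSunis": 39.5, "OpeNERen": 67.0, "OpeNERes": 66.3,
    "MultiBookedca": 65.0, "MultiBookedeu": 65.3, "NoReCfine": 45.9,
}
cross_lingual = {
    "OpeNERes": 56.4, "MultiBookedca": 58.6, "MultiBookedeu": 44.4,
}


def average(scores: dict) -> float:
    """Unweighted mean over datasets, rounded to one decimal place."""
    return round(sum(scores.values()) / len(scores), 1)


print(average(monolingual))    # 54.5
print(average(cross_lingual))  # 53.1
```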
