Sarcasm-detection-for-extremely-unbalanced-dataset

This repo contains work carried out for SemEval 2022 Task 6: iSarcasmEval: Intended Sarcasm Detection In English and Arabic.

This repository is the implementation of the accepted paper "reamtchka at SemEval-2022 Task 6: Investigating the effect of different loss functions for Sarcasm detection for unbalanced datasets".

Overview

This repo contains the system we submitted to SemEval-2022 Task 6: Intended Sarcasm Detection in English and Arabic. It placed 20th and 3rd in Task A, 16th in Task B, and 10th and 6th in Task C on the leaderboard. We propose a voting classifier that combines either several BERT-based KimCNN models, each trained with a modified loss function to improve performance on an extremely unbalanced dataset, or machine learning models such as SVMs trained on handcrafted features. The main contributions of our system are: 1) identifying loss functions that help train BERT-based and other deep learning models on extremely unbalanced datasets, and 2) investigating the importance of the different layers of BERT-based models.
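The two ingredients named above, a loss that compensates for class imbalance and a voting classifier over several models, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: focal loss stands in as one well-known example of a loss modified for imbalance (the specific losses compared in the paper are not listed here), hard majority voting is assumed for the ensemble, and the function names `binary_focal_loss` and `majority_vote` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-7) -> float:
    """Mean binary focal loss over a batch (illustrative sketch, hypothetical helper).

    probs  -- predicted probability of the positive (sarcastic) class
    labels -- gold labels, 1 = sarcastic, 0 = non-sarcastic (assumed encoding)
    alpha  -- weight on the positive class; 1 - alpha weights the negative class
    With gamma=0 this reduces to alpha-weighted binary cross-entropy.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability given to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma down-weights easy, confidently correct examples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)


def majority_vote(model_predictions: List[List[int]]) -> List[int]:
    """Hard voting (assumed): one list of predicted labels per model, equal lengths."""
    n_samples = len(model_predictions[0])
    voted = []
    for i in range(n_samples):
        counts = Counter(preds[i] for preds in model_predictions)
        # highest vote count wins; ties go to the larger label (1 = sarcastic)
        label, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        voted.append(label)
    return voted
```

With `gamma=0` the focal term disappears and the loss becomes alpha-weighted cross-entropy, a simpler baseline for unbalanced data; raising `gamma` shrinks the contribution of confidently correct examples, which in an unbalanced dataset are mostly from the majority (non-sarcastic) class.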

Published paper.

Presentation slides, Poster and Video.

Dataset.

If you find this code or work useful, please consider citing:

@inproceedings{abdel-salam-2022-reamtchka,
    title = "reamtchka at {S}em{E}val-2022 Task 6: Investigating the effect of different loss functions for Sarcasm detection for unbalanced datasets",
    author = "Abdel-Salam, Reem",
    booktitle = "Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.semeval-1.126",
    pages = "896--906",
    abstract = "This paper describes the system used in SemEval-2022 Task 6: Intended Sarcasm Detection in English and Arabic. Achieving 20th,3rd places with 34{\&} 47 F1-Sarcastic score for task A, 16th place for task B with 0.0560 F1-macro score, and 10, 6th places for task C with 72{\%} and 80{\%} accuracy on the leaderboard. A voting classifier between either multiple different BERT-based models or machine learning models is proposed, as our final model. Multiple key points has been extensively examined to overcome the problem of the unbalance of the dataset as: type of models, suitable architecture, augmentation, loss function, etc. In addition to that, we present an analysis of our results in this work, highlighting its strengths and shortcomings.",
}

Results

Results of the different models for Task A (English and Arabic) on the official test set.
Results of the different models for Task B (English) on the official test set.
Results of the different models for Task C (English and Arabic) on the official test set.
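Each task is scored with a different metric; the abstract in the citation above reports Task A as the F1 score of the sarcastic class, Task B as macro-F1, and Task C as accuracy. The plain-Python sketch below computes those three metrics. The helper names are hypothetical, and the label encoding (1 = sarcastic; one 0/1 column per Task B category) is an assumption, not taken from the official scorer.

```python
from typing import List, Sequence


def f1_for_class(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 of a single class, e.g. the sarcastic class (label 1) in Task A."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold_rows: List[List[int]], pred_rows: List[List[int]]) -> float:
    """Unweighted mean of per-category F1; each row holds one 0/1 flag per category."""
    n_categories = len(gold_rows[0])
    scores = []
    for c in range(n_categories):
        gold_col = [row[c] for row in gold_rows]
        pred_col = [row[c] for row in pred_rows]
        scores.append(f1_for_class(gold_col, pred_col, positive=1))
    return sum(scores) / n_categories


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of items (e.g. Task C examples) where the prediction matches the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
```

Because the sarcastic class is rare, Task A's class-specific F1 is far less forgiving than accuracy: a model that always predicts "non-sarcastic" gets high accuracy but an F1 of 0. For real evaluation, scikit-learn's `f1_score` (with `pos_label=1`, or `average='macro'` on a multi-label indicator matrix) and `accuracy_score` compute these quantities.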

