bilalghanem/multilingual_irony

Note: If you need the full corpus (including the tweet text), please fill out this form: IDAT data

This repository contains the Arabic irony corpus described in the ECIR-2020 paper Irony Detection in a Multilingual Context. The corpus consists of ~5.5k tweets annotated by two native Arabic speakers, to which another 5.5k tweets randomly sampled from the original un-annotated corpus were appended (ECIR_training.csv & ECIR_test.csv).

This corpus was also used in the IDAT shared task at FIRE-2019 (IDAT@FIRE2019: Overview of the Track on Irony Detection in Arabic Tweets), but without the random 5.5k sample, to ensure the quality of the data (IDAT_training.csv & IDAT_test.csv).

Due to Twitter policy, we distribute only the IDs of the annotated tweets. We therefore provide a Python script, read_tweets_text.py, that retrieves the text of these tweets.

REQUIREMENTS:

  • tweepy
  • pandas
  • tqdm

USAGE:

python read_tweets_text.py file_name

example:

python read_tweets_text.py ECIR_test.csv
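
For readers who want to see roughly what such a retrieval step involves, here is a minimal sketch. It is not the contents of read_tweets_text.py, which is not reproduced here. It assumes the CSV has a column holding tweet IDs (the name `id` is hypothetical), and it takes the actual API call as a `fetch_batch` callable that you supply yourself, for example a thin wrapper around a tweepy client. The batch size of 100 is an assumption based on Twitter's bulk lookup limit.

```python
import pandas as pd


def batch_ids(ids, batch_size=100):
    """Yield successive lists of at most batch_size IDs."""
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def attach_text(df, fetch_batch, id_column="id", batch_size=100):
    """Return a copy of df with a 'text' column added.

    fetch_batch is a caller-supplied function (hypothetical here) that takes
    a list of tweet IDs and returns a dict mapping each ID it could retrieve
    to that tweet's text. IDs it could not retrieve (deleted or protected
    tweets) are simply absent from the dict and end up as missing values.
    """
    texts = {}
    for batch in batch_ids(df[id_column].tolist(), batch_size):
        texts.update(fetch_batch(batch))
    out = df.copy()
    out["text"] = out[id_column].map(texts)
    return out
```

A typical call would load one of the distributed files with `pd.read_csv("ECIR_test.csv")` and pass the resulting frame to `attach_text` together with your own lookup function. Rows whose tweets are no longer available keep a missing `text` value, so they can be dropped afterwards with `dropna(subset=["text"])`.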

Citations:

@inproceedings{ghanem2019irony,
  title={Irony Detection in a Multilingual Context},
  author={Ghanem, Bilal and Karoui, Jihen and Benamara, Farah and Rosso, Paolo and Moriceau, V{\'e}ronique},
  booktitle={European Conference on Information Retrieval},
  year={2020},
  organization={Springer}
}


@inproceedings{ghanem2019idat,
  title={IDAT@FIRE2019: Overview of the Track on Irony Detection in Arabic Tweets},
  author={Ghanem, Bilal and Karoui, Jihen and Benamara, Farah and Moriceau, V{\'e}ronique and Rosso, Paolo},
  booktitle={Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. In: CEUR-WS.org, Kolkata, India},
  volume = {2517},
  pages={380--390},
  year={2019}
}
