bilalghanem/multilingual_irony: Irony Detection in a Multilingual Context

Note: If you need the full corpus (including the tweet text), please fill out this form: IDAT data

This repository contains the Arabic irony corpus described in the ECIR-2020 paper Irony Detection in a Multilingual Context. The corpus consists of ~5.5k tweets annotated by two native Arabic speakers, extended with another 5.5k tweets sampled at random from the original un-annotated corpus (ECIR_training.csv & ECIR_test.csv).

This corpus was also used in the IDAT shared task at FIRE-2019 (IDAT@FIRE2019: Overview of the Track on Irony Detection in Arabic Tweets), but without the random 5.5k sample, to ensure the quality of the data (IDAT_training.csv & IDAT_test.csv).

Due to Twitter policy, we distribute only the IDs of the annotated tweets. We therefore share a Python script, read_tweets_text.py, that retrieves the text of these tweets.

REQUIREMENTS:

tweepy
pandas
tqdm

USAGE:

python read_tweets_text.py file_name

example:

python read_tweets_text.py ECIR_test.csv
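For readers who want a feel for what rehydrating tweet IDs involves, the following is a minimal sketch and not the code of read_tweets_text.py. It assumes the CSV has an ID column named `id` (a guess, passed as a parameter) and it takes the actual Twitter call as a hypothetical `lookup` callable, so the batching logic can be run and checked without credentials. The batch size of 100 matches the per-request cap of Twitter's v1.1 statuses/lookup endpoint.

```python
import csv
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def read_ids(csv_path: str, id_column: str = "id") -> List[str]:
    """Return the tweet IDs stored in one column of a CSV file.

    The column name "id" is an assumption; pass the real header if it differs.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[id_column] for row in csv.DictReader(handle)]


def batched(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_texts(
    ids: List[str],
    lookup: Callable[[List[str]], Iterable[Tuple[str, str]]],
    size: int = 100,
) -> Dict[str, str]:
    """Map tweet ID -> text for every tweet that `lookup` returns.

    `lookup` is a hypothetical callable: it receives a list of IDs and
    returns (id, text) pairs. Tweets that are deleted or protected are
    simply absent from its result, so they are absent from the output too.
    """
    texts: Dict[str, str] = {}
    for batch in batched(ids, size):
        for tweet_id, text in lookup(batch):
            texts[str(tweet_id)] = text
    return texts
```

With tweepy 4.x, one plausible `lookup` is a small wrapper around `API.lookup_statuses` that returns each status's ID and text; that wiring is an assumption about your setup and is left out here. Because some tweets will no longer be available, the returned dictionary can be smaller than the ID list.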

Citations:

@inproceedings{ghanem2019irony,
  title={Irony Detection in a Multilingual Context},
  author={Ghanem, Bilal and Karoui, Jihen and Benamara, Farah and Rosso, Paolo and Moriceau, V{\'e}ronique},
  booktitle={European Conference on Information Retrieval},
  year={2020},
  organization={Springer}
}

@inproceedings{ghanem2019idat,
  title={IDAT@FIRE2019: Overview of the Track on Irony Detection in Arabic Tweets},
  author={Ghanem, Bilal and Karoui, Jihen and Benamara, Farah and Moriceau, V{\'e}ronique and Rosso, Paolo},
  booktitle={Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. In: CEUR-WS.org, Kolkata, India},
  volume = {2517},
  pages={380--390},
  year={2019}
}
