
SemEval 2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

This repository contains the data and resources for the SemEval 2024 Task 1: Semantic Textual Relatedness (STR). For more information, please visit the shared task and competition websites.

Dataset | Languages | Shared Task Starter Kit | Citing This Work

If you use our data, please cite our papers:

```bibtex
@inproceedings{ousidhoum2024semrel2024,
  title = "{SemRel2024}: A Collection of Semantic Textual Relatedness Datasets for 13 Languages",
  author = "Ousidhoum, Nedjma and Muhammad, Shamsuddeen Hassan and Abdalla, Mohamed and Abdulmumin, Idris and
            Ahmad, Ibrahim Said and Ahuja, Sanchit and Aji, Alham Fikri and Araujo, Vladimir and Ayele, Abinew Ali and
            Baswani, Pavan and Beloucif, Meriem and Biemann, Chris and Bourhim, Sofia and De Kock, Christine and
            Dekebo, Genet Shanko and Hourrane, Oumaima and Kanumolu, Gopichand and Madasu, Lokesh and
            Rutunda, Samuel and Shrivastava, Manish and Solorio, Thamar and Surange, Nirmal and
            Tilaye, Hailegnaw Getaneh and Vishnubhotla, Krishnapriya and Winata, Genta and
            Yimam, Seid Muhie and Mohammad, Saif M.",
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
  year = "2024",
  publisher = "Association for Computational Linguistics"
}
```

```bibtex
@inproceedings{ousidhoum-etal-2024-semeval,
  title = "{S}em{E}val-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages",
  author = "Ousidhoum, Nedjma and Muhammad, Shamsuddeen Hassan and Abdalla, Mohamed and Abdulmumin, Idris and
            Ahmad, Ibrahim Said and Ahuja, Sanchit and Aji, Alham Fikri and Araujo, Vladimir and Beloucif, Meriem and
            De Kock, Christine and Hourrane, Oumaima and Shrivastava, Manish and Solorio, Thamar and Surange, Nirmal and
            Vishnubhotla, Krishnapriya and Yimam, Seid Muhie and Mohammad, Saif M.",
  booktitle = "Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)",
  year = "2024",
  publisher = "Association for Computational Linguistics"
}
```

The annotation guidelines are available here (the PDF named *SemRel Annotation Guidelines*).

Check the SemRel Baseline Folder for details about the baseline experiment.

Dataset

The STR dataset is available in the data folder or can be downloaded from Hugging Face.

**Note that the full BWS tuple annotations will be available soon.**
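The released per-language files follow a simple CSV layout in which each row holds a pair ID, a `Text` field containing the two sentences separated by a newline, and a relatedness `Score`. The column names and sample row below are illustrative assumptions based on that layout (check the data folder for the exact files); this is a minimal sketch of parsing such a file into sentence pairs:

```python
import csv
import io

# Hypothetical sample row mimicking the SemRel CSV layout: the Text field
# holds both sentences of a pair, separated by a newline inside the quotes.
SAMPLE_CSV = '''PairID,Text,Score
ENG-train-0001,"There was no one there.
The place was deserted.",0.86
'''

def load_pairs(csv_text):
    """Parse SemRel-style rows into (pair_id, sentence1, sentence2, score) tuples."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The csv module handles the newline inside the quoted Text field.
        sent1, sent2 = row["Text"].split("\n", 1)
        pairs.append((row["PairID"], sent1, sent2, float(row["Score"])))
    return pairs

pairs = load_pairs(SAMPLE_CSV)
print(pairs[0])
```

When loading the dataset from Hugging Face instead, the same pair structure is exposed through the `datasets` library's standard splits.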

Languages

The STR task focuses on the following 14 languages:

  1. Afrikaans (afr released)
  2. Algerian Arabic (arq released)
  3. Amharic (amh released)
  4. English (eng released)
  5. Hausa (hau released)
  6. Indonesian (ind released)
  7. Hindi (hin released)
  8. Kinyarwanda (kin released)
  9. Marathi (mar released)
  10. Modern Standard Arabic (arb released)
  11. Moroccan Arabic (ary released)
  12. Punjabi (pan released)
  13. Spanish (esp released)
  14. Telugu (tel released)

Shared Task Starter Kit

A starter kit is available to help you produce a baseline result. You can open it in a Colab notebook and run the baseline system, then submit the resulting predictions to Codalab to verify that your submission format is correct.

To run the Colab Notebook, click the badge "Open in Colab".

  • Simple Co-occurrence Baseline for Semantic Relatedness: Open In Colab
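A co-occurrence baseline of this kind scores a sentence pair by its lexical overlap. The sketch below is not the starter kit's exact code, only an assumed illustration of the idea using the Dice coefficient over lowercased token sets:

```python
def dice_score(sentence1: str, sentence2: str) -> float:
    """Dice coefficient over lowercased token sets: 2|A∩B| / (|A| + |B|)."""
    a = set(sentence1.lower().split())
    b = set(sentence2.lower().split())
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

# Identical sentences score 1.0; disjoint sentences score 0.0.
print(dice_score("There was no one there.", "There was nobody there."))
```

Submissions are ranked against human judgments, so a stronger system only needs to produce scores whose *ordering* of pairs correlates well with the gold relatedness scores.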

SemRel Related Papers