
A Farsi (Persian) Semantic Similarity Measurement Dataset (FarSSiM)


mojtabasajjadi/FarSSiM

Contact Info.


mojtabasadjadi@gmail.com
https://www.linkedin.com/in/smsajjadi/
https://github.com/mojtabasajjadi/FarSSiM

Introduction


FarSSiM is the first semantic textual similarity (STS) dataset for informal Persian. It consists of 1123 pairs of short, informal Farsi texts. Each pair is annotated for semantic relatedness and for the entailment relation between its two texts. The dataset was collected by identifying paraphrases among Persian tweets.

File Structure: xlsx file

Fields


tweet 1: the first text
tweet 2: the second text
1st: the first annotator's score
2st: the second annotator's score
3st: the third annotator's score
4st: the fourth annotator's score
average: the mean of the four annotators' scores
standard deviation: the standard deviation of the four annotators' scores
variance: the variance of the four annotators' scores
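
The three derived fields can be recomputed from the four annotator scores. The snippet below is a minimal sketch using only the Python standard library. The function name `summarize_scores` and the example scores are hypothetical. The README does not state whether the spread is computed over the population (divide by 4) or as a sample estimate (divide by 3), so the sketch returns both for comparison with the values in the file.

```python
from statistics import mean, pstdev, pvariance, stdev, variance


def summarize_scores(scores):
    """Return the mean and both spread conventions for one pair's scores."""
    return {
        "average": mean(scores),
        # Population convention: divide by the number of annotators.
        "population_std": pstdev(scores),
        "population_variance": pvariance(scores),
        # Sample convention: divide by the number of annotators minus one.
        "sample_std": stdev(scores),
        "sample_variance": variance(scores),
    }


# Made-up scores from the four annotators for a single tweet pair.
summary = summarize_scores([4.0, 5.0, 4.0, 3.0])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Comparing one row's stored standard deviation against both results should show which convention the dataset uses.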

Statistics


Total pairs: 1123
