A large scale dataset for Question Answering in Ukrainian

Drastic/squad-uk

SQuAD-uk

SQuAD [Rajpurkar et al. 2016] is a large-scale dataset for training question answering systems on factoid questions. It contains more than 100,000 question-answer pairs about passages from 536 Wikipedia articles chosen from various domains.

SQuAD-uk is derived from SQuAD through semi-automatic translation of the original dataset into Ukrainian. It is a large-scale dataset for open question answering on factoid questions in Ukrainian and contains more than 30,000 question-answer pairs derived from the original English dataset. The dataset is provided as training material, in the following files, to support replicable benchmarking of QA systems:

  • squad-train-v1.1-uk-mini.json: a small (mini) set of training examples derived from the original SQuAD 1.1 training material.
  • squad-train-v1.1-uk.json: the training examples derived from the original SQuAD 1.1 training material.
  • squad-uk-1.1.zip: the training examples from squad-train-v1.1-uk.json, split into train-v1.1-uk.json and dev-v1.1-uk.json.
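
As a minimal sketch of how these files might be read, the Python snippet below walks a SQuAD-style structure and yields question-answer pairs. It assumes the files keep the standard SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas` → `answers`), which this README does not state explicitly. The helper names `iter_qa_pairs` and `load_squad` and the inline sample record are hypothetical and used only for illustration.

```python
import json
from typing import Iterator, Tuple


def iter_qa_pairs(squad: dict) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (context, question, answer_text, answer_start) tuples.

    Assumes the SQuAD v1.1 layout: data -> paragraphs -> qas -> answers.
    Only the first answer of each question is returned.
    """
    for article in squad.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph.get("context", "")
            for qa in paragraph.get("qas", []):
                answers = qa.get("answers", [])
                if not answers:
                    continue  # skip questions without an answer
                first = answers[0]
                yield (
                    context,
                    qa.get("question", ""),
                    first.get("text", ""),
                    first.get("answer_start", -1),
                )


def load_squad(path: str) -> dict:
    """Load a SQuAD-style JSON file; UTF-8 is needed for Ukrainian text."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Made-up sample record in the assumed format (not taken from the dataset).
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Київ",
            "paragraphs": [
                {
                    "context": "Київ є столицею України.",
                    "qas": [
                        {
                            "id": "example-0",
                            "question": "Яке місто є столицею України?",
                            "answers": [{"text": "Київ", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ],
}

pairs = list(iter_qa_pairs(sample))
```

To try it on the real data, replace `sample` with `load_squad("squad-train-v1.1-uk.json")`. Checking that `context[start:start + len(answer)] == answer` for every pair is a quick way to see whether the answer offsets survived translation.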
