PerAnSel: A Novel Deep Neural Network-based System for Persian Question Answering


About 110 million people in Iran, Tajikistan, Afghanistan, and six other countries speak Persian. Persian (1) has free word order, (2) is written right-to-left, (3) is morphologically rich, and (4) is low-resource. To address the need for a high-quality answer selection dataset for Persian, we present PASD, the first large-scale native answer selection dataset for the Persian language, created by native annotators. PASD contains approximately 100,000 question-answer pairs based on Persian Wikipedia articles. We also translate the WikiQA dataset into Persian (WikiFA). To demonstrate the quality of PASD, we use it to train state-of-the-art answer selection systems. Finally, we present PerAnSel: A Novel Deep Neural Network-based System for Persian Question Answering.

Dataset

Download

The PASD and WikiFA datasets can be downloaded from the PASD and WikiFA folders of this repository, respectively. Their statistics are shown below:

Dataset	Train	Dev	Test
PASD	17567	1000	1000
WikiFA	2118	396	633

The distribution of question types in the PASD dataset is shown below:

Question Word	Distribution
What	28.57%
How	15.54%
When	11.00%
Where	13.21%
Who	16.13%
Which	14.61%
Why	0.94%
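Each percentage above is a question word's share of all PASD questions. As a hypothetical sketch (PASD's own annotation format is not described here, so the labels below are made up), the following Python computes such a distribution from a list of question-type labels and checks that the shares reported in the table sum to 100%.

```python
from collections import Counter
from typing import Iterable


def type_distribution(question_types: Iterable[str]) -> dict[str, float]:
    """Return each question type's share of all questions, as a percentage."""
    counts = Counter(question_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders types from most to least frequent.
    return {qtype: 100.0 * n / total for qtype, n in counts.most_common()}


# Hypothetical labels, for illustration only.
labels = ["What", "Who", "What", "How", "When", "What", "Where", "Which"]
for qtype, share in type_distribution(labels).items():
    print(f"{qtype}\t{share:.2f}%")

# Shares reported for PASD in the table; they should add up to 100%.
reported = {"What": 28.57, "How": 15.54, "When": 11.00, "Where": 13.21,
            "Who": 16.13, "Which": 14.61, "Why": 0.94}
assert abs(sum(reported.values()) - 100.0) < 0.01
```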

Evaluation

We implement two baseline systems: (1) ASBERT and (2) CETE. We also implement PerAnSel, our method for Persian answer selection, with four different underlying language models: MBERT, DistilmBERT, ALBERT-FA, and ParsBERT. We evaluate each answer selection system using the Mean Reciprocal Rank (MRR) metric.

Method	LM	PASD	WikiFA
ASBERT	MBERT	81.45%	51.32%
CETE	MBERT	79.99%	42.74%
PerAnSel	ParsBERT	74.30%	50.38%
PerAnSel	AlbertFA	77.21%	47.59%
PerAnSel	DistilmBert	81.55%	62.66%
PerAnSel	MBERT	89.36%	66.08%
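MRR averages, over all test questions, the reciprocal of the rank at which the first correct answer sentence appears: MRR = (1/|Q|) Σ 1/rank_i. The following is a minimal Python sketch of the metric, not the authors' evaluation script. It assumes each question's candidate sentences are already sorted by the system's score (best first) and labeled 1 (correct) or 0, and it scores a question with no correct candidate as 0, whereas some evaluations drop such questions instead.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """Return 1/rank of the first correct candidate, or 0.0 if there is none.

    `relevance` holds the gold labels (1 = correct answer, 0 = not) of the
    candidate sentences, sorted by the system's score, best first.
    """
    for rank, label in enumerate(relevance, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[int]]) -> float:
    """Average the reciprocal rank over all questions."""
    if not rankings:
        raise ValueError("need at least one question")
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


# Three hypothetical questions whose first correct answer is at rank 1, 3, and 2.
example = [[1, 0, 0], [0, 0, 1, 0], [0, 1]]
print(f"MRR = {mean_reciprocal_rank(example):.4f}")  # (1 + 1/3 + 1/2) / 3
```

Multiplying the result by 100 gives the percentage form used in the table.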

We also present a question classifier that is trained on PASD and classifies the questions. We evaluate this classifier both intrinsically (the classifier on its own) and extrinsically (its effect on the answer selection system).

Intrinsic Evaluation

Model	PASD
ParsBERT	88.20%
AlbertFA	90.70%
DistilmBert	95.30%
MBERT	97.90%

Extrinsic Evaluation

Method	LM	PASD	WikiFA
PerAnSel	MBERT	92.11%	62.77%

Citation

Plain

Jamshid Mozafari, Arefeh Kazemi, Parham Moradi, Mohammad Ali Nematbakhsh, "PerAnSel: A Novel Deep Neural Network-Based System for Persian Question Answering", Computational Intelligence and Neuroscience, vol. 2022, Article ID 3661286, 21 pages, 2022. https://doi.org/10.1155/2022/3661286

Bibtex

@Article{Mozafari2022,
    author={Mozafari, Jamshid and Kazemi, Arefeh and Moradi, Parham and Nematbakhsh, Mohammad Ali},
    title={PerAnSel: A Novel Deep Neural Network-Based System for Persian Question Answering},
    journal={Computational Intelligence and Neuroscience},
    year={2022},
    month={Jul},
    day={18},
    publisher={Hindawi},
    volume={2022},
    pages={3661286},
    issn={1687-5265},
    doi={10.1155/2022/3661286}
}
