Models

Named Entity Recognition

  • ParsBERT-NER - ParsBERT (a monolingual Persian language model) fine-tuned for named entity recognition on the PEYMA, ARMAN, and combined PEYMA+ARMAN datasets.
  • ALBERT-NER - An ALBERT language model fine-tuned on the PEYMA and ARMAN datasets.

Text Classification

Sentiment Analysis

Summarization

  • BERT2BERT - The first pre-trained summarization model based on ParsBERT, trained on the Wiki Summary dataset.

Question Answering

Multiple-Choice QA

Reading Comprehension

Translation

Textual Entailment

Query Paraphrasing

Embeddings

  • Farsi Poem word2vec model - A word2vec model developed on a corpus of 48 Persian poets. The corpus consists of 1,216,286 mesras of Farsi poems and 8,102,119 words, of which 148,588 are unique.
  • Sentence Transformers - ST is a collection of vector representations for sentences and paragraphs (also known as sentence embeddings). ST models are based on transformer networks such as ParsBERT (ALBERT coming soon). They are tuned on Textual Thematic Similarity datasets so that sentences with similar meanings are close in vector space.
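
To make "close in vector space" concrete, here is a minimal sketch that compares embeddings with cosine similarity. Cosine similarity is assumed here as the distance measure because it is a common choice for sentence embeddings; the listed models may recommend something else. The three vectors and their labels are hypothetical toy values invented for illustration and are not output from any of the models above.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (assumption: real sentence
# embeddings typically have hundreds of dimensions).
toy_embeddings = {
    "sentence_a": [0.9, 0.1, 0.0],
    "sentence_b": [0.8, 0.2, 0.1],  # assumed paraphrase of sentence_a
    "sentence_c": [0.0, 0.2, 0.9],  # assumed unrelated topic
}

sim_ab = cosine_similarity(toy_embeddings["sentence_a"], toy_embeddings["sentence_b"])
sim_ac = cosine_similarity(toy_embeddings["sentence_a"], toy_embeddings["sentence_c"])

# The paraphrase pair should score higher than the unrelated pair.
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

With a real model, the toy vectors would be replaced by the embeddings the model returns for each sentence; the comparison step stays the same.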

Language Model

Grapheme to Phoneme

  • g2p_fa - A Persian grapheme-to-phoneme model using an LSTM, implemented in PyTorch.
  • Persian_g2p - A seq-to-seq model for Persian (Farsi) grapheme-to-phoneme mapping.
  • G2P - An attention-based grapheme-to-phoneme model.