electra-ka

Introduction

electra-ka is an open-source ELECTRA language model for Georgian.

The model is available on the Hugging Face Hub.

The model is trained on 33 GB of Georgian text collected from 4,854,621 pages in the Common Crawl archive.

The fine-tuned model is also available on the Hub:

from transformers import ElectraForSequenceClassification, ElectraTokenizerFast

# Load the fine-tuned classifier and the tokenizer from the Hugging Face Hub.
model = ElectraForSequenceClassification.from_pretrained("jnz/electra-ka-discrediting")
tokenizer = ElectraTokenizerFast.from_pretrained("jnz/electra-ka")

inputs = tokenizer("your text goes here...", return_tensors="pt")
predictions = model(**inputs)
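
The model call returns a SequenceClassifierOutput whose logits field holds one raw score per class. A minimal sketch of turning those logits into a predicted class, assuming a standard single-label classification head (the id2label mapping depends on how the checkpoint was configured):

import torch

# Convert raw logits to class probabilities and pick the most likely class.
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
predicted_class = int(probs.argmax(dim=-1))
print(model.config.id2label[predicted_class], probs.tolist())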

Under the hood, the ELECTRA model uses the same architecture as BERT, but, to avoid misuse, it serves only as a discriminator, which makes it much harder to use for text generation.
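
For illustration, the base checkpoint can also be queried directly as a replaced-token discriminator. This is a sketch under the assumption that the jnz/electra-ka checkpoint includes ELECTRA's pre-training (replaced-token-detection) head:

import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Assumption: the base checkpoint ships with the replaced-token-detection head.
discriminator = ElectraForPreTraining.from_pretrained("jnz/electra-ka")
tokenizer = ElectraTokenizerFast.from_pretrained("jnz/electra-ka")

inputs = tokenizer("your text goes here...", return_tensors="pt")
with torch.no_grad():
    logits = discriminator(**inputs).logits

# A positive logit means the discriminator flags that token as "replaced",
# i.e. it does not look like genuine text at that position.
is_replaced = logits > 0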


To read more about ELECTRA, please refer to the paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.

For any questions or comments, please feel free to reach out at djanezashvili[at]gmail.com.