EugeneSel/EUMT

EUMT

A bidirectional English-Ukrainian neural machine translator built on fastText word embeddings (the sisg- model [1]) and the default Transformer architecture [2] of the OpenNMT framework.
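To give a feel for the subword information that the fastText embeddings rely on, here is a minimal sketch of the character n-gram decomposition described in [1]. It is an illustration only, not this project's code and not the fastText library's implementation; the function name `char_ngrams` and its parameters are hypothetical, and the 3 to 6 n-gram range is the one reported in the paper.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, following the scheme in [1].

    The word is wrapped in the boundary symbols '<' and '>', every n-gram
    with min_n <= n <= max_n is collected, and the whole wrapped word is
    added as one extra token. Illustrative helper, not a fastText API.
    """
    wrapped = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    # The full word (with boundaries) is kept as its own unit.
    grams.add(wrapped)
    return grams


# The example used in the paper: the word "where" with n = 3.
trigrams = char_ngrams("where", min_n=3, max_n=3)
print(sorted(trigrams))
```

In the paper, a word vector is formed by summing the vectors of these units, which lets morphologically related words, common in Ukrainian, share parameters.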

The following OPUS datasets [3] were used for training:

Launch the translator on Binder.

Check out my article related to this project.

References

  1. Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135-146.
  2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
  3. Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC'2012).
  4. Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
  5. Ahmed El-Kishky, Adi Renduchintala, James Cross, Francisco Guzmán and Philipp Koehn, XLEnt: Mining Cross-lingual Entities with Lexical-Semantic-Phonetic Word Alignment, Online preprint, 2021.
  6. A. Abdelali, F. Guzman, H. Sajjad and S. Vogel, "The AMARA Corpus: Building parallel language resources for the educational domain", In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland, 2014, pp. 1856-1862. ISBN 978-2-9517408-8-4.
