Pre-trained word vectors

We are publishing pre-trained word vectors for 294 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in Bojanowski et al. (2016) with default parameters.

Format

The word vectors come in both the binary and text default formats of fastText. In the text format, each line contains a word followed by its embedding, with the values separated by spaces. Words are ordered by descending frequency.
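
As an illustration of the text format described above, here is a minimal Python sketch that reads such lines into a dictionary. The function name `read_vectors` and the sample data are hypothetical and not part of fastText. The sketch also assumes that the first line may be a header holding the word count and the dimension, and skips it when present.

```python
import io


def read_vectors(lines):
    """Parse text-format word vectors from an iterable of lines.

    Returns a dict mapping each word to a list of floats. Python dicts keep
    insertion order, so the frequency ordering of the file is preserved.
    """
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # Assumption: an optional first line holds "<word count> <dimension>".
        # It is skipped if it consists of exactly two integers.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values if v]
    return vectors


# Tiny made-up sample standing in for a real vector file.
sample = io.StringIO("2 3\nthe 0.1 -0.2 0.3\nof 0.4 0.5 -0.6\n")
vectors = read_vectors(sample)
```

For a real file, pass an open file handle instead of the sample, for example `open(path, encoding="utf-8")`, where `path` points to a downloaded text file. Loading a full vocabulary this way keeps every vector in memory, so stopping after the first N lines may be preferable for large languages.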

License

The pre-trained word vectors are distributed under the Creative Commons Attribution-Share-Alike License 3.0.

References

If you use these word embeddings, please cite the following paper:

P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information

@article{bojanowski2016enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.04606},
  year={2016}
}

Models

The models can be downloaded from:

Abkhazian: bin+text, text Acehnese: bin+text, text Adyghe: bin+text, text
Afar: bin+text, text Afrikaans: bin+text, text Akan: bin+text, text
Albanian: bin+text, text Alemannic: bin+text, text Amharic: bin+text, text
Anglo_Saxon: bin+text, text Arabic: bin+text, text Aragonese: bin+text, text
Aramaic: bin+text, text Armenian: bin+text, text Aromanian: bin+text, text
Assamese: bin+text, text Asturian: bin+text, text Avar: bin+text, text
Aymara: bin+text, text Azerbaijani: bin+text, text Bambara: bin+text, text
Banjar: bin+text, text Banyumasan: bin+text, text Bashkir: bin+text, text
Basque: bin+text, text Bavarian: bin+text, text Belarusian: bin+text, text
Bengali: bin+text, text Bihari: bin+text, text Bishnupriya Manipuri: bin+text, text
Bislama: bin+text, text Bosnian: bin+text, text Breton: bin+text, text
Buginese: bin+text, text Bulgarian: bin+text, text Burmese: bin+text, text
Buryat: bin+text, text Cantonese: bin+text, text Catalan: bin+text, text
Cebuano: bin+text, text Central Bicolano: bin+text, text Chamorro: bin+text, text
Chavacano: bin+text, text Chechen: bin+text, text Cherokee: bin+text, text
Cheyenne: bin+text, text Chichewa: bin+text, text Chinese: bin+text, text
Choctaw: bin+text, text Chuvash: bin+text, text Classical Chinese: bin+text, text
Cornish: bin+text, text Corsican: bin+text, text Cree: bin+text, text
Crimean Tatar: bin+text, text Croatian: bin+text, text Czech: bin+text, text
Danish: bin+text, text Divehi: bin+text, text Dutch: bin+text, text
Dutch Low Saxon: bin+text, text Dzongkha: bin+text, text Eastern Punjabi: bin+text, text
Egyptian Arabic: bin+text, text Emilian_Romagnol: bin+text, text English: bin+text, text
Erzya: bin+text, text Esperanto: bin+text, text Estonian: bin+text, text
Ewe: bin+text, text Extremaduran: bin+text, text Faroese: bin+text, text
Fiji Hindi: bin+text, text Fijian: bin+text, text Finnish: bin+text, text
Franco_Provençal: bin+text, text French: bin+text, text Friulian: bin+text, text
Fula: bin+text, text Gagauz: bin+text, text Galician: bin+text, text
Gan: bin+text, text Georgian: bin+text, text German: bin+text, text
Gilaki: bin+text, text Goan Konkani: bin+text, text Gothic: bin+text, text
Greek: bin+text, text Greenlandic: bin+text, text Guarani: bin+text, text
Gujarati: bin+text, text Haitian: bin+text, text Hakka: bin+text, text
Hausa: bin+text, text Hawaiian: bin+text, text Hebrew: bin+text, text
Herero: bin+text, text Hill Mari: bin+text, text Hindi: bin+text, text
Hiri Motu: bin+text, text Hungarian: bin+text, text Icelandic: bin+text, text
Ido: bin+text, text Igbo: bin+text, text Ilokano: bin+text, text
Indonesian: bin+text, text Interlingua: bin+text, text Interlingue: bin+text, text
Inuktitut: bin+text, text Inupiak: bin+text, text Irish: bin+text, text
Italian: bin+text, text Jamaican Patois: bin+text, text Japanese: bin+text, text
Javanese: bin+text, text Kabardian: bin+text, text Kabyle: bin+text, text
Kalmyk: bin+text, text Kannada: bin+text, text Kanuri: bin+text, text
Kapampangan: bin+text, text Karachay_Balkar: bin+text, text Karakalpak: bin+text, text
Kashmiri: bin+text, text Kashubian: bin+text, text Kazakh: bin+text, text
Khmer: bin+text, text Kikuyu: bin+text, text Kinyarwanda: bin+text, text
Kirghiz: bin+text, text Kirundi: bin+text, text Komi: bin+text, text
Komi_Permyak: bin+text, text Kongo: bin+text, text Korean: bin+text, text
Kuanyama: bin+text, text Kurdish (Kurmanji): bin+text, text Kurdish (Sorani): bin+text, text
Ladino: bin+text, text Lak: bin+text, text Lao: bin+text, text
Latgalian: bin+text, text Latin: bin+text, text Latvian: bin+text, text
Lezgian: bin+text, text Ligurian: bin+text, text Limburgish: bin+text, text
Lingala: bin+text, text Lithuanian: bin+text, text Livvi_Karelian: bin+text, text
Lojban: bin+text, text Lombard: bin+text, text Low Saxon: bin+text, text
Lower Sorbian: bin+text, text Luganda: bin+text, text Luxembourgish: bin+text, text
Macedonian: bin+text, text Maithili: bin+text, text Malagasy: bin+text, text
Malay: bin+text, text Malayalam: bin+text, text Maltese: bin+text, text
Manx: bin+text, text Maori: bin+text, text Marathi: bin+text, text
Marshallese: bin+text, text Mazandarani: bin+text, text Meadow Mari: bin+text, text
Min Dong: bin+text, text Min Nan: bin+text, text Minangkabau: bin+text, text
Mingrelian: bin+text, text Mirandese: bin+text, text Moksha: bin+text, text
Moldovan: bin+text, text Mongolian: bin+text, text Muscogee: bin+text, text
Nahuatl: bin+text, text Nauruan: bin+text, text Navajo: bin+text, text
Ndonga: bin+text, text Neapolitan: bin+text, text Nepali: bin+text, text
Newar: bin+text, text Norfolk: bin+text, text Norman: bin+text, text
North Frisian: bin+text, text Northern Luri: bin+text, text Northern Sami: bin+text, text
Northern Sotho: bin+text, text Norwegian (Bokmål): bin+text, text Norwegian (Nynorsk): bin+text, text
Novial: bin+text, text Nuosu: bin+text, text Occitan: bin+text, text
Old Church Slavonic: bin+text, text Oriya: bin+text, text Oromo: bin+text, text
Ossetian: bin+text, text Palatinate German: bin+text, text Pali: bin+text, text
Pangasinan: bin+text, text Papiamentu: bin+text, text Pashto: bin+text, text
Pennsylvania German: bin+text, text Persian: bin+text, text Picard: bin+text, text
Piedmontese: bin+text, text Polish: bin+text, text Pontic: bin+text, text
Portuguese: bin+text, text Quechua: bin+text, text Ripuarian: bin+text, text
Romani: bin+text, text Romanian: bin+text, text Romansh: bin+text, text
Russian: bin+text, text Rusyn: bin+text, text Sakha: bin+text, text
Samoan: bin+text, text Samogitian: bin+text, text Sango: bin+text, text
Sanskrit: bin+text, text Sardinian: bin+text, text Saterland Frisian: bin+text, text
Scots: bin+text, text Scottish Gaelic: bin+text, text Serbian: bin+text, text
Serbo_Croatian: bin+text, text Sesotho: bin+text, text Shona: bin+text, text
Sicilian: bin+text, text Silesian: bin+text, text Simple English: bin+text, text
Sindhi: bin+text, text Sinhalese: bin+text, text Slovak: bin+text, text
Slovenian: bin+text, text Somali: bin+text, text Southern Azerbaijani: bin+text, text
Spanish: bin+text, text Sranan: bin+text, text Sundanese: bin+text, text
Swahili: bin+text, text Swati: bin+text, text Swedish: bin+text, text
Tagalog: bin+text, text Tahitian: bin+text, text Tajik: bin+text, text
Tamil: bin+text, text Tarantino: bin+text, text Tatar: bin+text, text
Telugu: bin+text, text Tetum: bin+text, text Thai: bin+text, text
Tibetan: bin+text, text Tigrinya: bin+text, text Tok Pisin: bin+text, text
Tongan: bin+text, text Tsonga: bin+text, text Tswana: bin+text, text
Tulu: bin+text, text Tumbuka: bin+text, text Turkish: bin+text, text
Turkmen: bin+text, text Tuvan: bin+text, text Twi: bin+text, text
Udmurt: bin+text, text Ukrainian: bin+text, text Upper Sorbian: bin+text, text
Urdu: bin+text, text Uyghur: bin+text, text Uzbek: bin+text, text
Venda: bin+text, text Venetian: bin+text, text Vepsian: bin+text, text
Vietnamese: bin+text, text Volapük: bin+text, text Võro: bin+text, text
Walloon: bin+text, text Waray: bin+text, text Welsh: bin+text, text
West Flemish: bin+text, text West Frisian: bin+text, text Western Punjabi: bin+text, text
Wolof: bin+text, text Wu: bin+text, text Xhosa: bin+text, text
Yiddish: bin+text, text Yoruba: bin+text, text Zazaki: bin+text, text
Zeelandic: bin+text, text Zhuang: bin+text, text Zulu: bin+text, text