Wiki word vectors
We are publishing pre-trained word vectors for 294 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in Bojanowski et al. (2016) with default parameters.
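The skip-gram model of Bojanowski et al. represents each word as a bag of character n-grams, with the word wrapped in the boundary symbols `<` and `>`. The sketch below shows only that n-gram extraction step. The function name `char_ngrams` is hypothetical, and the lengths 3 to 6 are assumed to match fastText's default `-minn`/`-maxn` options. Real fastText additionally hashes n-grams into a fixed number of buckets and keeps the whole word as a separate feature; both are omitted here.

```python
def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams of `word`, in the style of Bojanowski et al. (sketch).

    The word is wrapped in '<' and '>' so that prefixes and suffixes are
    distinguishable from word-internal n-grams. Every substring of length
    minn..maxn of the wrapped word is returned, shortest lengths first.
    Unlike fastText, no hashing into buckets is done, and the whole word
    is not included (fastText stores it as a separate feature).
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


if __name__ == "__main__":
    print(char_ngrams("where", minn=3, maxn=3))
```

With `minn=maxn=3`, the word "where" yields `['<wh', 'whe', 'her', 'ere', 're>']`, the example used in the paper. Because the vector of a word is built from these shared pieces, rare or unseen words can still receive a representation from their subwords.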
Please note that a newer version of multilingual word vectors is available: Word vectors for 157 languages.
Models
The models can be downloaded from:
| Abkhazian: bin+text, text | Acehnese: bin+text, text | Adyghe: bin+text, text |
| Afar: bin+text, text | Afrikaans: bin+text, text | Akan: bin+text, text |
| Albanian: bin+text, text | Alemannic: bin+text, text | Amharic: bin+text, text |
| Anglo_Saxon: bin+text, text | Arabic: bin+text, text | Aragonese: bin+text, text |
| Aramaic: bin+text, text | Armenian: bin+text, text | Aromanian: bin+text, text |
| Assamese: bin+text, text | Asturian: bin+text, text | Avar: bin+text, text |
| Aymara: bin+text, text | Azerbaijani: bin+text, text | Bambara: bin+text, text |
| Banjar: bin+text, text | Banyumasan: bin+text, text | Bashkir: bin+text, text |
| Basque: bin+text, text | Bavarian: bin+text, text | Belarusian: bin+text, text |
| Bengali: bin+text, text | Bihari: bin+text, text | Bishnupriya Manipuri: bin+text, text |
| Bislama: bin+text, text | Bosnian: bin+text, text | Breton: bin+text, text |
| Buginese: bin+text, text | Bulgarian: bin+text, text | Burmese: bin+text, text |
| Buryat: bin+text, text | Cantonese: bin+text, text | Catalan: bin+text, text |
| Cebuano: bin+text, text | Central Bicolano: bin+text, text | Chamorro: bin+text, text |
| Chavacano: bin+text, text | Chechen: bin+text, text | Cherokee: bin+text, text |
| Cheyenne: bin+text, text | Chichewa: bin+text, text | Chinese: bin+text, text |
| Choctaw: bin+text, text | Chuvash: bin+text, text | Classical Chinese: bin+text, text |
| Cornish: bin+text, text | Corsican: bin+text, text | Cree: bin+text, text |
| Crimean Tatar: bin+text, text | Croatian: bin+text, text | Czech: bin+text, text |
| Danish: bin+text, text | Divehi: bin+text, text | Dutch: bin+text, text |
| Dutch Low Saxon: bin+text, text | Dzongkha: bin+text, text | Eastern Punjabi: bin+text, text |
| Egyptian Arabic: bin+text, text | Emilian_Romagnol: bin+text, text | English: bin+text, text |
| Erzya: bin+text, text | Esperanto: bin+text, text | Estonian: bin+text, text |
| Ewe: bin+text, text | Extremaduran: bin+text, text | Faroese: bin+text, text |
| Fiji Hindi: bin+text, text | Fijian: bin+text, text | Finnish: bin+text, text |
| Franco_Provençal: bin+text, text | French: bin+text, text | Friulian: bin+text, text |
| Fula: bin+text, text | Gagauz: bin+text, text | Galician: bin+text, text |
| Gan: bin+text, text | Georgian: bin+text, text | German: bin+text, text |
| Gilaki: bin+text, text | Goan Konkani: bin+text, text | Gothic: bin+text, text |
| Greek: bin+text, text | Greenlandic: bin+text, text | Guarani: bin+text, text |
| Gujarati: bin+text, text | Haitian: bin+text, text | Hakka: bin+text, text |
| Hausa: bin+text, text | Hawaiian: bin+text, text | Hebrew: bin+text, text |
| Herero: bin+text, text | Hill Mari: bin+text, text | Hindi: bin+text, text |
| Hiri Motu: bin+text, text | Hungarian: bin+text, text | Icelandic: bin+text, text |
| Ido: bin+text, text | Igbo: bin+text, text | Ilokano: bin+text, text |
| Indonesian: bin+text, text | Interlingua: bin+text, text | Interlingue: bin+text, text |
| Inuktitut: bin+text, text | Inupiak: bin+text, text | Irish: bin+text, text |
| Italian: bin+text, text | Jamaican Patois: bin+text, text | Japanese: bin+text, text |
| Javanese: bin+text, text | Kabardian: bin+text, text | Kabyle: bin+text, text |
| Kalmyk: bin+text, text | Kannada: bin+text, text | Kanuri: bin+text, text |
| Kapampangan: bin+text, text | Karachay_Balkar: bin+text, text | Karakalpak: bin+text, text |
| Kashmiri: bin+text, text | Kashubian: bin+text, text | Kazakh: bin+text, text |
| Khmer: bin+text, text | Kikuyu: bin+text, text | Kinyarwanda: bin+text, text |
| Kirghiz: bin+text, text | Kirundi: bin+text, text | Komi: bin+text, text |
| Komi_Permyak: bin+text, text | Kongo: bin+text, text | Korean: bin+text, text |
| Kuanyama: bin+text, text | Kurdish (Kurmanji): bin+text, text | Kurdish (Sorani): bin+text, text |
| Ladino: bin+text, text | Lak: bin+text, text | Lao: bin+text, text |
| Latgalian: bin+text, text | Latin: bin+text, text | Latvian: bin+text, text |
| Lezgian: bin+text, text | Ligurian: bin+text, text | Limburgish: bin+text, text |
| Lingala: bin+text, text | Lithuanian: bin+text, text | Livvi_Karelian: bin+text, text |
| Lojban: bin+text, text | Lombard: bin+text, text | Low Saxon: bin+text, text |
| Lower Sorbian: bin+text, text | Luganda: bin+text, text | Luxembourgish: bin+text, text |
| Macedonian: bin+text, text | Maithili: bin+text, text | Malagasy: bin+text, text |
| Malay: bin+text, text | Malayalam: bin+text, text | Maltese: bin+text, text |
| Manx: bin+text, text | Maori: bin+text, text | Marathi: bin+text, text |
| Marshallese: bin+text, text | Mazandarani: bin+text, text | Meadow Mari: bin+text, text |
| Min Dong: bin+text, text | Min Nan: bin+text, text | Minangkabau: bin+text, text |
| Mingrelian: bin+text, text | Mirandese: bin+text, text | Moksha: bin+text, text |
| Moldovan: bin+text, text | Mongolian: bin+text, text | Muscogee: bin+text, text |
| Nahuatl: bin+text, text | Nauruan: bin+text, text | Navajo: bin+text, text |
| Ndonga: bin+text, text | Neapolitan: bin+text, text | Nepali: bin+text, text |
| Newar: bin+text, text | Norfolk: bin+text, text | Norman: bin+text, text |
| North Frisian: bin+text, text | Northern Luri: bin+text, text | Northern Sami: bin+text, text |
| Northern Sotho: bin+text, text | Norwegian (Bokmål): bin+text, text | Norwegian (Nynorsk): bin+text, text |
| Novial: bin+text, text | Nuosu: bin+text, text | Occitan: bin+text, text |
| Old Church Slavonic: bin+text, text | Oriya: bin+text, text | Oromo: bin+text, text |
| Ossetian: bin+text, text | Palatinate German: bin+text, text | Pali: bin+text, text |
| Pangasinan: bin+text, text | Papiamentu: bin+text, text | Pashto: bin+text, text |
| Pennsylvania German: bin+text, text | Persian: bin+text, text | Picard: bin+text, text |
| Piedmontese: bin+text, text | Polish: bin+text, text | Pontic: bin+text, text |
| Portuguese: bin+text, text | Quechua: bin+text, text | Ripuarian: bin+text, text |
| Romani: bin+text, text | Romanian: bin+text, text | Romansh: bin+text, text |
| Russian: bin+text, text | Rusyn: bin+text, text | Sakha: bin+text, text |
| Samoan: bin+text, text | Samogitian: bin+text, text | Sango: bin+text, text |
| Sanskrit: bin+text, text | Sardinian: bin+text, text | Saterland Frisian: bin+text, text |
| Scots: bin+text, text | Scottish Gaelic: bin+text, text | Serbian: bin+text, text |
| Serbo_Croatian: bin+text, text | Sesotho: bin+text, text | Shona: bin+text, text |
| Sicilian: bin+text, text | Silesian: bin+text, text | Simple English: bin+text, text |
| Sindhi: bin+text, text | Sinhalese: bin+text, text | Slovak: bin+text, text |
| Slovenian: bin+text, text | Somali: bin+text, text | Southern Azerbaijani: bin+text, text |
| Spanish: bin+text, text | Sranan: bin+text, text | Sundanese: bin+text, text |
| Swahili: bin+text, text | Swati: bin+text, text | Swedish: bin+text, text |
| Tagalog: bin+text, text | Tahitian: bin+text, text | Tajik: bin+text, text |
| Tamil: bin+text, text | Tarantino: bin+text, text | Tatar: bin+text, text |
| Telugu: bin+text, text | Tetum: bin+text, text | Thai: bin+text, text |
| Tibetan: bin+text, text | Tigrinya: bin+text, text | Tok Pisin: bin+text, text |
| Tongan: bin+text, text | Tsonga: bin+text, text | Tswana: bin+text, text |
| Tulu: bin+text, text | Tumbuka: bin+text, text | Turkish: bin+text, text |
| Turkmen: bin+text, text | Tuvan: bin+text, text | Twi: bin+text, text |
| Udmurt: bin+text, text | Ukrainian: bin+text, text | Upper Sorbian: bin+text, text |
| Urdu: bin+text, text | Uyghur: bin+text, text | Uzbek: bin+text, text |
| Venda: bin+text, text | Venetian: bin+text, text | Vepsian: bin+text, text |
| Vietnamese: bin+text, text | Volapük: bin+text, text | Võro: bin+text, text |
| Walloon: bin+text, text | Waray: bin+text, text | Welsh: bin+text, text |
| West Flemish: bin+text, text | West Frisian: bin+text, text | Western Punjabi: bin+text, text |
| Wolof: bin+text, text | Wu: bin+text, text | Xhosa: bin+text, text |
| Yiddish: bin+text, text | Yoruba: bin+text, text | Zazaki: bin+text, text |
| Zeelandic: bin+text, text | Zhuang: bin+text, text | Zulu: bin+text, text |
Format
The word vectors come in both the binary and the text default formats of fastText. In the text format, the first line is a header giving the number of words and the vector dimension; each following line contains a word followed by its vector, with values separated by spaces. Words are ordered by frequency, in descending order.
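The text format can be read with a few lines of standard-library Python. This is a minimal sketch, assuming (as fastText's text output does) that the first line is a `<num_words> <dim>` header; the helper names `load_vec`, `cosine` and `nearest` are hypothetical, not part of fastText. Since words are sorted by descending frequency, the optional `limit` keeps only the most frequent words, which is a cheap way to work with large files.

```python
import math


def load_vec(stream, limit=None):
    """Parse fastText text-format vectors from a file-like object.

    Assumes a header line "<num_words> <dim>", then one line per word:
    the word followed by `dim` space-separated floats. Returns (dim, vectors)
    where `vectors` is a dict that preserves the file's frequency order.
    """
    header = stream.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in stream:
        if limit is not None and len(vectors) >= limit:
            break
        if not line.strip():
            continue  # tolerate blank lines
        # Split from the right so only the last `dim` fields are values.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"malformed line: {line[:40]!r}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest(vectors, query, k=5):
    """Return the k words most similar to `query`, best first."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Usage on a real file would look like `with open("wiki.xx.vec", encoding="utf-8") as f: dim, vecs = load_vec(f, limit=100000)`, where the filename is a placeholder. Splitting from the right rather than on every space keeps the parser robust if a token contains unusual characters. The binary files additionally contain the subword information and are intended for fastText itself, for example via `fasttext.load_model` in the official Python package.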
License
The word vectors are distributed under the Creative Commons Attribution-Share-Alike License 3.0.
References
If you use these word vectors, please cite the following paper:
P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
@article{bojanowski2017enriching,
title={Enriching Word Vectors with Subword Information},
author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
journal={Transactions of the Association for Computational Linguistics},
volume={5},
year={2017},
issn={2307-387X},
pages={135--146}
}