fastText

Wiki word vectors

We are publishing pre-trained word vectors for 294 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in Bojanowski et al. (2016) with default parameters.

Please note that a newer version of the multi-lingual word vectors is available at: Word vectors for 157 languages.

Models

The models can be downloaded from:

Abkhazian: bin+text, text | Acehnese: bin+text, text | Adyghe: bin+text, text
Afar: bin+text, text | Afrikaans: bin+text, text | Akan: bin+text, text
Albanian: bin+text, text | Alemannic: bin+text, text | Amharic: bin+text, text
Anglo_Saxon: bin+text, text | Arabic: bin+text, text | Aragonese: bin+text, text
Aramaic: bin+text, text | Armenian: bin+text, text | Aromanian: bin+text, text
Assamese: bin+text, text | Asturian: bin+text, text | Avar: bin+text, text
Aymara: bin+text, text | Azerbaijani: bin+text, text | Bambara: bin+text, text
Banjar: bin+text, text | Banyumasan: bin+text, text | Bashkir: bin+text, text
Basque: bin+text, text | Bavarian: bin+text, text | Belarusian: bin+text, text
Bengali: bin+text, text | Bihari: bin+text, text | Bishnupriya Manipuri: bin+text, text
Bislama: bin+text, text | Bosnian: bin+text, text | Breton: bin+text, text
Buginese: bin+text, text | Bulgarian: bin+text, text | Burmese: bin+text, text
Buryat: bin+text, text | Cantonese: bin+text, text | Catalan: bin+text, text
Cebuano: bin+text, text | Central Bicolano: bin+text, text | Chamorro: bin+text, text
Chavacano: bin+text, text | Chechen: bin+text, text | Cherokee: bin+text, text
Cheyenne: bin+text, text | Chichewa: bin+text, text | Chinese: bin+text, text
Choctaw: bin+text, text | Chuvash: bin+text, text | Classical Chinese: bin+text, text
Cornish: bin+text, text | Corsican: bin+text, text | Cree: bin+text, text
Crimean Tatar: bin+text, text | Croatian: bin+text, text | Czech: bin+text, text
Danish: bin+text, text | Divehi: bin+text, text | Dutch: bin+text, text
Dutch Low Saxon: bin+text, text | Dzongkha: bin+text, text | Eastern Punjabi: bin+text, text
Egyptian Arabic: bin+text, text | Emilian_Romagnol: bin+text, text | English: bin+text, text
Erzya: bin+text, text | Esperanto: bin+text, text | Estonian: bin+text, text
Ewe: bin+text, text | Extremaduran: bin+text, text | Faroese: bin+text, text
Fiji Hindi: bin+text, text | Fijian: bin+text, text | Finnish: bin+text, text
Franco_Provençal: bin+text, text | French: bin+text, text | Friulian: bin+text, text
Fula: bin+text, text | Gagauz: bin+text, text | Galician: bin+text, text
Gan: bin+text, text | Georgian: bin+text, text | German: bin+text, text
Gilaki: bin+text, text | Goan Konkani: bin+text, text | Gothic: bin+text, text
Greek: bin+text, text | Greenlandic: bin+text, text | Guarani: bin+text, text
Gujarati: bin+text, text | Haitian: bin+text, text | Hakka: bin+text, text
Hausa: bin+text, text | Hawaiian: bin+text, text | Hebrew: bin+text, text
Herero: bin+text, text | Hill Mari: bin+text, text | Hindi: bin+text, text
Hiri Motu: bin+text, text | Hungarian: bin+text, text | Icelandic: bin+text, text
Ido: bin+text, text | Igbo: bin+text, text | Ilokano: bin+text, text
Indonesian: bin+text, text | Interlingua: bin+text, text | Interlingue: bin+text, text
Inuktitut: bin+text, text | Inupiak: bin+text, text | Irish: bin+text, text
Italian: bin+text, text | Jamaican Patois: bin+text, text | Japanese: bin+text, text
Javanese: bin+text, text | Kabardian: bin+text, text | Kabyle: bin+text, text
Kalmyk: bin+text, text | Kannada: bin+text, text | Kanuri: bin+text, text
Kapampangan: bin+text, text | Karachay_Balkar: bin+text, text | Karakalpak: bin+text, text
Kashmiri: bin+text, text | Kashubian: bin+text, text | Kazakh: bin+text, text
Khmer: bin+text, text | Kikuyu: bin+text, text | Kinyarwanda: bin+text, text
Kirghiz: bin+text, text | Kirundi: bin+text, text | Komi: bin+text, text
Komi_Permyak: bin+text, text | Kongo: bin+text, text | Korean: bin+text, text
Kuanyama: bin+text, text | Kurdish (Kurmanji): bin+text, text | Kurdish (Sorani): bin+text, text
Ladino: bin+text, text | Lak: bin+text, text | Lao: bin+text, text
Latgalian: bin+text, text | Latin: bin+text, text | Latvian: bin+text, text
Lezgian: bin+text, text | Ligurian: bin+text, text | Limburgish: bin+text, text
Lingala: bin+text, text | Lithuanian: bin+text, text | Livvi_Karelian: bin+text, text
Lojban: bin+text, text | Lombard: bin+text, text | Low Saxon: bin+text, text
Lower Sorbian: bin+text, text | Luganda: bin+text, text | Luxembourgish: bin+text, text
Macedonian: bin+text, text | Maithili: bin+text, text | Malagasy: bin+text, text
Malay: bin+text, text | Malayalam: bin+text, text | Maltese: bin+text, text
Manx: bin+text, text | Maori: bin+text, text | Marathi: bin+text, text
Marshallese: bin+text, text | Mazandarani: bin+text, text | Meadow Mari: bin+text, text
Min Dong: bin+text, text | Min Nan: bin+text, text | Minangkabau: bin+text, text
Mingrelian: bin+text, text | Mirandese: bin+text, text | Moksha: bin+text, text
Moldovan: bin+text, text | Mongolian: bin+text, text | Muscogee: bin+text, text
Nahuatl: bin+text, text | Nauruan: bin+text, text | Navajo: bin+text, text
Ndonga: bin+text, text | Neapolitan: bin+text, text | Nepali: bin+text, text
Newar: bin+text, text | Norfolk: bin+text, text | Norman: bin+text, text
North Frisian: bin+text, text | Northern Luri: bin+text, text | Northern Sami: bin+text, text
Northern Sotho: bin+text, text | Norwegian (Bokmål): bin+text, text | Norwegian (Nynorsk): bin+text, text
Novial: bin+text, text | Nuosu: bin+text, text | Occitan: bin+text, text
Old Church Slavonic: bin+text, text | Oriya: bin+text, text | Oromo: bin+text, text
Ossetian: bin+text, text | Palatinate German: bin+text, text | Pali: bin+text, text
Pangasinan: bin+text, text | Papiamentu: bin+text, text | Pashto: bin+text, text
Pennsylvania German: bin+text, text | Persian: bin+text, text | Picard: bin+text, text
Piedmontese: bin+text, text | Polish: bin+text, text | Pontic: bin+text, text
Portuguese: bin+text, text | Quechua: bin+text, text | Ripuarian: bin+text, text
Romani: bin+text, text | Romanian: bin+text, text | Romansh: bin+text, text
Russian: bin+text, text | Rusyn: bin+text, text | Sakha: bin+text, text
Samoan: bin+text, text | Samogitian: bin+text, text | Sango: bin+text, text
Sanskrit: bin+text, text | Sardinian: bin+text, text | Saterland Frisian: bin+text, text
Scots: bin+text, text | Scottish Gaelic: bin+text, text | Serbian: bin+text, text
Serbo_Croatian: bin+text, text | Sesotho: bin+text, text | Shona: bin+text, text
Sicilian: bin+text, text | Silesian: bin+text, text | Simple English: bin+text, text
Sindhi: bin+text, text | Sinhalese: bin+text, text | Slovak: bin+text, text
Slovenian: bin+text, text | Somali: bin+text, text | Southern Azerbaijani: bin+text, text
Spanish: bin+text, text | Sranan: bin+text, text | Sundanese: bin+text, text
Swahili: bin+text, text | Swati: bin+text, text | Swedish: bin+text, text
Tagalog: bin+text, text | Tahitian: bin+text, text | Tajik: bin+text, text
Tamil: bin+text, text | Tarantino: bin+text, text | Tatar: bin+text, text
Telugu: bin+text, text | Tetum: bin+text, text | Thai: bin+text, text
Tibetan: bin+text, text | Tigrinya: bin+text, text | Tok Pisin: bin+text, text
Tongan: bin+text, text | Tsonga: bin+text, text | Tswana: bin+text, text
Tulu: bin+text, text | Tumbuka: bin+text, text | Turkish: bin+text, text
Turkmen: bin+text, text | Tuvan: bin+text, text | Twi: bin+text, text
Udmurt: bin+text, text | Ukrainian: bin+text, text | Upper Sorbian: bin+text, text
Urdu: bin+text, text | Uyghur: bin+text, text | Uzbek: bin+text, text
Venda: bin+text, text | Venetian: bin+text, text | Vepsian: bin+text, text
Vietnamese: bin+text, text | Volapük: bin+text, text | Võro: bin+text, text
Walloon: bin+text, text | Waray: bin+text, text | Welsh: bin+text, text
West Flemish: bin+text, text | West Frisian: bin+text, text | Western Punjabi: bin+text, text
Wolof: bin+text, text | Wu: bin+text, text | Xhosa: bin+text, text
Yiddish: bin+text, text | Yoruba: bin+text, text | Zazaki: bin+text, text
Zeelandic: bin+text, text | Zhuang: bin+text, text | Zulu: bin+text, text

Format

The word vectors come in both the binary and text default formats of fastText. In the text format, each line contains a word followed by its vector, with the values separated by spaces. Words are sorted by descending frequency.
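
As an illustration of the text format, the following is a minimal Python sketch of a reader, not an official fastText loader. The function name `read_vectors` and its `limit` parameter are hypothetical. The sketch also assumes that a file may begin with a header line holding two integers (vocabulary size and dimension) and skips such a line if it is present. The sample data is made up for the example.

```python
import io


def read_vectors(lines, limit=None):
    """Parse text-format word vectors into a {word: [float, ...]} dict.

    `lines` is any iterable of strings, such as an open text file.
    `limit` (hypothetical convenience parameter) stops after that many
    words; since words are sorted by descending frequency, this keeps
    the most frequent ones.
    """
    vectors = {}
    for index, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # Assumption: an optional first line "<vocab_size> <dimension>".
        if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


# Tiny made-up sample in the same layout: one word per line, then its values.
sample = io.StringIO("2 3\nthe 0.1 0.2 0.3\ncat -1.0 0.5 2.0\n")
sample_vectors = read_vectors(sample)
```

For a downloaded file, the same function could be called on `open(path, encoding="utf-8")`, where `path` points to one of the text files listed above. Passing a `limit` avoids holding a full vocabulary in memory when only the most frequent words are needed.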

License

The word vectors are distributed under the Creative Commons Attribution-Share-Alike License 3.0.

References

If you use these word vectors, please cite the following paper:

P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information

@article{bojanowski2017enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={Transactions of the Association for Computational Linguistics},
  volume={5},
  year={2017},
  issn={2307-387X},
  pages={135--146}
}
Copyright © 2022 Facebook Inc.