GitHub - pedrada88/crossembeddings-twitter

Cross-lingual word embeddings from Twitter

This repository provides the pre-trained monolingual and cross-lingual word embeddings from the paper "Learning Cross-lingual Embeddings from Twitter via Distant Supervision".

Twitter pre-trained word embeddings

We release the 100-dimensional monolingual and cross-lingual word embeddings, trained on Twitter, that were used in our experiments (English, Spanish, Italian, German and Farsi):

Monolingual FastText embeddings: Available here
Cross-lingual embeddings post-processed with plain averaging: Available here
Cross-lingual embeddings post-processed with weighted averaging: Available here

Update: Embeddings for Finnish and Japanese are now also available!
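A minimal sketch of loading the vectors and querying them, assuming the files use the common word2vec/FastText text format: an optional header line `<vocab_size> <dim>`, then one token per line followed by its space-separated components. This format is an assumption, not something stated above, so check a downloaded file first. The function names `load_vectors`, `cosine` and `nearest` are hypothetical helpers, not part of any released tooling. Because cross-lingual embeddings place words from different languages in a shared space, a nearest-neighbour search from a word in one language over another language's vocabulary is a natural way to use them. The toy vectors below are made up for illustration.

```python
import math
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def load_vectors(lines: Iterable[str]) -> Dict[str, Vector]:
    """Parse vectors in the (assumed) word2vec/FastText text format."""
    vectors: Dict[str, Vector] = {}
    for i, line in enumerate(lines):
        # Strip the newline and any trailing space (FastText .vec files often end rows with one).
        parts = line.rstrip("\r\n ").split(" ")
        if i == 0 and len(parts) == 2:
            continue  # skip the "<vocab_size> <dim>" header if present
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest(query: Vector, space: Dict[str, Vector], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k words in `space` most similar to `query`, best first."""
    scored = [(word, cosine(query, vec)) for word, vec in space.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy data standing in for two languages' vectors in the shared space.
    # For real files: with open(path, encoding="utf-8") as f: vecs = load_vectors(f)
    en = load_vectors(["2 3", "dog 1.0 0.0 0.0", "cat 0.0 1.0 0.0"])
    es = load_vectors(["2 3", "perro 0.9 0.1 0.0", "gato 0.1 0.9 0.0"])
    print(nearest(en["dog"], es, k=1))  # top hit: "perro"
```

A brute-force scan is used for clarity. For full vocabularies, a matrix library or an approximate nearest-neighbour index would be much faster.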

Note 1: All words are lowercased.

Note 2: All emoji have been unified into a single neutral encoding across languages (no skin tone modifiers). All Twitter user mentions have been anonymized as @user.
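Given Notes 1 and 2, queries should be normalized the same way before lookup, or many tokens will be missing from the vocabulary. The sketch below approximates the notes; it is not the authors' actual preprocessing pipeline. It lowercases the text, removes the Unicode skin tone modifiers (U+1F3FB to U+1F3FF), and replaces mentions with `@user`. The mention pattern (an `@` not preceded by a word character, followed by letters, digits or underscores) and the name `normalize` are assumptions. Any other emoji unification the authors applied is not reproduced here.

```python
import re

# Unicode skin tone (Fitzpatrick) modifiers, U+1F3FB..U+1F3FF.
SKIN_TONES = re.compile("[\U0001F3FB-\U0001F3FF]")
# Assumed mention pattern; the lookbehind avoids rewriting e-mail addresses like a@b.com.
MENTION = re.compile(r"(?<!\w)@[A-Za-z0-9_]+")


def normalize(text: str) -> str:
    """Approximate the release's preprocessing: lowercase, drop skin tones, anonymize mentions."""
    text = text.lower()
    text = SKIN_TONES.sub("", text)
    return MENTION.sub("@user", text)


if __name__ == "__main__":
    print(normalize("Hola @Pedro_88!"))  # hola @user!
```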

Reference paper

If you use any of these resources, please cite the following paper:

@inproceedings{xlingtwitter2020icwsm,
  author = 	"Camacho-Collados, Jose and Doval, Yerai and Mart\'{i}nez-C\'{a}mara, Eugenio and Espinosa-Anke, Luis and Barbieri, Francesco and Schockaert, Steven",
  title = 	"Learning Cross-lingual Embeddings from Twitter via Distant Supervision",
  booktitle = 	"Proceedings of ICWSM",
  location = 	"Atlanta, United States",
  year = 	"2020"
}

If you use FastText or VecMap, please also cite their corresponding papers.
