This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

Generating pre-trained word embeddings #549

Open
carolscarton opened this issue Jun 7, 2018 · 3 comments

@carolscarton

Hi, I am trying to use the embeddings.lua script to generate pre-trained word embeddings. However, after running something like:

th tools/embeddings.lua -lang en -dict_file data/demo.src.dict -save_data data/demo-src-emb

I get the following error:
.../torch/install/bin/luajit: tools/embeddings.lua:98: embedding file for language code 'en' was not found
stack traceback:
[C]: in function 'error'
tools/embeddings.lua:98: in function 'loadAuto'
tools/embeddings.lua:417: in function 'main'
tools/embeddings.lua:433: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

Any ideas why this is happening? Thank you in advance for your help!

@guillaumekln
Collaborator

It looks like the service we used to download pretrained word embeddings is down or no longer available. I suggest training your own word embeddings with word2vec or fastText.

@i55code

i55code commented Jun 21, 2018

Could you turn the service back on? Thanks!

@guillaumekln
Collaborator

The pretrained embeddings can be found here:

https://sites.google.com/site/rmyeid/projects/polyglot#TOC-Download-the-Embeddings

Once the .pkl file is downloaded, it's quite easy to extract the embeddings to feed to the OpenNMT script:

http://nbviewer.jupyter.org/gist/aboSamoor/6046170
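Following the spirit of that gist, here is a hedged sketch of unpacking a Polyglot `.pkl` file into a plain-text embedding file. It assumes the pickle holds a `(words, embeddings)` tuple and that `encoding="latin1"` is needed because the files were produced under Python 2 (both assumptions to verify against the actual download); a tiny synthetic pickle stands in for the real file:

```python
# Hypothetical sketch: convert a Polyglot-style (words, embeddings) pickle
# into a plain-text "word v1 v2 ... vN" file.
import pickle
import numpy as np

def polyglot_to_text(pkl_path, out_path):
    with open(pkl_path, "rb") as f:
        # encoding="latin1" handles pickles written under Python 2 (assumption).
        words, embeddings = pickle.load(f, encoding="latin1")
    with open(out_path, "w", encoding="utf-8") as out:
        for word, vec in zip(words, embeddings):
            out.write(word + " " + " ".join("%.6f" % x for x in vec) + "\n")

# Tiny synthetic file standing in for a downloaded Polyglot pickle:
with open("fake_polyglot.pkl", "wb") as f:
    pickle.dump((["hello", "world"], np.random.rand(2, 4)), f)

polyglot_to_text("fake_polyglot.pkl", "emb.txt")
```

The text file produced this way can then be passed to the OpenNMT embeddings script in place of the automatic download.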
