This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

Generating pre-trained word embeddings #549

Open
carolscarton opened this issue Jun 7, 2018 · 3 comments

@carolscarton

Hi, I am trying to use the embeddings.lua script to generate pre-trained word embeddings. However, after running something like:

th tools/embeddings.lua -lang en -dict_file data/demo.src.dict -save_data data/demo-src-emb

I get the following error:
.../torch/install/bin/luajit: tools/embeddings.lua:98: embedding file for language code 'en' was not found
stack traceback:
[C]: in function 'error'
tools/embeddings.lua:98: in function 'loadAuto'
tools/embeddings.lua:417: in function 'main'
tools/embeddings.lua:433: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

Any ideas why this is happening? Thank you in advance for your help!

@guillaumekln
Collaborator

It looks like the service we used to download pretrained word embeddings is down or no longer available. I suggest training your own word embeddings with word2vec or fastText.

@i55code

i55code commented Jun 21, 2018

Could you turn the service back on? Thanks!

@guillaumekln
Collaborator

The pretrained embeddings can be found here:

https://sites.google.com/site/rmyeid/projects/polyglot#TOC-Download-the-Embeddings

Once the .pkl file is downloaded, it's quite easy to extract the embeddings to feed to the OpenNMT script:

http://nbviewer.jupyter.org/gist/aboSamoor/6046170
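Following the spirit of that gist, here is a hedged sketch of unpacking a Polyglot `.pkl` file into a plain-text embedding file. It assumes the pickle holds a `(words, embeddings)` tuple and that `encoding="latin1"` is needed because the files were produced under Python 2 (both assumptions to verify against the actual download); a tiny synthetic pickle stands in for the real file:

```python
# Hypothetical sketch: convert a Polyglot-style (words, embeddings) pickle
# into a plain-text "word v1 v2 ... vN" file.
import pickle
import numpy as np

def polyglot_to_text(pkl_path, out_path):
    with open(pkl_path, "rb") as f:
        # encoding="latin1" handles pickles written under Python 2 (assumption).
        words, embeddings = pickle.load(f, encoding="latin1")
    with open(out_path, "w", encoding="utf-8") as out:
        for word, vec in zip(words, embeddings):
            out.write(word + " " + " ".join("%.6f" % x for x in vec) + "\n")

# Tiny synthetic file standing in for a downloaded Polyglot pickle:
with open("fake_polyglot.pkl", "wb") as f:
    pickle.dump((["hello", "world"], np.random.rand(2, 4)), f)

polyglot_to_text("fake_polyglot.pkl", "emb.txt")
```

The text file produced this way can then be passed to the OpenNMT embeddings script in place of the automatic download.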
