Build out-of-vocabulary word from data.bin #19

binhna · 2018-06-12T08:37:38Z

Since the advantage of a subword model is that we can build new words from pre-trained characters, I wonder how I can create a new word vector from the data.bin file. Does that .bin file contain characters and their vectors?
Thanks.

adodge · 2018-06-24T21:44:33Z

The .bin files are fasttext model files. They're slightly out of date, but if you apply the script from #14 you can use the fasttext program to generate word vectors for new words.

binhna · 2018-06-25T03:10:49Z

Yeah, thank you, but I don't seem to know how to use the script. I have the .bin file, your script, and the fasttext program. How exactly do I apply your script to generate new words?

binhna · 2018-06-25T03:20:37Z

Oh, I get it now. The first and second arguments to your script are the old and new .bin files, respectively. Once we have the new .bin file, we can use fasttext to generate an embedding for a new word.
Thanks a lot for your script!
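
For readers wondering what "generate a new word embedding" means for a subword model, here is a minimal Python sketch of the idea: split the word into character n-grams and average the vectors of the n-grams that are known. This is an illustration only, not the fasttext implementation. The names `char_ngrams`, `oov_vector`, and the `ngram_vectors` dictionary are hypothetical, the n-gram range 3 to 6 is an assumed setting, and the real program hashes n-grams into a fixed number of buckets instead of storing them by string.

```python
from typing import Dict, List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of `word`, wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def oov_vector(
    word: str,
    ngram_vectors: Dict[str, List[float]],  # hypothetical n-gram -> vector table
    dim: int,
    minn: int = 3,
    maxn: int = 6,
) -> List[float]:
    """Average the vectors of the word's n-grams that appear in the table."""
    total = [0.0] * dim
    hits = 0
    for gram in char_ngrams(word, minn, maxn):
        vec = ngram_vectors.get(gram)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
            hits += 1
    if hits == 0:
        return total  # no known n-gram: fall back to the zero vector
    return [t / hits for t in total]


if __name__ == "__main__":
    # Toy table with made-up values, for illustration only.
    table = {"<ca": [1.0, 0.0], "at>": [0.0, 1.0]}
    print(char_ngrams("cat", 3, 4))
    print(oov_vector("cat", table, dim=2, minn=3, maxn=3))
```

With the actual fasttext program, the `print-word-vectors` subcommand takes a .bin model and reads the query words from standard input, so none of the above needs to be written by hand.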

kusumlata123 · 2019-06-07T03:39:33Z

Hi, I am using the Hindi word2vec model hi.bin. When I look up vectors for the words in my corpus, some numbers such as 3740 (३७४०) come back as out of vocabulary. What should I do about this?
