Build out-of-vocabulary word from data.bin #19

Open
binhna opened this issue Jun 12, 2018 · 4 comments

Comments

binhna commented Jun 12, 2018

Since the advantage of a subword model is that we can build new words from pre-trained characters, I wonder how I can create a new word vector from the data.bin file. Does that .bin file contain characters and their vectors?
Thanks.
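
The idea behind the question can be sketched in a few lines of Python. This is a toy illustration rather than fastText's actual implementation: it assumes n-gram lengths of 3 to 6, and it uses a made-up dictionary (`toy_table`) in place of the hashed n-gram table a real model file stores. The helper names `char_ngrams` and `oov_vector` are hypothetical.

```python
# Toy illustration only: a real fastText model hashes n-grams into a fixed
# number of buckets; here a plain dict stands in for that table.

def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word`, wrapped in < and > boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def oov_vector(word, ngram_vectors, dim):
    """Average the vectors of the word's known n-grams (zeros if none are known)."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional n-gram table, invented for this example.
toy_table = {"<ca": [1.0, 0.0], "cat": [0.0, 1.0], "at>": [1.0, 1.0]}
vec = oov_vector("cat", toy_table, dim=2)
```

So a word that never appeared in training can still get a vector, as long as some of its n-grams did.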

adodge commented Jun 24, 2018

The .bin files are fasttext model files. They're slightly out of date, but if you apply the script from #14 you can use the fasttext program to generate word vectors for new words.

binhna commented Jun 25, 2018

Thank you, but I don't seem to know how to use the script. I have the .bin file, your script, and the fasttext program. How exactly can I apply your script to generate new words?

binhna commented Jun 25, 2018

Oh, I get it now. The first and second arguments to your script are the old and new .bin files, respectively. Once we have the new .bin file, we can use fasttext to generate a new word embedding.
Thanks a lot for your script!
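
For anyone landing here later, the second step (querying the converted model) might look roughly like the sketch below. It assumes the conversion script has already produced a new .bin file, that a `fasttext` executable sits at `./fasttext`, and that the `print-word-vectors` subcommand reads words on stdin and prints one `word v1 v2 ...` line per word. The file name `data_converted.bin` and the helper names are placeholders, not anything taken from the script in #14.

```python
import subprocess


def print_word_vectors_cmd(fasttext_bin, model_path):
    """Build the argument list for fasttext's print-word-vectors subcommand."""
    return [fasttext_bin, "print-word-vectors", model_path]


def parse_vectors(output):
    """Parse lines of the form 'word v1 v2 ...' into a {word: [floats]} dict."""
    vectors = {}
    for line in output.splitlines():
        parts = line.split()
        if parts:
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def query_vectors(words, model_path, fasttext_bin="./fasttext"):
    """Feed `words` to fasttext on stdin, one per line, and return their vectors."""
    result = subprocess.run(
        print_word_vectors_cmd(fasttext_bin, model_path),
        input="\n".join(words) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if fasttext exits with an error
    )
    return parse_vectors(result.stdout)


# Example call (placeholder path, requires the fasttext binary and a model file):
# vectors = query_vectors(["unseenword"], "data_converted.bin")
```

Because the vectors are composed from subword information, the queried words do not have to be in the model's vocabulary.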

@kusumlata123

Hi, I am using the Hindi word2vec model hi.bin. When I look up vectors for the words in my corpus, some numbers such as 3740 (३७४०) come back as out of vocabulary. What should I do about this?


3 participants