Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using word2vec format models? #23

Open
demongolem opened this issue Jan 24, 2019 · 0 comments
Open

Using word2vec format models? #23

demongolem opened this issue Jan 24, 2019 · 0 comments

Comments

@demongolem
Copy link

In relation to what I saw in #20, I notice that if I get my own large text file and go through the entire training process creating a bin model, the distance.Vocab array appears to correctly contain the vocab (appearing 5 times because that is what I set)

However, if I try to directly use a word2vec model by doing `var distance = new Distance(modelName);' the data structure distance.Vocab gets the first word correct but then is followed by a bunch of floating point numbers and I get words like 0.14687.

So can this code handle word2vec format out of the box like that avoiding training? Is there a problem in that what training creates is binary format whereas the word2vec appears to be a text readable format?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant