Add parameter from_topn in evaluate_word_analogies #3400

Open. divyanx wants to merge 1 commit into develop.


@divyanx divyanx commented Nov 4, 2022

With from_topn, an analogy is marked correct if the expected vector is among the from_topn most similar results, not necessarily the single most similar one.

This is useful for evaluating vectors such as confusion vectors, where an answer is marked correct if it matches either of the top two results.

@piskvorky
Owner

> from_topn will mark correct if the expected vector is not necessarily the most similar but among to from_topn most similar.
>
> Useful for the evaluation of vectors like confusion vectors, in which any of the top two results match then it is marked correct.

I have no idea what you're saying, what you're trying to do. Can you rephrase?

@divyanx
Author

divyanx commented Nov 13, 2022

When evaluating word2vec, we pose analogy questions, and an answer is marked correct only if the resulting vector is closest to the expected answer.
For example, with V = King - Man + Woman, the answer is marked correct only if the word closest to V is Queen. Now suppose we look at the few vectors closest to V and get {Princess, Queen, Prince...}. Under my criterion, the answer is also marked correct if the expected word appears anywhere among the topn most similar words.
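To make the difference between the two criteria concrete, here is a minimal sketch in plain NumPy. It is not the code from this patch and not gensim's implementation: the toy 3-D vectors, the `VECTORS` table, and the `analogy_correct` helper are all invented for illustration, and it ranks by cosine similarity to the raw offset vector.

```python
import numpy as np

# Toy 3-D embeddings, invented purely for illustration (not real word2vec output).
VECTORS = {
    "king":     np.array([1.0, 0.0, 0.2]),
    "man":      np.array([0.0, 0.0, 0.2]),
    "woman":    np.array([0.0, 1.0, 0.2]),
    "queen":    np.array([0.9, 1.0, 0.0]),
    "princess": np.array([1.0, 1.0, 0.3]),
    "prince":   np.array([1.0, 0.0, 0.5]),
}


def analogy_correct(a, b, c, expected, from_topn=1):
    """Return True if `expected` is among the `from_topn` words nearest to b - a + c.

    Hypothetical helper: the question is "a is to b as c is to ?".
    The three question words are excluded from the candidates.
    """
    target = VECTORS[b] - VECTORS[a] + VECTORS[c]
    scores = {}
    for word, vec in VECTORS.items():
        if word in (a, b, c):
            continue  # never return a word that appears in the question
        scores[word] = float(
            np.dot(vec, target) / (np.linalg.norm(vec) * np.linalg.norm(target))
        )
    ranked = sorted(scores, key=scores.get, reverse=True)
    return expected in ranked[:from_topn]


# With these toy vectors the ranking is princess, queen, prince.
print(analogy_correct("man", "king", "woman", "queen", from_topn=1))  # False
print(analogy_correct("man", "king", "woman", "queen", from_topn=2))  # True
```

With `from_topn=1` this reduces to the classical strict test, so the looser check is a superset of the existing behaviour.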

I hope that makes the idea clear, @piskvorky. It's not very useful for classical word2vec, but for vectors like confusion2vec, which encode more than the semantic and syntactic relations of word2vec (acoustic confusability, for example), this evaluation criterion can be very useful.

Feel free to suggest a different implementation or variable names, and I will change them accordingly.

@gojomo
Collaborator

gojomo commented Dec 28, 2022

I can understand why some projects might want a looser "@n" test of whether a model does well at analogies. And, this patch seems to achieve that.

But, the logic in the surrounding method for applying all the various caveats – case-insensitivity, limited range-of-vocab, ignoring words in analogy – already strikes me as somewhat convoluted. Adding this new looser-match as a loop-counter, with two more break-branches, makes it worse. If any new capabilities are layered-in here, I'd prefer to see them with some refactoring that makes the code, & many contrasting evaluation-possibilities here, more clear.
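One possible shape for such a refactoring, offered only as a rough sketch: apply the caveats (case-insensitivity, restricted vocabulary, ignoring the question's own words) in a single filtering step, so that the "@n" test becomes a plain membership check on the first n surviving candidates. The function names and parameters below are hypothetical and are not taken from gensim or from this patch.

```python
from itertools import islice


def _normalize(word, case_insensitive):
    """Fold case when requested, so comparisons ignore capitalization."""
    return word.upper() if case_insensitive else word


def valid_candidates(ranked_words, question_words, allowed_vocab, case_insensitive=True):
    """Yield candidates in rank order, skipping any that fail a caveat."""
    ignored = {_normalize(w, case_insensitive) for w in question_words}
    allowed = {_normalize(w, case_insensitive) for w in allowed_vocab}
    seen = set()
    for word in ranked_words:
        key = _normalize(word, case_insensitive)
        if key in ignored or key not in allowed or key in seen:
            continue  # question word, out-of-vocab word, or case-duplicate
        seen.add(key)
        yield key


def is_correct(ranked_words, expected, question_words, allowed_vocab,
               from_topn=1, case_insensitive=True):
    """Looser '@n' test: is `expected` among the first n valid candidates?"""
    candidates = valid_candidates(
        ranked_words, question_words, allowed_vocab, case_insensitive
    )
    top = list(islice(candidates, from_topn))
    return _normalize(expected, case_insensitive) in top
```

Keeping the filtering separate from the scoring means the strict test and the looser test differ only in the value of `from_topn`, with no extra loop counters or break branches.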

3 participants