Add parameter from_topn in evaluate_word_analogies #3400

divyanx · 2022-11-04T21:37:09Z

from_topn marks an analogy as correct if the expected word is among the from_topn most similar results, even when it is not the single most similar one.

This is useful for evaluating vectors such as confusion vectors, where a question is marked correct if any of the top two results matches.
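Below is a minimal sketch of the proposed rule. It assumes the candidates arrive as (word, similarity) pairs sorted by descending similarity, which is the shape gensim's `KeyedVectors.most_similar()` returns. The function `is_correct_within_topn` is hypothetical and is not the code in this PR. Following the caveats of the existing evaluation, it skips the analogy's own input words and compares words case-insensitively before counting candidates.

```python
from typing import Iterable, Sequence, Tuple


def is_correct_within_topn(
    sims: Iterable[Tuple[str, float]],
    expected: str,
    question_words: Sequence[str],
    from_topn: int = 1,
    case_insensitive: bool = True,
) -> bool:
    """Hypothetical helper: True if `expected` is among the first `from_topn`
    eligible candidates. `sims` is assumed sorted by descending similarity,
    like the (word, score) list returned by KeyedVectors.most_similar()."""
    norm = str.upper if case_insensitive else (lambda w: w)
    ignore = {norm(w) for w in question_words}
    target = norm(expected)
    seen = 0
    for word, _score in sims:
        word = norm(word)
        if word in ignore:
            continue  # the analogy's own input words never count as answers
        if word == target:
            return True
        seen += 1
        if seen >= from_topn:
            break  # checked from_topn eligible candidates without a match
    return False


sims = [("king", 0.9), ("Princess", 0.8), ("queen", 0.7)]
question = ["man", "king", "woman"]
print(is_correct_within_topn(sims, "QUEEN", question, from_topn=1))  # False
print(is_correct_within_topn(sims, "QUEEN", question, from_topn=2))  # True
```

With `from_topn=1` the helper checks only the single best eligible candidate, which is the strict criterion.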


piskvorky · 2022-11-05T08:10:43Z

> from_topn will mark correct if the expected vector is not necessarily the most similar but among to from_topn most similar.

> Useful for the evaluation of vectors like confusion vectors, in which any of the top two results match then it is marked correct.

I have no idea what you're saying or what you're trying to do. Can you rephrase?

divyanx · 2022-11-13T07:53:39Z

When evaluating word2vec on analogy questions, a question is marked correct only if the resulting vector is closest to the expected answer.
For example, with V = King - Man + Woman, we mark the question correct only if V is closest to Queen. Now suppose we retrieve the few vectors closest to V and get {Princess, Queen, Prince, ...}. My criterion would also mark the question correct whenever the expected answer appears among the topn most similar words, as Queen does here.
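The King - Man + Woman example can be worked through on toy data. The sketch below uses invented 3-dimensional vectors and ranks the rest of the vocabulary by cosine similarity to V computed on the raw vectors. This is a simplification: gensim's `most_similar()` unit-normalizes vectors first, so real scores would differ. `EMBEDDINGS`, `cosine` and `rank_analogy` are illustrative names, and all numbers are assumptions chosen so that Queen lands second.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy 3-dimensional embeddings, invented purely for illustration.
EMBEDDINGS: Dict[str, List[float]] = {
    "king": [1.0, -1.0, 0.0],
    "man": [0.0, -1.0, 0.0],
    "woman": [0.0, 1.0, 0.0],
    "queen": [1.5, 0.6, 0.0],
    "princess": [0.9, 1.0, 0.3],
    "prince": [1.0, -0.5, 0.3],
    "apple": [0.0, 0.0, 1.0],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_analogy(a: str, b: str, c: str, emb: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank words by cosine similarity to V = b - a + c, skipping a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    scored = [(w, cosine(target, v)) for w, v in emb.items() if w not in {a, b, c}]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# "man is to king as woman is to ?"  ->  V = king - man + woman
ranked = [w for w, _ in rank_analogy("man", "king", "woman", EMBEDDINGS)]
print(ranked)                 # ['princess', 'queen', 'prince', 'apple']
print("queen" in ranked[:1])  # False: the strict top-1 rule marks it wrong
print("queen" in ranked[:2])  # True: a top-2 rule marks it correct
```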

I hope that makes my idea clear, @piskvorky. It's not very useful for classical word2vec, but for vectors like confusion2vec and other embeddings that encode more dimensions of meaning (such as acoustic confusability, in addition to the semantic and syntactic relationships word2vec captures), this testing criterion can be very useful.

Feel free to suggest implementation changes and variable names, and I can change them accordingly.

gojomo · 2022-12-28T20:06:26Z

I can understand why some projects might want a looser "@n" test of whether a model does well at analogies, and this patch seems to achieve that.

But the logic in the surrounding method for applying all the various caveats (case-insensitivity, a limited vocabulary range, ignoring the analogy's own words) already strikes me as somewhat convoluted. Adding this new looser match as a loop counter, with two more break branches, makes it worse. If any new capabilities are layered in here, I'd prefer to see them accompanied by some refactoring that makes the code, and the many contrasting evaluation possibilities here, clearer.
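One possible shape for such a refactoring is sketched below. It is a hypothetical structure, not gensim's code, and `eligible_candidates`, `answer_rank` and `accuracy_at` are illustrative names. The filtering caveats live in one function, and each question's answer rank is computed once, so any "@n" accuracy becomes a simple comparison instead of a loop counter with extra break branches. It assumes vocabulary restriction already happened upstream, for example through the `restrict_vocab` argument of `most_similar()`.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def eligible_candidates(
    sims: Iterable[Tuple[str, float]],
    question_words: Sequence[str],
    case_insensitive: bool = True,
) -> List[str]:
    """Apply the filtering caveats in one place, preserving similarity order."""
    norm = str.upper if case_insensitive else (lambda w: w)
    ignore = {norm(w) for w in question_words}
    out: List[str] = []
    for word, _score in sims:
        word = norm(word)
        if word not in ignore:  # never accept the analogy's own input words
            out.append(word)
    return out


def answer_rank(candidates: List[str], expected: str, case_insensitive: bool = True) -> Optional[int]:
    """0-based rank of the expected answer, or None if it was not retrieved."""
    target = expected.upper() if case_insensitive else expected
    return candidates.index(target) if target in candidates else None


def accuracy_at(ranks: List[Optional[int]], n: int) -> float:
    """Fraction of questions whose answer is within the top n candidates."""
    hits = sum(1 for r in ranks if r is not None and r < n)
    return hits / len(ranks) if ranks else 0.0


questions = [
    # (question words, expected answer, ranked (word, score) pairs)
    (["man", "king", "woman"], "queen", [("king", 0.9), ("princess", 0.8), ("queen", 0.7)]),
    (["paris", "france", "rome"], "italy", [("Italy", 0.9), ("spain", 0.6)]),
    (["big", "bigger", "small"], "smaller", [("tiny", 0.8), ("little", 0.7)]),
]
ranks = [answer_rank(eligible_candidates(sims, words), expected) for words, expected, sims in questions]
print(ranks)                            # [1, 0, None]
print(round(accuracy_at(ranks, 1), 3))  # 0.333
print(round(accuracy_at(ranks, 2), 3))  # 0.667
```

Because every accuracy@n comes from the same list of ranks, the strict criterion (n=1) and any looser one can be reported side by side from a single pass over the questions.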

Add parameter from_topn in evaluate_word_analogies

60a2396

from_topn marks an analogy as correct if the expected word is among the from_topn most similar results, even when it is not the single most similar one. Useful for evaluating vectors such as confusion vectors.

