Why does language_model.py use different vectors? #91

Open
zysNLP opened this issue Jun 27, 2021 · 1 comment

Comments


zysNLP commented Jun 27, 2021

In the language_model.py code, class NextSentencePrediction and class MaskedLanguageModel take different inputs x in their forward functions. NextSentencePrediction uses x[:, 0] in "return self.softmax(self.linear(x[:, 0]))", but MaskedLanguageModel uses the full x in "return self.softmax(self.linear(x))". Is there something wrong here?

When I debug this, both x tensors have shape (batch_size, seq_len, embedding_dim), e.g. (64, 50, 256), which means I have 64 sentences, each sentence has 50 words, and each word is a 256-dimensional vector. But x[:, 0] takes only the first word of each of the 64 sentences, so x[:, 0] has shape (64, 256). I don't understand why the NextSentencePrediction task should use this kind of input; can someone help me explain this?
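For reference, here is a minimal sketch of the two heads as described above, reconstructed from the shapes and lines quoted in the question; layer names and details may differ from the repo's actual language_model.py:

```python
import torch.nn as nn

class NextSentencePrediction(nn.Module):
    """2-class is_next / is_not_next classifier fed only the first token's vector."""
    def __init__(self, hidden):
        super().__init__()
        self.linear = nn.Linear(hidden, 2)
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, x):
        # x: (batch_size, seq_len, hidden) -> x[:, 0]: (batch_size, hidden)
        return self.softmax(self.linear(x[:, 0]))

class MaskedLanguageModel(nn.Module):
    """Predicts a vocabulary token at every position, so it keeps the full sequence."""
    def __init__(self, hidden, vocab_size):
        super().__init__()
        self.linear = nn.Linear(hidden, vocab_size)
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, x):
        # x: (batch_size, seq_len, hidden) -> (batch_size, seq_len, vocab_size)
        return self.softmax(self.linear(x))
```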

@boykis82

x[:, 0] includes all the semantics of x[:, 0:50] because of self-attention.
You can use x[:, 0], x[:, 1], sum(x, axis=1), mean(x, axis=1), ... whatever you want.
But in my experience, there is no performance difference.
It's enough to use only x[:, 0] when you train a classification task.
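To illustrate the answer above, here is a small hypothetical helper comparing those pooling choices side by side; pool_sequence is made up for this example and is not part of the repo:

```python
import torch

def pool_sequence(x, mode="first"):
    """Collapse (batch_size, seq_len, hidden) to (batch_size, hidden) for a classification head."""
    if mode == "first":   # the x[:, 0] approach used by NextSentencePrediction
        return x[:, 0]
    if mode == "mean":    # average over the sequence dimension
        return x.mean(dim=1)
    if mode == "sum":     # sum over the sequence dimension
        return x.sum(dim=1)
    raise ValueError(f"unknown mode: {mode}")

x = torch.randn(64, 50, 256)            # (batch_size, seq_len, hidden) as in the question
print(pool_sequence(x, "first").shape)  # torch.Size([64, 256])
print(pool_sequence(x, "mean").shape)   # torch.Size([64, 256])
```

Whichever pooling you pick, the classification head then maps the (batch_size, hidden) tensor to the two is_next / is_not_next logits.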
