
padding, softmax, embeddings #13

Open
ka-bu opened this issue Nov 20, 2018 · 5 comments · Fixed by #16

Comments

ka-bu commented Nov 20, 2018

Hi,

I have two questions regarding the CAML implementation:

  1. All texts in a batch are padded to the same length, but the input to the softmax function is not masked. Hence, this implementation also assigns positive attention weights to padding tokens, right? Am I missing something here?
  2. The embedding vector that belongs to the padding token does not seem to be fixed to the zero vector. Or is it, and if so, where is that constraint implemented? (I guess it wouldn't make a difference if point 1 were handled differently, i.e. if the attention weights for padding tokens were fixed to 0; see the sketch below.)

Many thanks!
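
For question 2, a minimal sketch of how the padding embedding could be pinned to zero in PyTorch (illustrative code, not the repository's actual implementation; the pad index is assumed to be 0 here):

```python
import torch

vocab_size, embed_dim = 1000, 100  # example sizes

# With padding_idx set, the embedding at that index is initialized to the zero
# vector and never receives a gradient, so it stays zero throughout training.
embed = torch.nn.Embedding(vocab_size, embed_dim, padding_idx=0)
```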

@sarahwie (Collaborator)

This should be fixed by the above PR, although in my experience it doesn't really change the results.

ka-bu commented Dec 10, 2018

No, the PR doesn't fix everything. In my experience, fixing the embedding of the padding tokens does not change much, but masking the softmax input does.
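
A toy illustration of the effect (hypothetical values), showing why the softmax masking matters:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.0, 0.0])        # last two positions are padding
pad = torch.tensor([False, False, True, True])

print(F.softmax(logits, dim=0))                                   # padding gets ~8% weight each
print(F.softmax(logits.masked_fill(pad, float('-inf')), dim=0))   # padding gets exactly 0
```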

@sarahwie (Collaborator)

I see what you mean. I'll look into it.

datduong commented Jun 3, 2019

I have the same question about taking the softmax to compute attention weights. I rewrote my code to explicitly truncate each sample in the batch (quite inefficient). Some preliminary results show about a 3-4% drop for the simple case of the base CNN with the 50 common labels. Would anyone be able to chime in on this issue? Thanks.

datduong commented Jun 3, 2019

This line still does not apply any masking when computing the attention weights: https://github.com/jamesmullenbach/caml-mimic/blob/master/learn/models.py#L184
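
If I understand the shape conventions correctly, a masked version of that attention computation could look roughly like this (illustrative sketch with made-up variable names, not a drop-in patch for models.py):

```python
import torch
import torch.nn.functional as F

def masked_label_attention(x, U_weight, pad_mask):
    """x: conv output, (batch, seq_len, num_filters)
    U_weight: per-label attention parameters, (num_labels, num_filters)
    pad_mask: bool, (batch, seq_len), True at padding positions."""
    scores = U_weight.matmul(x.transpose(1, 2))                     # (batch, num_labels, seq_len)
    scores = scores.masked_fill(pad_mask.unsqueeze(1), float('-inf'))
    alpha = F.softmax(scores, dim=2)                                # padding positions get zero weight
    return alpha.matmul(x)                                          # (batch, num_labels, num_filters)
```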

sarahwie reopened this Jan 4, 2020