
Why doesn't the model input include attention_mask? #58

Open
li3cmz opened this issue Dec 25, 2020 · 2 comments

Comments

li3cmz commented Dec 25, 2020

loss, ppl = model(input_ids, position_ids, token_ids, label_ids)

Since it is an LMHeadModel, the 1st to n-th tokens are used to predict the (n+1)-th token during training, so why not introduce an attention_mask to mask out the (n+2)-th to (n+m)-th tokens? Without an attention_mask, there may be an inconsistency between the training and testing scenarios. Is it possible to add an attention_mask during training to make testing better?
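
For reference, a minimal sketch of the standard causal-LM loss that such a forward call typically computes (the helper below is illustrative, not this repo's code): every position is trained in parallel to predict the next token, so the targets already cover all positions and the only masking involved is the causal attention pattern.

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, label_ids, ignore_index=-100):
    """Standard next-token prediction loss (illustrative sketch).

    logits:    (batch, seq_len, vocab) model outputs
    label_ids: (batch, seq_len) target token ids
    """
    shift_logits = logits[:, :-1, :]   # predictions made at positions 0..n-2
    shift_labels = label_ids[:, 1:]    # targets are the tokens at positions 1..n-1
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=ignore_index,     # lets specific positions be excluded from the loss
    )
```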

@chujiezheng

Because GPT is a uni-directional language model, it does not need an attention mask.
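
A minimal sketch of that point, assuming the Hugging Face `transformers` GPT-2 checkpoint rather than this repo's own model class: because the causal mask is built into the model, the logits at each position depend only on that position and earlier tokens, so changing a later token leaves them unchanged.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

a = tok("the cat sat on the mat", return_tensors="pt").input_ids
b = tok("the cat sat on the roof", return_tensors="pt").input_ids

# length of the shared token prefix
k = 0
while k < min(a.size(1), b.size(1)) and bool(a[0, k] == b[0, k]):
    k += 1

with torch.no_grad():
    logits_a = model(a).logits
    logits_b = model(b).logits

# Logits over the shared prefix are identical: each position attends only to
# itself and earlier positions, so the differing suffix has no effect.
print(torch.allclose(logits_a[:, :k], logits_b[:, :k], atol=1e-5))  # expected: True
```

(In the Hugging Face API, `attention_mask` is only needed to ignore padding tokens in a batch, not to enforce causality.)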

@lmrojasb

DialoGPT/data_loader.py

Why is the response concatenated to the input_ids for both the training and validation datasets? Wouldn't this produce over-fitted models? Would it be possible to somehow mask the response ids?
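
For what it's worth, a hedged sketch (a hypothetical helper, not this repo's actual data pipeline) of the usual way to keep the response concatenated in `input_ids` while excluding chosen positions from the loss: set their label to the ignore index of `CrossEntropyLoss` (-100 by default in PyTorch / Hugging Face) instead of removing them from the input.

```python
import torch

IGNORE_INDEX = -100  # positions with this label contribute nothing to the loss

def build_example(context_ids, response_ids, mask_context=True):
    """Concatenate context and response; optionally mask the context
    (or, symmetrically, the response) out of the training labels."""
    input_ids = torch.tensor(context_ids + response_ids)
    if mask_context:
        # loss is computed only on the response tokens
        labels = torch.tensor([IGNORE_INDEX] * len(context_ids) + response_ids)
    else:
        labels = input_ids.clone()
    return input_ids, labels

# hypothetical token ids, for illustration only
ctx, resp = [15496, 11, 995], [318, 257, 50256]
input_ids, labels = build_example(ctx, resp)
print(input_ids.tolist())  # [15496, 11, 995, 318, 257, 50256]
print(labels.tolist())     # [-100, -100, -100, 318, 257, 50256]
```

The response still has to be present in `input_ids` during training because the model is trained with teacher forcing; masking it out of the labels (rather than the inputs) is what controls which tokens the loss is computed on.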
