
question about attention mask #15

Open
wli33 opened this issue Dec 15, 2023 · 1 comment

Comments


wli33 commented Dec 15, 2023

Regarding this part of Chapter 9:
"To simplify data processing, here we do not set the labels of special tokens such as [CLS], [SEP], and [PAD] to -100; instead we keep their original value of 0, and then use the attention mask to exclude the padding positions when computing the loss."

The attention mask is 1 at the [CLS] position, so `active_loss = attention_mask.view(-1) == 1` will include [CLS]. Should it be masked out?
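
For context, a minimal sketch of the masked-loss pattern that line comes from (the variable names `logits`, `labels`, `attention_mask` are assumptions, not necessarily the tutorial's exact code):

```python
import torch.nn as nn

def masked_token_loss(logits, labels, attention_mask):
    # logits: (batch, seq_len, num_labels); labels/attention_mask: (batch, seq_len)
    loss_fct = nn.CrossEntropyLoss()
    # True for every non-padding token, which includes [CLS] and [SEP]
    active_loss = attention_mask.view(-1) == 1
    active_logits = logits.view(-1, logits.size(-1))[active_loss]
    active_labels = labels.view(-1)[active_loss]
    return loss_fct(active_logits, active_labels)
```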

@jsksxs360
Owner

Hi, `active_loss = attention_mask.view(-1) == 1` in fact includes not only [CLS] but also [SEP], so the predictions for both of these tokens participate in the loss computation.

However, since every [CLS] and [SEP] in the training set is labeled "O" (non-entity), the model easily picks up this association, so including them in the computation has little practical effect.

Of course, it would be better if you could exclude them when computing the loss.
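
One possible way to do that, as a rough sketch (the helper name `loss_ignoring_special_tokens` is hypothetical; it assumes PyTorch and a Hugging Face tokenizer, which exposes the IDs of all special tokens via `all_special_ids`):

```python
import torch
import torch.nn as nn

def loss_ignoring_special_tokens(logits, labels, input_ids, tokenizer):
    # Mark every special token ([CLS], [SEP], [PAD], ...) with label -100
    # so that CrossEntropyLoss skips it (ignore_index defaults to -100).
    special_ids = torch.tensor(tokenizer.all_special_ids, device=labels.device)
    is_special = torch.isin(input_ids, special_ids)
    labels = labels.masked_fill(is_special, -100)
    loss_fct = nn.CrossEntropyLoss()  # ignore_index=-100 by default
    return loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
```

This also covers the padding positions, since [PAD] is among the tokenizer's special tokens, so the separate attention-mask filtering becomes unnecessary.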
