
question about attention mask #15

Open
wli33 opened this issue Dec 15, 2023 · 1 comment

Comments


wli33 commented Dec 15, 2023

Regarding this part of Chapter 9:
"To simplify data processing, here we do not set the labels of special tokens such as [CLS], [SEP], and [PAD] to -100; instead we keep their original value of 0, and then use the attention mask to exclude the padding positions when computing the loss."

The attention mask is 1 at the [CLS] position, so `active_loss = attention_mask.view(-1) == 1` will include [CLS]. Should it be masked out?
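
For context, a minimal sketch of the masked-loss pattern that line comes from (the variable names `logits`, `labels`, `attention_mask` are assumptions, not necessarily the tutorial's exact code):

```python
import torch.nn as nn

def masked_token_loss(logits, labels, attention_mask):
    # logits: (batch, seq_len, num_labels); labels/attention_mask: (batch, seq_len)
    loss_fct = nn.CrossEntropyLoss()
    # True for every non-padding token, which includes [CLS] and [SEP]
    active_loss = attention_mask.view(-1) == 1
    active_logits = logits.view(-1, logits.size(-1))[active_loss]
    active_labels = labels.view(-1)[active_loss]
    return loss_fct(active_logits, active_labels)
```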

@jsksxs360
Owner

Hi, `active_loss = attention_mask.view(-1) == 1` in fact includes not only [CLS] but also [SEP], so the predictions for both of these tokens participate in the loss computation.

However, since every [CLS] and [SEP] in the training set is labeled "O" (non-entity), the model easily picks up this association, so including them in the computation has little practical effect.

Of course, it would be better if you could exclude them when computing the loss.
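
One possible way to do that, as a rough sketch (the helper name `loss_ignoring_special_tokens` is hypothetical; it assumes PyTorch and a Hugging Face tokenizer, which exposes the IDs of all special tokens via `all_special_ids`):

```python
import torch
import torch.nn as nn

def loss_ignoring_special_tokens(logits, labels, input_ids, tokenizer):
    # Mark every special token ([CLS], [SEP], [PAD], ...) with label -100
    # so that CrossEntropyLoss skips it (ignore_index defaults to -100).
    special_ids = torch.tensor(tokenizer.all_special_ids, device=labels.device)
    is_special = torch.isin(input_ids, special_ids)
    labels = labels.masked_fill(is_special, -100)
    loss_fct = nn.CrossEntropyLoss()  # ignore_index=-100 by default
    return loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
```

This also covers the padding positions, since [PAD] is among the tokenizer's special tokens, so the separate attention-mask filtering becomes unnecessary.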
