Tokens and labels are not in one-to-one correspondence #5

Open
bestpredicts opened this issue Aug 9, 2020 · 2 comments

Comments

@bestpredicts

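In the convert_examples_to_features function, when the code splits each word into sub-tokens, it does not update the labels so that every resulting token gets its own label. The label sequence is only aligned in length at the very end, through a while loop, but by that point tokens and labels are no longer in one-to-one correspondence. This looks like a bug to me.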

@liuyukid
Owner

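The model does not compute a loss for sub_tokens (tokens with the ## prefix). The valid_sequence_output function masks out the BERT outputs of those sub_tokens, so when the loss is finally computed, tokens and labels can be matched up.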
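
To make the masking step concrete, here is a minimal pure-Python sketch of the idea, not the repository's actual valid_sequence_output. The names compact_valid and pair_with_labels and the toy data are hypothetical, and plain lists stand in for tensors. It assumes the valid mask is 1 on the first piece of each word and 0 on ## continuation pieces, and that the label list is padded at the end with -100.

```python
IGNORE_INDEX = -100  # assumed padding label that the loss skips


def compact_valid(outputs, valid_mask, pad=None):
    """Move outputs whose valid_mask entry is 1 to the front; pad the tail."""
    kept = [o for o, v in zip(outputs, valid_mask) if v == 1]
    return kept + [pad] * (len(outputs) - len(kept))


def pair_with_labels(compacted, labels):
    """Pair each compacted output with its label, skipping ignored slots."""
    return [(o, l) for o, l in zip(compacted, labels) if l != IGNORE_INDEX]


# Toy data: three words, the first and the last split into two pieces each.
outputs = ["h0", "h1", "h2", "h3", "h4"]
valid_mask = [1, 0, 1, 1, 0]
labels = ["B", "I", "O", IGNORE_INDEX, IGNORE_INDEX]

compacted = compact_valid(outputs, valid_mask)
pairs = pair_with_labels(compacted, labels)
print(compacted)  # ['h0', 'h2', 'h3', None, None]
print(pairs)      # [('h0', 'B'), ('h2', 'I'), ('h3', 'O')]
```

After compaction, position i holds the output of the first piece of word i, which is why a label list with one entry per word followed by padding lines up with it.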

@HuiBinR

HuiBinR commented Jan 12, 2021

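When I first read this issue and the author's reply I was quite confused as well, and it really did look misaligned (it is in fact aligned). Writing out a small example makes it clear.
Original text: I like perfect things
label: O O B I
Tokenized: I li ##ke per ##fect thing ##s
label: O O B I -100 -100 -100
Output: o1 o2 o3 o4 o5 o6 o7
v-mask: 1 1 0 1 0 1 0
Result: o1 o2 o4 o6 (the labels line up exactly with the first piece of each word)

Output: the model output
v-mask: the valid mask
Result: the output of valid_sequence_output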
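
The feature-building side of the example above can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's convert_examples_to_features: TOY_SPLITS is a hypothetical lookup table that hard-codes the splits from the example (a real BERT tokenizer may split these words differently), build_features is a made-up name, and special tokens such as [CLS] and [SEP] are left out.

```python
IGNORE_INDEX = -100  # padding label, as in the example above

# Hypothetical word-piece splits copied from the example; not real tokenizer output.
TOY_SPLITS = {
    "I": ["I"],
    "like": ["li", "##ke"],
    "perfect": ["per", "##fect"],
    "things": ["thing", "##s"],
}


def build_features(words, labels):
    """Return (tokens, valid_mask, padded_labels) for one sentence."""
    tokens, valid_mask = [], []
    for word in words:
        pieces = TOY_SPLITS[word]
        tokens.extend(pieces)
        # Only the first piece of each word is marked valid.
        valid_mask.extend([1] + [0] * (len(pieces) - 1))
    # Labels stay one per word; the tail is padded to the token length.
    padded_labels = list(labels)
    while len(padded_labels) < len(tokens):
        padded_labels.append(IGNORE_INDEX)
    return tokens, valid_mask, padded_labels


tokens, valid_mask, padded_labels = build_features(
    "I like perfect things".split(), ["O", "O", "B", "I"]
)
first_pieces = [t for t, v in zip(tokens, valid_mask) if v == 1]
aligned = list(zip(first_pieces, padded_labels))
print(valid_mask)     # [1, 1, 0, 1, 0, 1, 0]
print(padded_labels)  # ['O', 'O', 'B', 'I', -100, -100, -100]
print(aligned)        # [('I', 'O'), ('li', 'O'), ('per', 'B'), ('thing', 'I')]
```

The labels are never spread across the pieces; they only match once the outputs have been filtered by the valid mask, which is the source of the confusion in the original question.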
