[Question] A question about data processing #124

Open
5 tasks done
mynewstart opened this issue Aug 22, 2023 · 0 comments
Labels
question Further information is requested

Comments

@mynewstart

Required prerequisites

Questions

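Hi,
The code currently processes data by concatenating texts directly up to max_length, separated by eos. With this approach, when attention is computed, text2 can actually see the content of text1. If the two texts are unrelated, does this have any negative effect? In practice, do you mask out the tokens of text1, or do you make each text as long as possible so that a sample contains only one text?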
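
To make the two options concrete, here is a minimal sketch in plain Python. It is an illustration only, not this repository's implementation: `pack`, `causal_mask`, `document_mask`, and the eos id `0` are hypothetical names and values chosen for the example. It contrasts a plain causal mask, where text2 attends to text1, with a per-text (block-diagonal) causal mask, where it does not.

```python
from typing import List, Sequence, Tuple

EOS_ID = 0  # hypothetical eos token id, assumed for this example


def pack(texts: Sequence[Sequence[int]], max_length: int,
         eos_id: int = EOS_ID) -> Tuple[List[int], List[int]]:
    """Concatenate tokenized texts with eos after each one, truncated to max_length.

    Returns the packed token ids and, for every position, the index of the
    text it came from (the eos is counted as part of the text it terminates).
    """
    tokens: List[int] = []
    segments: List[int] = []
    for seg, text in enumerate(texts):
        for tok in list(text) + [eos_id]:
            if len(tokens) == max_length:
                return tokens, segments
            tokens.append(tok)
            segments.append(seg)
    return tokens, segments


def causal_mask(n: int) -> List[List[bool]]:
    """Plain causal mask: position i may attend to every position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def document_mask(segments: Sequence[int]) -> List[List[bool]]:
    """Causal mask restricted to the same text: i attends to j only if
    j <= i and both positions belong to the same packed text."""
    n = len(segments)
    return [[j <= i and segments[i] == segments[j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    text1, text2 = [5, 6], [7, 8, 9]
    tokens, segments = pack([text1, text2], max_length=8)
    plain = causal_mask(len(tokens))
    per_text = document_mask(segments)
    first_of_text2 = 3  # index of token 7 in the packed sequence
    print(plain[first_of_text2][0])     # text2 sees text1 under the plain mask
    print(per_text[first_of_text2][0])  # text2 does not see text1 here
```

With the plain mask, the first token of text2 can attend to every token of text1; with the per-text mask it can only attend to itself. Whether the extra masking matters in practice is exactly what this issue asks.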

Checklist

  • I have provided all relevant and necessary information above.
  • I have chosen a suitable title for this issue.
@mynewstart mynewstart added the question Further information is requested label Aug 22, 2023