Question about the alignment of labels and logits in LLaMAModel._causal_lm_process #53

Open

chivychao opened this issue Dec 8, 2023 · 3 comments


chivychao commented Dec 8, 2023

```python
# [invalid] Shift so that tokens < n predict n
# Do not need to shift here
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., :-1].contiguous()
```

Is it still necessary to drop the last position here? Could the whole sequence be used directly?

li-yi-dong (Collaborator) commented

If you are using the native Megatron-LM dataset, no shift is needed. It depends on how the labels correspond to the samples.

chivychao (Author) commented

In that case, could we skip self._causal_lm_process() and compute the loss with post_language_model_processing() instead?

li-yi-dong (Collaborator) commented

> In that case, could we skip self._causal_lm_process() and compute the loss with post_language_model_processing() instead?

Yes, but we found that this implementation produces results that differ somewhat from the HuggingFace implementation, so you will need to evaluate it yourself.
