Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLMPruner:大语言模型裁剪工具 #157

Open
ziwang-com opened this issue Jun 25, 2023 · 0 comments
Open

LLMPruner:大语言模型裁剪工具 #157

ziwang-com opened this issue Jun 25, 2023 · 0 comments

Comments

@ziwang-com
Copy link
Owner

https://github.com/yangjianxin1/LLMPruner
LLMPruner:大语言模型裁剪工具
项目简介
微信公众号【YeungNLP】文章:LLMPruner:大语言模型裁剪工具

LLMPruner是一个大语言模型裁剪工具,通过对大语言模型的冗余词表进行裁剪,减少模型参数量,降低显存占用,提升训练速度,并且能够保留预训练中学习到的知识。

大语言模型(LLM, Large Language Model)犹如雨后春笋般,其虽然效果惊艳,但参数量巨大,让普通玩家望而却步。如今的大语言模型大多为多语种大预言模型(Multilingual Large Language Model),如LLaMA、mT5、Bloom等,其词表规模巨大,占据非常大部分的模型参数,如Bloom具有25万词表。在训练模型时,词表权重将会消耗非常大的显存,降低训练速度,产生OOM的现象。

然而在许多下游任务中,我们往往只需要使用到一两种语言,例如在中文场景中,一般只会用到中英文。我们可以对大语言模型的词表进行裁剪,只留下所需的部分,这样不仅能够充分保留模型的预训练知识,并且能够使用更少的显卡进行下游任务的finetune,提升训练效率。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant