RuntimeError: CUDA out of memory. Tried to allocate 42.00 MiB (GPU 0; 22.02 GiB total capacity; 20. #60

Open
chenchaoac opened this issue May 5, 2023 · 5 comments

Comments

@chenchaoac

How much GPU memory does single-GPU LoRA finetuning of the LLM need? On a 23 GB A100 with batch_size=1, max_source_seq_len=4, max_target_seq_len=2, I still get an out-of-memory error.

@wuguangshuo

Did you manage to solve this?

@kbwzy

kbwzy commented May 15, 2023

Same here with a 24 GB GPU: batch_size=1, max_source_seq_len=50 fails. Has this been solved?

@a6225301

a6225301 commented Jun 3, 2023

@kbwzy @chenchaoac @wuguangshuo Has anyone solved this?

@hsauod

hsauod commented Jun 8, 2023

24 GB of GPU memory, same problem here. Has this been solved?

@rainkin1993

My GPU has 16 GB and I also got an OOM error during training.

In theory 16 GB is enough for LoRA finetuning. Looking at the code, the cause is that after finetuning finishes, when the model is saved, the original code merges the base model and the newly finetuned LoRA parameters into one model and writes them out as a single model file (the merge code deep-copies the trained model, so roughly twice the memory is needed).

The fix: modify the save_model function in train.py so that it does not merge the parameters and only saves the finetuned LoRA parameters on their own. Of course, since nothing gets merged into a single model, the inference code also needs a corresponding change so that it loads the base model plus the LoRA parameters.

Keep the LoRA parameters as a separate checkpoint:

diff --git a/LLM/finetune/train.py b/LLM/finetune/train.py
index 4483fc0..53dc4e9 100644
--- a/LLM/finetune/train.py
+++ b/LLM/finetune/train.py
@@ -155,12 +155,13 @@ def save_model(
     Args:
         cur_save_path (str): save path.
     """
-    if args.use_lora:                       # merge lora params with origin model
-        merged_model = copy.deepcopy(model)
-        merged_model = merged_model.merge_and_unload()
-        merged_model.save_pretrained(cur_save_dir)
-    else:
-        model.save_pretrained(cur_save_dir)
+    # if args.use_lora:                       # merge lora params with origin model
+    #     merged_model = copy.deepcopy(model)
+    #     merged_model = merged_model.merge_and_unload()
+    #     merged_model.save_pretrained(cur_save_dir)
+    # else:
+    #     model.save_pretrained(cur_save_dir)
+    model.save_pretrained(cur_save_dir)
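
If you still want a single merged checkpoint (for example, to deploy without peft), the merge can be done offline on the CPU after training, so it never competes with the training run for GPU memory. Below is a minimal sketch; the script name and all paths are placeholders, and it assumes the LoRA-only checkpoint produced by the patched save_model above:

# offline_merge.py -- hypothetical helper script, not part of the repo
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

base_model_path = "THUDM/chatglm-6b"                        # path to the original chatglm-6b weights (placeholder)
lora_model_path = "checkpoints/finetune/model_1000"         # LoRA-only checkpoint saved by the patched save_model
merged_out_path = "checkpoints/finetune/model_1000_merged"  # output directory (placeholder)

# Load everything on the CPU in fp32: fp16 matmuls are not supported on CPU,
# and keeping the merge off the GPU avoids the OOM entirely.
base = AutoModel.from_pretrained(base_model_path, trust_remote_code=True).float()
model = PeftModel.from_pretrained(base, lora_model_path)
merged = model.merge_and_unload()   # fold the LoRA deltas into the base weights
merged = merged.half()              # back to fp16 so the saved checkpoint stays the usual size
merged.save_pretrained(merged_out_path)

# Save the tokenizer alongside so the merged directory is self-contained.
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
tokenizer.save_pretrained(merged_out_path)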

At inference time, load the LoRA parameter checkpoint separately:

diff --git a/LLM/finetune/inference.py b/LLM/finetune/inference.py
index f7d1311..183241a 100644
--- a/LLM/finetune/inference.py
+++ b/LLM/finetune/inference.py
@@ -1,3 +1,4 @@
+# coding: utf8
 # !/usr/bin/env python3
 """
 ==== No Bugs in code, just some Random Unexpected FEATURES ====
@@ -23,6 +24,7 @@ Date: 2023/03/17
 import time
 import torch

+from peft import PeftModel
 from transformers import AutoTokenizer, AutoModel
 torch.set_default_tensor_type(torch.cuda.HalfTensor)

@@ -64,18 +66,21 @@ if __name__ == '__main__':

     device = 'cuda:0'
     max_new_tokens = 300
-    model_path = "checkpoints/model_1000"
+    lora_model_path = "checkpoints/finetune/model_1000"

     tokenizer = AutoTokenizer.from_pretrained(
-        model_path,
+        "D:\\software\\chatglm-6b\\chatglm-6b", # 改成chatglm-6b原始模型的地址
         trust_remote_code=True
     )

     model = AutoModel.from_pretrained(
-        model_path,
+        "D:\\software\\chatglm-6b\\chatglm-6b", # # 改成chatglm-6b原始模型的地址
         trust_remote_code=True
     ).half().to(device)

+    model = PeftModel.from_pretrained(model, lora_model_path, adapter_name="lora")
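
After loading the base model plus the LoRA adapter this way, inference works the same as with the merged checkpoint. A quick sanity-check sketch that could follow the PeftModel line inside the script's __main__ block, assuming ChatGLM-6B's custom chat() helper is still reachable through the peft wrapper (the prompt is just illustrative):

    # Quick check that the adapter-augmented model still generates.
    model.eval()
    with torch.no_grad():
        response, history = model.chat(tokenizer, "Hello", history=[])
        print(response)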
