Training in 8-bit/4-bit causes error #4

Open
Muhammad4hmed opened this issue Nov 28, 2023 · 1 comment

@Muhammad4hmed
Hi,

First of all, great work! I've tested your model on some videos and the results are excellent!
I was trying to pretrain and fine-tune the model on my custom dataset. The data is prepared as per the instructions, but since my RTX 3080 Ti only has 16 GB of memory, I tried training with 8-bit quantization.

The command I used:

deepspeed \
ChatUniVi/train/train_mem.py \
--model_name_or_path Chat-UniVi/Chat-UniVi \
--version v1 \
--model_use PRETUNE \
--dataset_use "New" \
--vision_tower openai/clip-vit-large-patch14 \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--bf16 True \
--output_dir "EXP" \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 24000 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--bits 8 \
--tf32 True \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 0 \
--lazy_preprocess True 

The reason I removed --deepspeed scripts/zero2.json is that it wasn't reducing memory usage at all; after removing it, memory consumption dropped significantly.
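
For context, my understanding is that ZeRO-2 mainly shards optimizer state and gradients across data-parallel ranks, so on a single GPU it saves little memory unless CPU offload is enabled. Below is a rough sketch of a ZeRO-2 config with optimizer offload that could be passed back via --deepspeed; the file name zero2_offload.json and the exact key choices are my own assumptions, not the repo's scripts/zero2.json.

import json

# Sketch only: ZeRO stage 2 with optimizer state offloaded to CPU RAM.
# The "auto" values let the HF Trainer fill in batch size / dtype settings.
zero2_offload = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

with open("zero2_offload.json", "w") as f:
    json.dump(zero2_offload, f, indent=2)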

The problem is, I'm getting two errors. The first one is at:

model.config.tune_mm_mlp_adapter = training_args.tune_mm_mlp_adapter = model_args.tune_mm_mlp_adapter
if model_args.tune_mm_mlp_adapter:
    model.requires_grad_(False)
    for p in model.get_model().mm_projector.parameters():
        p.requires_grad = True  # <-- error raised here

Error: RuntimeError: only Tensors of floating point and complex dtype can require gradients
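
My guess (not verified against the repo's train.py) is that with --bits 8 the whole model, including the mm_projector, gets loaded as int8, and int8 parameters cannot have requires_grad = True. If that's the case, one workaround would be to exclude the projector from quantization at load time. A minimal sketch using the standard transformers BitsAndBytesConfig option; the generic AutoModelForCausalLM is used here only for illustration, since the repo loads its own model class:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: load the backbone in 8-bit but keep mm_projector in half precision,
# so its parameters stay floating point and can require gradients.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["mm_projector"],
)
model = AutoModelForCausalLM.from_pretrained(
    "Chat-UniVi/Chat-UniVi",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)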

So I commented that block out and enabled LoRA (lora_enable: bool = True), which adds adapter weights with requires_grad = True.
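
In case it matters (I haven't checked whether the training script already does this), PEFT usually expects a quantized model to be passed through prepare_model_for_kbit_training before the LoRA adapters are attached. A minimal sketch, assuming peft >= 0.4 (older versions call it prepare_model_for_int8_training) and a model already loaded in 8-bit:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Casts norm layers to fp32, enables input grads, and prepares the 8-bit model
# for gradient checkpointing.
model = prepare_model_for_kbit_training(model)

# Assumption: typical LLaMA attention projection names as LoRA targets.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)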

But I faced the following error at training time:

RuntimeError: "addmm_cuda" not implemented for 'Char'

The complete log of the last error:

/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.int8 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
  File "/home/ahmed/Desktop/WORK/Pioneer/VLM/01_12_23/UniVi/Chat-UniVi/ChatUniVi/train/train_mem.py", line 13, in <module>
    train()
  File "/home/ahmed/Desktop/WORK/Pioneer/VLM/24_11_23/UniVi/Chat-UniVi/ChatUniVi/train/train.py", line 1089, in train
    trainer.train()
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ahmed/Desktop/WORK/Pioneer/VLM/24_11_23/UniVi/Chat-UniVi/ChatUniVi/model/language_model/llama.py", line 54, in forward
    input_ids, attention_mask, past_key_values, inputs_embeds, labels = self.prepare_inputs_labels_for_multimodal(input_ids, attention_mask, past_key_values, labels, images)
  File "/home/ahmed/Desktop/WORK/Pioneer/VLM/24_11_23/UniVi/Chat-UniVi/ChatUniVi/model/arch.py", line 283, in prepare_inputs_labels_for_multimodal
    cur_image_features = self.project(cur_image_features, input_type="video")
  File "/home/ahmed/Desktop/WORK/Pioneer/VLM/24_11_23/UniVi/Chat-UniVi/ChatUniVi/model/arch.py", line 215, in project
    image_features = self.get_model().mm_projector(image_features)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/peft/tuners/lora.py", line 1064, in forward
    result = super().forward(x)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 441, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 563, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ahmed/miniconda3/envs/chatunivi/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 421, in forward
    output += torch.matmul(subA, state.subB)
RuntimeError: "addmm_cuda" not implemented for 'Char'

I'm a little confused here: am I doing something wrong, or am I missing something?
Can you please help?
Thanks

@jpthu17
Member

jpthu17 commented Nov 29, 2023

Thank you for your interest in our work. I will solve this error as soon as possible.
