Unable to fine-tune when not using quantization #3939

Open
simaotwx opened this issue Feb 22, 2024 · 2 comments

@simaotwx commented Feb 22, 2024

Describe the bug
When trying to train (fine-tune) a Mistral model without quantization, training fails with a RuntimeError ("No available kernel. Aborting execution.").

To Reproduce
Steps to reproduce the behavior:

  1. Install Ludwig as usual and use the config provided below
  2. Start the training process


Config file (model.yaml):

model_type: llm
base_model: mistralai/Mistral-7B-Instruct-v0.2

#quantization:
#  bits: 8

adapter:
  type: lora

prompt:
  template: |
    <s>
    [INST]
    Below is an instruction that describes a task, paired with an input that may provide further context.
    Write a response that appropriately completes the request.

    ### Instruction:
    {instruction}
    [/INST]

    ### Input:
    {input}
    </s>

    ### Response:

input_features:
  - name: prompt
    type: text

output_features:
  - name: output
    type: text

trainer:
  type: finetune
  learning_rate: 2.0e-4
  batch_size: 1
  gradient_accumulation_steps: 16
  epochs: 2
  learning_rate_scheduler:
    decay: cosine
    warmup_fraction: 0.03
    reduce_on_plateau: 0

preprocessing:
  sample_ratio: 0.1

backend:
  type: local

Command: ludwig train --config model.yaml --dataset "ludwig://alpaca"

Experiment description:

╒══════════════════╤══════════════════════════════════════════════════════════════════════════════════════╕                                                                                                                                                                                                                  
│ Experiment name  │ experiment                                                                           │                                                                                                                                                                                                                  
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤                                                                                                                                                                                                                  
│ Model name       │ run                                                                                  │                                                                                                                                                                                                                  
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤                                                                                                                                                                                                                  
│ Output directory │ /home/azureuser/ludwig/results/experiment_run_14                                     │                                                                                                                                                                                                                  
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤                                                                                                                                                                                                                  
│ ludwig_version   │ '0.9.3'                                                                              │                                                                                                                                                                                                                  
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤                                                                                                                                                                                                                  
│ command          │ ('/home/azureuser/ludwig/venv/bin/ludwig train --config model.yaml --dataset '       │                                                                                                                                                                                                                  
│                  │  'ludwig://alpaca')                                                                  │                                                   
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤                                                                                                                                                                                                                  
│ random_seed      │ 42                                                                                   │                                                   
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤                                                                                                                                                                                                                  
│ dataset          │ 'ludwig://alpaca'                                                                    │                                                   
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤                                                   
│ data_format      │ 'ludwig'                                                                             │                                                   
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤                                                   
│ torch_version    │ '2.2.0+cu121'                                                                        │                                                   
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤ 
│ compute          │ {   'arch_list': [   'sm_50',                                                        │                                                                                                                                                                                                                  
│                  │                      'sm_60',                                                        │                                                   
│                  │                      'sm_70',                                                        │                                                                                                                                                                                                                  
│                  │                      'sm_75',                                                        │                                                   
│                  │                      'sm_80',                                                        │                                                   
│                  │                      'sm_86',                                                        │                                                   
│                  │                      'sm_90'],                                                       │                                                   
│                  │     'devices': {   0: {   'device_capability': (8, 0),                               │                                                   
│                  │                           'device_properties': "_CudaDeviceProperties(name='NVIDIA " │                                                   
│                  │                                                "A100 80GB PCIe', major=8, "          │                                                   
│                  │                                                'minor=0, total_memory=81049MB, '     │                                                   
│                  │                                                'multi_processor_count=108)',         │                                                   
│                  │                           'gpu_type': 'NVIDIA A100 80GB PCIe'}},                     │                                                   
│                  │     'gencode_flags': '-gencode compute=compute_50,code=sm_50 -gencode '              │                                                   
│                  │                      'compute=compute_60,code=sm_60 -gencode '                       │
│                  │                      'compute=compute_70,code=sm_70 -gencode '                       │
│                  │                      'compute=compute_75,code=sm_75 -gencode '                       │
│                  │                      'compute=compute_80,code=sm_80 -gencode '                       │
│                  │                      'compute=compute_86,code=sm_86 -gencode '                       │
│                  │                      'compute=compute_90,code=sm_90',                                │
│                  │     'gpus_per_node': 1,                                                              │
│                  │     'num_nodes': 1}                                                                  │
╘══════════════════╧══════════════════════════════════════════════════════════════════════════════════════╛

User-specified config (with upgrades):

{   'adapter': {'type': 'lora'},
    'backend': {'type': 'local'},
    'base_model': 'mistralai/Mistral-7B-Instruct-v0.2',
    'input_features': [{'name': 'prompt', 'type': 'text'}],
    'ludwig_version': '0.9.3',
    'model_type': 'llm',
    'output_features': [{'name': 'output', 'type': 'text'}],
    'preprocessing': {'sample_ratio': 0.1},
    'prompt': {   'template': '<s>\n'
                              '[INST]\n'
                              'Below is an instruction that describes a task, ' 
                              'paired with an input that may provide further '
                              'context.\n'
                              'Write a response that appropriately completes '
                              'the request.\n'
                              '\n'
                              '### Instruction:\n'
                              '{instruction}\n'
                              '[/INST]\n'
                              '\n'
                              '### Input:\n'
                              '{input}\n'
                              '</s>\n'
                              '\n'
                              '### Response:\n'},
    'trainer': {   'batch_size': 1,
                   'epochs': 2,
                   'gradient_accumulation_steps': 16,
                   'learning_rate': 0.0002,
                   'learning_rate_scheduler': {   'decay': 'cosine',
                                                  'reduce_on_plateau': 0,
                                                  'warmup_fraction': 0.03},
                   'type': 'finetune'}}

Expected behavior
Training would start.

Screenshots

Starting with step 0, epoch: 0
Training:   0%| | 0/7280 [00:00<?, ?it/s]
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:685: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:415.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:685: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/sdp_utils_cpp.h:456.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:685: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:417.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:685: UserWarning: Expected query, key and value to all be of dtype: {Half, BFloat16}. Got Query dtype: float, Key dtype: float, and Value dtype: float instead. (Triggered internally at ../aten/src/ATen/native/transformers/sdp_utils_cpp.h:101.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Traceback (most recent call last):
  File "/home/azureuser/ludwig/venv/bin/ludwig", line 8, in <module>
    sys.exit(main())
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 197, in main
    CLI()
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 72, in __init__
    getattr(self, args.command)()
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 77, in train
    train.cli(sys.argv[2:])
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/train.py", line 395, in cli
    train_cli(**vars(args))
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/train.py", line 185, in train_cli
    model.train(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/api.py", line 678, in train
    train_stats = trainer.train(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/trainers/trainer.py", line 1044, in train
    should_break, has_nan_or_inf_tensors = self._train_loop(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/trainers/trainer.py", line 1241, in _train_loop
    loss, all_losses, used_tokens = self.train_step(inputs, targets, should_step=should_step, profiler=profiler)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/trainers/trainer.py", line 337, in train_step
    model_outputs = self.dist_model((inputs, targets))
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/models/llm.py", line 274, in forward
    model_outputs = self.model(input_ids=self.model_inputs, attention_mask=self.attention_masks).get(LOGITS)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/peft/peft_model.py", line 1083, in forward
    return self.base_model(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1154, in forward
    outputs = self.model(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1039, in forward
    layer_outputs = decoder_layer(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 754, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 685, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.
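For context on the trace above: the UserWarnings show that the memory-efficient SDPA backend was disabled at runtime and that the flash backend rejected the float32 query/key/value tensors (it requires Half or BFloat16), so scaled_dot_product_attention was left with no usable kernel. A minimal standalone sketch of this failure mode (not taken from Ludwig; assumes a CUDA build of PyTorch 2.2 and an Ampere-or-newer GPU, with hypothetical tensor shapes):

import torch
import torch.nn.functional as F

# With float32 inputs, the flash kernel rejects the call; if the
# memory-efficient and math backends are disabled as well, SDPA has no
# kernel left and raises the same "No available kernel" error seen above.
q = k = v = torch.randn(1, 8, 16, 64, device="cuda")  # float32 by default

with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_mem_efficient=False, enable_math=False
):
    try:
        F.scaled_dot_product_attention(q, k, v)
    except RuntimeError as e:
        print(e)  # "No available kernel. Aborting execution."

    # Half-precision inputs satisfy the flash kernel's dtype requirement:
    F.scaled_dot_product_attention(q.bfloat16(), k.bfloat16(), v.bfloat16())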

Environment (please complete the following information):

  • OS: Ubuntu
  • Version: 22.04.4 LTS
  • Python version: 3.10.12
  • Ludwig version: 0.9.3

Additional context
With 8-bit quantization it works:

Starting with step 0, epoch: 0
Training:   0%| | 0/7280 [00:00<?, ?it/s]
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Training:   6%|██████████████▍ | 416/7280 [04:04<1:03:58,  1.79it/s, loss=0.0647]
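The MatMul8bitLt warning explains why the quantized path succeeds: bitsandbytes casts the float32 inputs to float16, which satisfies the dtype requirement of the fused attention kernels. Outside of Ludwig, the equivalent fix at the transformers level would be to load the base model in half precision so attention sees fp16/bf16 tensors; a hedged sketch using the standard transformers API (illustrative only, not a verified Ludwig config option):

import torch
from transformers import AutoModelForCausalLM

# Sketch of the underlying workaround, assuming transformers 4.37:
# loading the weights in bfloat16 means scaled_dot_product_attention
# receives half-precision tensors, which the flash and memory-efficient
# kernels accept. Whether Ludwig 0.9.3 exposes an equivalent dtype
# option is not confirmed here.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.bfloat16,  # instead of the float32 default
    device_map="auto",           # requires the accelerate package
)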
@arnavgarg1 (Contributor)

Hey @simaotwx - sorry we missed this. May I ask which torch, huggingface, and peft versions you were running when you hit this issue? Are you still seeing it?

@simaotwx (Author)

> Hey @simaotwx - sorry we missed this. May I ask which torch, huggingface, and peft versions you were running when you hit this issue? Are you still seeing it?

It's okay, no worries.

Versions:

huggingface-hub          0.20.3
peft                     0.8.2
sentence-transformers    2.3.1
torch                    2.2.0
torchaudio               2.2.0
torchdata                0.7.1
torchinfo                1.8.0
torchmetrics             1.3.1
torchtext                0.17.0
torchvision              0.17.0
transformers             4.37.2

I am still seeing the same issue.
I have since moved on and am no longer using Ludwig, so this issue isn't relevant to me anymore, but it still exists.

While looking at this issue, I noticed that I had copied the wrong config. I updated it to match what I just tested.
