Describe the bug
When fine-tuning a Mistral model, training crashes with a RuntimeError unless quantization is used.

To Reproduce
Steps to reproduce the behavior:

Config file (model.yaml): (see the reconstructed YAML below the printed config)

Command:
ludwig train --config model.yaml --dataset "ludwig://alpaca"

Experiment description:
User-specified config (with upgrades):
{ 'adapter': {'type': 'lora'},
  'backend': {'type': 'local'},
  'base_model': 'mistralai/Mistral-7B-Instruct-v0.2',
  'input_features': [{'name': 'prompt', 'type': 'text'}],
  'ludwig_version': '0.9.3',
  'model_type': 'llm',
  'output_features': [{'name': 'output', 'type': 'text'}],
  'preprocessing': {'sample_ratio': 0.1},
  'prompt': { 'template': '<s>\n'
              '[INST]\n'
              'Below is an instruction that describes a task, '
              'paired with an input that may provide further '
              'context.\n'
              'Write a response that appropriately completes '
              'the request.\n'
              '\n'
              '### Instruction:\n'
              '{instruction}\n'
              '[/INST]\n'
              '\n'
              '### Input:\n'
              '{input}\n'
              '</s>\n'
              '\n'
              '### Response:\n'},
  'trainer': { 'batch_size': 1,
               'epochs': 2,
               'gradient_accumulation_steps': 16,
               'learning_rate': 0.0002,
               'learning_rate_scheduler': { 'decay': 'cosine',
                                            'reduce_on_plateau': 0,
                                            'warmup_fraction': 0.03},
               'type': 'finetune'}}
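The original model.yaml was not captured, but the printed dict maps back to roughly the following file (a reconstruction, not the original: key order, the prompt template's line wrapping, and which keys were user-specified versus injected by the config upgrade are assumptions):

model_type: llm
base_model: mistralai/Mistral-7B-Instruct-v0.2
input_features:
  - name: prompt
    type: text
output_features:
  - name: output
    type: text
prompt:
  template: |
    <s>
    [INST]
    Below is an instruction that describes a task, paired with an input that may provide further context.
    Write a response that appropriately completes the request.

    ### Instruction:
    {instruction}
    [/INST]

    ### Input:
    {input}
    </s>

    ### Response:
adapter:
  type: lora
preprocessing:
  sample_ratio: 0.1
backend:
  type: local
trainer:
  type: finetune
  batch_size: 1
  epochs: 2
  gradient_accumulation_steps: 16
  learning_rate: 0.0002
  learning_rate_scheduler:
    decay: cosine
    reduce_on_plateau: 0
    warmup_fraction: 0.03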
Expected behavior
Training would start.
Screenshots
Starting with step 0, epoch: 0
Training: 0%| | 0/7280 [00:00<?, ?it/s]
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:685: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:415.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:685: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/sdp_utils_cpp.h:456.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:685: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:417.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:685: UserWarning: Expected query, key and value to all be of dtype: {Half, BFloat16}. Got Query dtype: float, Key dtype: float, and Value dtype: float instead. (Triggered internally at ../aten/src/ATen/native/transformers/sdp_utils_cpp.h:101.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Traceback (most recent call last):
File "/home/azureuser/ludwig/venv/bin/ludwig", line 8, in <module>
sys.exit(main())
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 197, in main
CLI()
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 72, in __init__
getattr(self, args.command)()
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 77, in train
train.cli(sys.argv[2:])
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/train.py", line 395, in cli
train_cli(**vars(args))
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/train.py", line 185, in train_cli
model.train(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/api.py", line 678, in train
train_stats = trainer.train(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/trainers/trainer.py", line 1044, in train
should_break, has_nan_or_inf_tensors = self._train_loop(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/trainers/trainer.py", line 1241, in _train_loop
loss, all_losses, used_tokens = self.train_step(inputs, targets, should_step=should_step, profiler=profiler)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/trainers/trainer.py", line 337, in train_step
model_outputs = self.dist_model((inputs, targets))
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/models/llm.py", line 274, in forward
model_outputs = self.model(input_ids=self.model_inputs, attention_mask=self.attention_masks).get(LOGITS)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/peft/peft_model.py", line 1083, in forward
return self.base_model(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1154, in forward
outputs = self.model(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1039, in forward
layer_outputs = decoder_layer(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 754, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 685, in forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.
Training: 0%|
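What the warnings are saying: without quantization the Mistral weights, and hence the query/key/value tensors, stay in float32, but PyTorch's flash and memory-efficient SDPA kernels only accept Half/BFloat16 inputs, and the math fallback has been runtime-disabled, so scaled_dot_product_attention is left with no usable kernel. A minimal standalone sketch that reproduces the same RuntimeError (assuming a CUDA GPU and the torch 2.x torch.backends.cuda.sdp_kernel context manager; nothing here is Ludwig-specific):

import torch
import torch.nn.functional as F

# float32 tensors on CUDA: the flash and memory-efficient SDPA
# backends reject this dtype (they require half or bfloat16)
q = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float32)
k = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float32)
v = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float32)

# Disabling the math fallback mirrors the "runtime disabled" warning above;
# with no backend able to handle fp32 inputs, SDPA raises
# "RuntimeError: No available kernel. Aborting execution."
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=True, enable_math=False):
    F.scaled_dot_product_attention(q, k, v)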
Environment (please complete the following information):
OS: Ubuntu
Version: 22.04.4 LTS
Python version: 3.10.12
Ludwig version: 0.9.3
Additional context
With 8-bit quantization it works:
Starting with step 0, epoch: 0
Training: 0%| | 0/7280 [00:00<?, ?it/s]/home/azureuser/ludwig/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Training: 6%|██████████████▍ | 416/7280 [04:04<1:03:58, 1.79it/s, loss=0.0647]
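For comparison, the working 8-bit run is presumably the same config plus a quantization section; in Ludwig's config schema that looks roughly like this (placement alongside the other top-level keys is assumed):

quantization:
  bits: 8

Quantized loading runs the matmuls in half precision, which also explains the MatMul8bitLt float32-to-float16 cast warning in the working run and gives the SDPA kernels a dtype they accept.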
Hey @simaotwx - sorry we missed this. May I ask which torch, huggingface (transformers) and peft versions you were running when you hit this issue? Are you still seeing it?
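(The requested versions can be pulled with a one-liner along these lines; transformers standing in for "huggingface" is an assumption:)

python -c "import torch, transformers, peft; print(torch.__version__, transformers.__version__, peft.__version__)"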