New colab: Fine-tune LLMs with Axolotl - End-to-end guide to the state-of-the-art tool for fine-tuning #40
Comments
Thanks, I completed it and added it to the LLM course, see https://colab.research.google.com/drive/1Xu0BrCB7IShwSWKVcfAfhehwjDrDMH5m?usp=sharing
If this line of code gives you an error: !pip install -qqq -e '.[flash-attn,deepspeed]' --progress-bar off, you should downgrade Torch to version 2.1.1 with: !pip install torch==2.1.1
Thanks @kukedlc87, I added it.
@mlabonne the Fine_tune_LLMs_with_Axolotl.ipynb does not work.
Training is failing on Colab T4 with RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'.
Also, you need to remove mlflow reporting from the config; otherwise it will complain because it is not installed.
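One plausible reading of the RuntimeError reported in this thread ("_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16') is that bfloat16 tensors are reaching the fp16 gradient scaler on a GPU without native bfloat16 support. The T4 is a Turing card (compute capability 7.5), and native bfloat16 arrives with compute capability 8.0. The sketch below picks precision flags from the major compute capability. The function name `choose_precision` and the `bf16`/`fp16` flag names are assumptions for illustration; the actual config.yaml is not shown in this thread, so check its keys before applying anything like this.

```python
from typing import Dict


def choose_precision(compute_major: int) -> Dict[str, bool]:
    """Pick hypothetical precision flags from a GPU's major compute capability.

    Assumption: the training config exposes boolean `bf16` and `fp16` flags.
    """
    # Native bfloat16 is available from compute capability 8.0 (Ampere) onward.
    native_bf16 = compute_major >= 8
    return {"bf16": native_bf16, "fp16": not native_bf16}


# On a machine with PyTorch and a CUDA GPU, the capability could be read with:
#   import torch
#   major, _minor = torch.cuda.get_device_capability()
#   flags = choose_precision(major)

# A T4 reports compute capability 7.5, so the major version is 7.
print(choose_precision(7))
```

For a T4 this yields `bf16: False` and `fp16: True`. If the model weights themselves are loaded in bfloat16, that setting may also need changing; that part is outside this sketch.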
Thanks, I upgraded the PyTorch version and removed mlflow.
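Removing mlflow reporting from the config, as discussed in this thread, could be sketched as follows. This is a minimal illustration on a plain dictionary: the helper name `drop_mlflow_settings` and the example keys are hypothetical, and it assumes every mlflow-related setting has "mlflow" in its key name, which may not hold for the real config.yaml.

```python
from typing import Any, Dict


def drop_mlflow_settings(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg without keys whose name mentions mlflow.

    Assumption: mlflow-related settings are top-level keys containing "mlflow".
    """
    return {key: value for key, value in cfg.items() if "mlflow" not in key.lower()}


# Illustrative config only; these keys and values are made up for the example.
example_cfg = {
    "base_model": "example/model",
    "mlflow_tracking_uri": "http://localhost:5000",
    "mlflow_experiment_name": "demo",
    "micro_batch_size": 1,
}

cleaned = drop_mlflow_settings(example_cfg)
print(sorted(cleaned))
```

The original dictionary is left untouched, so the cleaned copy can be compared against it before writing anything back to disk.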
On Sun, May 5, 2024 at 11:44 AM bachr wrote:
@mlabonne the Fine_tune_LLMs_with_Axolotl.ipynb (https://colab.research.google.com/drive/1Xu0BrCB7IShwSWKVcfAfhehwjDrDMH5m?usp=sharing) does not work.
These are the dependencies:
****************************************
**** Axolotl Dependency Versions *****
accelerate: 0.28.0
peft: 0.10.0
transformers: 4.40.0.dev0
trl: 0.8.5
torch: 2.2.1+cu121
bitsandbytes: 0.43.0
****************************************
Training is failing on Colab T4 with RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'. This is the full stack trace:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/content/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/content/axolotl/src/axolotl/train.py", line 170, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1837, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2227, in _inner_training_loop
    _grad_norm = self.accelerator.clip_grad_norm_(
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2145, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2095, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 336, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 277, in _unscale_grads_
    torch._amp_foreach_non_finite_check_and_unscale_(
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
0% 0/20 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'axolotl.cli.train', 'config.yaml']' returned non-zero exit status 1.
Also, you need to remove mlflow reporting from the config; otherwise it will complain because it is not installed.
Hi, I've uploaded a Colab notebook that follows your article.
The merge operation is missing because I didn't know if you were interested in it.
Downloads.zip