New Colab: Fine-tune LLMs with Axolotl (end-to-end guide to the state-of-the-art tool for fine-tuning) #40

g-i-o-r-g-i-o · 2024-01-26T12:39:00Z

Hi, I've uploaded a Colab notebook that follows your article.

The merge step is missing because I wasn't sure whether you were interested in it.

mlabonne · 2024-01-27T22:21:41Z

Thanks, I completed it and added it to the LLM course: https://colab.research.google.com/drive/1Xu0BrCB7IShwSWKVcfAfhehwjDrDMH5m?usp=sharing
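For readers wondering what the merge step does: after LoRA fine-tuning, the low-rank adapter weights are folded back into the base weights so the model can be used without a separate adapter. Below is a minimal sketch of that arithmetic for a single weight matrix, assuming standard LoRA scaling (`lora_alpha / r`). The helper names are hypothetical; in practice a library does this for you (for example PEFT's `merge_and_unload()`), and this is not the code used in the notebook.

```python
# Illustrative sketch of the LoRA merge arithmetic: W_merged = W + (alpha / r) * (B @ A).
# Plain nested lists are used so the example needs no third-party packages.


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    inner = len(y)
    assert all(len(row) == inner for row in x), "inner dimensions must match"
    cols = len(y[0])
    return [[sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(x))]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Fold a LoRA update into a frozen weight matrix.

    weight: d_out x d_in base weight W
    lora_a: r x d_in down-projection A
    lora_b: d_out x r up-projection B
    Returns a new d_out x d_in matrix; the inputs are not modified.
    """
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # r = 1, d_in = 2
B = [[0.5], [0.0]]      # d_out = 2, r = 1
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

Because the update is added into W once, inference after merging costs the same as the base model; the trade-off is that the adapter can no longer be swapped out without keeping a copy of the original weights.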

kukedlc87 · 2024-02-15T14:47:07Z

If this line:

!pip install -qqq -e '.[flash-attn,deepspeed]' --progress-bar off

fails with an error, downgrade PyTorch to version 2.1.1:

!pip install torch==2.1.1
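Since Colab ships its own torch build, it can help to check which release is actually installed after pinning it. A minimal sketch using only the standard library (`importlib.metadata`); the helper names are hypothetical, and the version parsing is deliberately simplified (it ignores pre-release and dev suffixes).

```python
# Sketch: check which torch release is installed, e.g. after pinning torch==2.1.1.
# Helper names are hypothetical; version parsing is deliberately simplified.
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver):
    """Turn '2.2.1+cu121' into (2, 2, 1).

    The local label after '+' is dropped, and parsing stops at the first
    component that is not a plain number (e.g. 'dev0' or '0rc1').
    """
    numbers = []
    for piece in ver.split("+", 1)[0].split("."):
        if not piece.isdigit():
            break
        numbers.append(int(piece))
    return tuple(numbers)


def installed_torch():
    """Return the installed torch version string, or None if torch is missing."""
    try:
        return version("torch")
    except PackageNotFoundError:
        return None


def torch_matches(installed, pinned="2.1.1"):
    """True if the installed release (ignoring '+cuXXX') equals the pinned one."""
    return release_tuple(installed) == release_tuple(pinned)


current = installed_torch()
if current is None:
    print("torch is not installed")
elif torch_matches(current):
    print(f"torch {current} matches the 2.1.1 pin")
else:
    print(f"torch {current} is installed, not 2.1.1")
```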

mlabonne · 2024-02-18T22:47:23Z

Thanks @kukedlc87, I added it.

dzlab · 2024-05-05T09:44:20Z

@mlabonne the Fine_tune_LLMs_with_Axolotl.ipynb notebook does not work.
These are the dependency versions:

****************************************
**** Axolotl Dependency Versions *****
  accelerate: 0.28.0         
        peft: 0.10.0         
transformers: 4.40.0.dev0    
         trl: 0.8.5          
       torch: 2.2.1+cu121    
bitsandbytes: 0.43.0         
****************************************
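A version list like the one above can be gathered without launching Axolotl, which is handy when reporting a failure. A minimal sketch using `importlib.metadata` from the standard library; the helper names are hypothetical, and this is not the code Axolotl uses to print its banner.

```python
# Sketch: collect installed versions of the packages listed in the banner above.
from importlib.metadata import PackageNotFoundError, version

# Same packages as the Axolotl banner.
PACKAGES = ("accelerate", "peft", "transformers", "trl", "torch", "bitsandbytes")


def dependency_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def format_banner(versions):
    """Right-align package names, as in the banner, one package per line."""
    width = max(len(name) for name in versions)
    lines = [f"{name.rjust(width)}: {ver or 'not installed'}"
             for name, ver in versions.items()]
    return "\n".join(lines)


print(format_banner(dependency_versions()))
```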

Training fails on a Colab T4 with RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'. Here is the full stack trace:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/content/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/content/axolotl/src/axolotl/train.py", line 170, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1837, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2227, in _inner_training_loop
    _grad_norm = self.accelerator.clip_grad_norm_(
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2145, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2095, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 336, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 277, in _unscale_grads_
    torch._amp_foreach_non_finite_check_and_unscale_(
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
  0% 0/20 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'axolotl.cli.train', 'config.yaml']' returned non-zero exit status 1.
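The thread fixed this by upgrading PyTorch. For context, the T4 reports CUDA compute capability 7.5, while native bfloat16 support starts at compute capability 8.0 (Ampere), so a common precaution on a T4 is to request fp16 rather than bf16. A minimal sketch of that choice, assuming the Axolotl config takes boolean `bf16` and `fp16` keys (check the options for your Axolotl version); the helper names are hypothetical, and whether this alone avoids the error above depends on how the model weights are loaded.

```python
# Sketch: choose mixed-precision flags from what the GPU supports natively.
# Assumption: the Axolotl config accepts boolean `bf16` and `fp16` keys.


def precision_flags(native_bf16):
    """Return config overrides: bf16 where natively supported, otherwise fp16."""
    if native_bf16:
        return {"bf16": True, "fp16": False}
    return {"bf16": False, "fp16": True}


def detect_native_bf16():
    """True only if torch is importable, CUDA is available, and the GPU is sm_80+."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 8  # Ampere (8.0) and newer; a T4 reports (7, 5)


print(precision_flags(detect_native_bf16()))
```

Checking the compute capability directly avoids relying on `torch.cuda.is_bf16_supported()`, whose answer may account for emulation depending on the PyTorch version.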

Also, you need to remove MLflow reporting from the config; otherwise training fails because mlflow is not installed.
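If the config has already been loaded into a Python dict (for example with PyYAML), the MLflow entries can be dropped before writing it back out. A minimal sketch, assuming the MLflow settings are top-level keys whose names start with `mlflow` (check your own `config.yaml`); the values below are placeholders, not the notebook's config.

```python
# Sketch: drop MLflow-related settings from an already-parsed config dict.
# Assumption: those settings are top-level keys whose names start with "mlflow".


def without_mlflow(config):
    """Return a copy of the config without keys starting with 'mlflow'."""
    return {key: value for key, value in config.items()
            if not key.lower().startswith("mlflow")}


cfg = {
    "base_model": "some/base-model",                 # placeholder value
    "mlflow_tracking_uri": "http://localhost:5000",  # placeholder value
    "mlflow_experiment_name": "axolotl-demo",        # placeholder value
    "num_epochs": 1,
}
print(without_mlflow(cfg))  # {'base_model': 'some/base-model', 'num_epochs': 1}
```

Returning a new dict instead of deleting keys in place keeps the original config available for comparison.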

mlabonne · 2024-05-05T21:23:47Z

Thanks, I upgraded PyTorch and removed mlflow.

Openegg15 · 2024-05-25T02:58:36Z

This is a follow-up to #1822 and #100