Accelerate 0.30.0 Breaks FSDP QLora #2761

Open · 2 of 4 tasks
mallorbc opened this issue May 10, 2024 · 6 comments

@mallorbc commented May 10, 2024

System Info

Below is the pip list output of an environment that does not work:

Package                  Version
------------------------ ---------------
accelerate               0.30.0
aiohttp                  3.9.5
aiosignal                1.3.1
annotated-types          0.6.0
async-timeout            4.0.3
attrs                    23.2.0
bitsandbytes             0.43.1
certifi                  2024.2.2
charset-normalizer       3.3.2
click                    8.1.7
datasets                 2.19.1
deepspeed                0.14.2+5f631abc
dill                     0.3.8
docker-pycreds           0.4.0
docstring_parser         0.16
einops                   0.8.0
eval_type_backport       0.2.0
exceptiongroup           1.2.1
filelock                 3.14.0
flash-attn               2.5.8
frozenlist               1.4.1
fsspec                   2024.3.1
gitdb                    4.0.11
GitPython                3.1.43
hf_transfer              0.1.6
hjson                    3.1.0
huggingface-hub          0.23.0
idna                     3.7
iniconfig                2.0.0
Jinja2                   3.1.4
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.0.5
multiprocess             0.70.16
networkx                 3.1
ninja                    1.11.1.1
numpy                    1.24.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.1.105
packaging                24.0
pandas                   2.0.3
peft                     0.10.0
pillow                   10.3.0
pip                      24.0
platformdirs             4.2.1
pluggy                   1.5.0
protobuf                 3.20.1
psutil                   5.9.8
py-cpuinfo               9.0.0
pyarrow                  16.0.0
pyarrow-hotfix           0.6
pydantic                 2.7.1
pydantic_core            2.18.2
Pygments                 2.18.0
pynvml                   11.5.0
pytest                   8.2.0
python-dateutil          2.9.0.post0
pytz                     2024.1
PyYAML                   6.0.1
regex                    2024.5.10
requests                 2.31.0
rich                     13.7.1
safetensors              0.4.3
scipy                    1.10.1
sentencepiece            0.2.0
sentry-sdk               2.1.1
setproctitle             1.3.3
setuptools               69.5.1
shtab                    1.7.1
six                      1.16.0
smmap                    5.0.1
sympy                    1.12
text-generation          0.7.0
tokenizers               0.19.1
tomli                    2.0.1
torch                    2.3.0
torchaudio               2.3.0
torchvision              0.18.0
tqdm                     4.66.4
transformers             4.40.2
triton                   2.3.0
trl                      0.8.6
typing_extensions        4.11.0
tyro                     0.8.4
tzdata                   2024.1
urllib3                  2.2.1
wandb                    0.17.0
wheel                    0.43.0
xxhash                   3.4.1
yarl                     1.9.4

Changing accelerate to accelerate<=0.29.3 gives an otherwise identical environment; the only line in the pip list that differs is:

accelerate               0.29.3

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

I am using code based on this repository:
https://github.com/mallorbc/Finetune_LLMs

Otherwise, the basic steps are the following:

  1. Install the pip packages listed above, namely:
    pip install "accelerate<=0.29.3"
    pip install transformers accelerate peft bitsandbytes trl
  2. Run a QLoRA FSDP training program (a minimal sketch follows this list)
  3. Notice how errors occur with 0.30.0 but not with 0.29.3
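
For step 2, a minimal, generic sketch of such a program is shown below. This is not the actual trl_finetune.py from the repository above; the model name, dataset, and hyperparameters are placeholders, and it assumes it is launched under an accelerate FSDP config (e.g. accelerate launch --config_file fsdp_config.yaml minimal_fsdp_qlora.py, where fsdp_config.yaml is a hypothetical file name).

# minimal_fsdp_qlora.py -- hypothetical sketch, not the repository's trl_finetune.py
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model

# 4-bit quantization; a float storage dtype lets FSDP flatten the weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapters on the attention projections (placeholder choices)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=1,
        gradient_checkpointing=True,
        bf16=True,
    ),
)
# With accelerate 0.30.0 this call raises the AttributeError shown below;
# with accelerate 0.29.3 training proceeds.
trainer.train()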

See an error like the following for 0.30.0:

[rank0]: Traceback (most recent call last):
[rank0]:   File "trl_finetune.py", line 387, in <module>
[rank0]:     trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 361, in train
[rank0]:     output = super().train(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1859, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 2001, in _inner_training_loop
[rank0]:     self._fsdp_qlora_plugin_updates()
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 4425, in _fsdp_qlora_plugin_updates
[rank0]:     fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(self.model)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/peft/utils/other.py", line 396, in fsdp_auto_wrap_policy
[rank0]:     transformer_cls = FullyShardedDataParallelPlugin.get_module_class_from_name(model, layer_class)
[rank0]: AttributeError: type object 'FullyShardedDataParallelPlugin' has no attribute 'get_module_class_from_name'
[rank1]: Traceback (most recent call last):
[rank1]:   File "trl_finetune.py", line 387, in <module>
[rank1]:     trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 361, in train
[rank1]:     output = super().train(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1859, in train
[rank1]:     return inner_training_loop(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 2001, in _inner_training_loop
[rank1]:     self._fsdp_qlora_plugin_updates()
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 4425, in _fsdp_qlora_plugin_updates
[rank1]:     fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(self.model)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/peft/utils/other.py", line 396, in fsdp_auto_wrap_policy
[rank1]:     transformer_cls = FullyShardedDataParallelPlugin.get_module_class_from_name(model, layer_class)
[rank1]: AttributeError: type object 'FullyShardedDataParallelPlugin' has no attribute 'get_module_class_from_name'
E0510 12:16:25.853937 140644343273280 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 140) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1069, in launch_command
    multi_gpu_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 718, in multi_gpu_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
trl_finetune.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-05-10_12:16:25
  host      : f61090d2a6fd
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 141)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-05-10_12:16:25
  host      : f61090d2a6fd
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 140)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Expected behavior

I expect training to run without issues, as it does when I use accelerate 0.29.3.
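
For context, the traceback shows peft 0.10.0 calling FullyShardedDataParallelPlugin.get_module_class_from_name, which accelerate 0.30.0 no longer exposes as an attribute of the plugin class. A hedged compatibility sketch is shown below; the accelerate.utils.dataclasses import path is an assumption and should be verified against the installed accelerate version.

# Hypothetical compatibility shim, not the maintainers' actual fix
from accelerate import FullyShardedDataParallelPlugin

if hasattr(FullyShardedDataParallelPlugin, "get_module_class_from_name"):
    # accelerate <= 0.29.x exposed the helper as a staticmethod on the plugin
    get_module_class_from_name = FullyShardedDataParallelPlugin.get_module_class_from_name
else:
    # assumption: newer accelerate keeps the helper as a module-level function;
    # verify this import path against your installed accelerate
    from accelerate.utils.dataclasses import get_module_class_from_name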

@muellerzr (Collaborator)

cc @younesbelkada @pacman100

@BenjaminBossan (Member)

@mallorbc Could you try installing PEFT from main and check if the error persists?

@mallorbc (Author)

So use the latest accelerate and install PEFT from main?

I will do the following:
pip install transformers bitsandbytes trl accelerate
pip install git+https://github.com/huggingface/peft.git

I will let you know.

@mallorbc (Author)

I did the above setup. The resulting pip list is identical to the first one above except for these lines:

accelerate               0.30.1
peft                     0.11.1.dev0
platformdirs             4.2.2
pyarrow                  16.1.0
regex                    2024.5.15
sentry-sdk               2.2.0

I can confirm that this led to successful fine-tuning with QLoRA and FSDP. However, QDoRA seems to be broken.

When I try FSDP QDoRA, I get the following error:
[rank0]: Traceback (most recent call last):
[rank0]:   File "trl_finetune.py", line 399, in <module>
[rank0]:     trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 361, in train
[rank0]:     output = super().train(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1859, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 2002, in _inner_training_loop
[rank0]:     self.model = self.accelerator.prepare(self.model)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1292, in prepare
[rank0]:     result = tuple(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1293, in <genexpr>
[rank0]:     self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1169, in _prepare_one
[rank0]:     return self.prepare_model(obj, device_placement=device_placement)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1459, in prepare_model
[rank0]:     model = FSDP(model, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 485, in __init__
[rank0]:     _auto_wrap(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_wrap_utils.py", line 101, in _auto_wrap
[rank0]:     _recursive_wrap(**recursive_wrap_kwargs, **root_kwargs)  # type: ignore[arg-type]
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
[rank0]:     wrapped_child, num_wrapped_params = _recursive_wrap(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
[rank0]:     wrapped_child, num_wrapped_params = _recursive_wrap(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 543, in _recursive_wrap
[rank0]:     wrapped_child, num_wrapped_params = _recursive_wrap(
[rank0]:   [Previous line repeated 2 more times]
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 561, in _recursive_wrap
[rank0]:     return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 490, in _wrap
[rank0]:     return wrapper_cls(module, **kwargs)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 511, in __init__
[rank0]:     _init_param_handle_from_module(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_init_utils.py", line 598, in _init_param_handle_from_module
[rank0]:     _init_param_handle_from_params(state, managed_params, fully_sharded_module)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_init_utils.py", line 610, in _init_param_handle_from_params
[rank0]:     handle = FlatParamHandle(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_flat_param.py", line 582, in __init__
[rank0]:     self._init_flat_param_and_metadata(
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_flat_param.py", line 632, in _init_flat_param_and_metadata
[rank0]:     ) = self._validate_tensors_to_flatten(params)
[rank0]:   File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_flat_param.py", line 768, in _validate_tensors_to_flatten
[rank0]:     raise ValueError("Cannot flatten integer dtype tensors")
[rank0]: ValueError: Cannot flatten integer dtype tensors

@jaywongs

I used exactly the versions you mentioned, and with FSDP + QLoRA I got the same "ValueError: Cannot flatten integer dtype tensors".

@BenjaminBossan (Member)

For QLoRA training with FSDP, please check the updated bitsandbytes docs.

As for QDoRA: training with FSDP should be fixed by huggingface/peft#1806, so if you install PEFT from the latest main, it should work. Please also check the PR description for how this was tested.
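
To illustrate the QDoRA side: after installing PEFT from main (pip install git+https://github.com/huggingface/peft.git), DoRA is switched on via the LoRA config. A hedged sketch, reusing the placeholder settings from the earlier example:

from peft import LoraConfig

# use_dora=True turns the LoRA adapters into DoRA adapters; combined with the
# 4-bit base model above this gives QDoRA
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
    use_dora=True,
)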
