Is T5 3B training properly parallelizing? #4559

Open
rguan1 opened this issue May 20, 2022 · 2 comments
Assignees
Labels
donotreap Avoid automatically marking as stale.

Comments

@rguan1
Contributor

rguan1 commented May 20, 2022

I am trying to train a T5 model on empathetic dialogues and I am running into CUDA OOM errors with the command below. When training the BlenderBot 3B model, I hit the same issue until I parallelized training across two GPUs. However, parallelizing T5 3B doesn't seem to resolve it. I've also reduced the batch size to 1 and the truncate length to 128 (truncating at 64 doesn't work either). Any suggestions for resolving the issue?

Command

parlai train_model -t empathetic_dialogues -m hugging_face/t5 --t5-model-arch t5-3b --t5-model-parallel True --fp16 True --optimizer adam --batchsize 1 --skip-generation True -vmt ppl -tr 64 --model-file ./chatbot_models/3B/testdebugT5/model --tstep 100
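
For reference, the same configuration can also be expressed through ParlAI's Python script API. The sketch below is an assumed mapping of the abbreviated CLI flags (-vmt, -tr, --tstep) to their long keyword names; it is not verified against the parser.

Python sketch (equivalent of the command above)

# Minimal sketch: the same run via ParlAI's TrainModel script API.
# Keyword names are assumed long forms of the CLI flags used above.
from parlai.scripts.train_model import TrainModel

TrainModel.main(
    task='empathetic_dialogues',
    model='hugging_face/t5',
    t5_model_arch='t5-3b',
    t5_model_parallel=True,
    fp16=True,
    optimizer='adam',
    batchsize=1,
    skip_generation=True,
    validation_metric='ppl',   # -vmt
    truncate=64,               # -tr
    model_file='./chatbot_models/3B/testdebugT5/model',
    max_train_steps=100,       # --tstep
)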

Error message

/home/rg4312/ParlAI/parlai/utils/fp16.py:85: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
  return torch.nn.utils.clip_grad_norm_(params, max_norm)
09:27:14 | Ran out of memory, skipping batch. if this happens frequently, decrease batchsize or truncate the inputs to the model.
Traceback (most recent call last):
  File "/home/rg4312/ParlAI/parlai/core/torch_generator_agent.py", line 603, in _fake_forward_backward_pass
    loss = 0 * self.compute_loss(self._dummy_batch)
  File "/home/rg4312/ParlAI/parlai/core/torch_generator_agent.py", line 693, in compute_loss
    model_output = self.model(*self._model_input(batch), ys=batch.label_vec)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rg4312/ParlAI/parlai/core/torch_generator_agent.py", line 312, in forward
    scores, preds = self.decode_forced(encoder_states, ys)
  File "/home/rg4312/ParlAI/parlai/core/torch_generator_agent.py", line 181, in decode_forced
    latent, _ = self.decoder(inputs, encoder_states)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rg4312/ParlAI/parlai/agents/hugging_face/t5.py", line 59, in wrap
    ret = func(*args, **kwargs)
  File "/home/rg4312/ParlAI/parlai/agents/hugging_face/t5.py", line 274, in forward
    outputs = self.stack(
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 985, in forward
    layer_outputs = layer_module(
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 663, in forward
    cross_attention_outputs = self.layer[1](
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 578, in forward
    attention_output = self.EncDecAttention(
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 470, in forward
    query_states = shape(self.q(hidden_states))  # (batch_size, n_heads, seq_length, dim_per_head)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "/ext3/miniconda3/envs/chatbot/lib/python3.8/site-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 44.49 GiB total capacity; 43.46 GiB already allocated; 2.00 MiB free; 43.48 GiB reserved in total by PyTorch)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ext3/miniconda3/envs/chatbot/bin/parlai", line 33, in <module>
    sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
  File "/home/rg4312/ParlAI/parlai/__main__.py", line 14, in main
    superscript_main()
  File "/home/rg4312/ParlAI/parlai/core/script.py", line 325, in superscript_main
    return SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
  File "/home/rg4312/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
    return script.run()
  File "/home/rg4312/ParlAI/parlai/scripts/train_model.py", line 998, in run
    return self.train_loop.train()
  File "/home/rg4312/ParlAI/parlai/scripts/train_model.py", line 950, in train
    for _train_log in self.train_steps():
  File "/home/rg4312/ParlAI/parlai/scripts/train_model.py", line 857, in train_steps
    world.parley()
  File "/home/rg4312/ParlAI/parlai/core/worlds.py", line 370, in parley
    acts[1] = agents[1].act()
  File "/home/rg4312/ParlAI/parlai/core/torch_agent.py", line 2143, in act
    response = self.batch_act([self.observation])[0]
  File "/home/rg4312/ParlAI/parlai/core/torch_agent.py", line 2234, in batch_act
    output = self.train_step(batch)
  File "/home/rg4312/ParlAI/parlai/core/torch_generator_agent.py", line 759, in train_step
    self._fake_forward_backward_pass()
  File "/home/rg4312/ParlAI/parlai/core/torch_generator_agent.py", line 614, in _fake_forward_backward_pass
    raise RuntimeError(m)
RuntimeError: CUDA OOM: Lower batch size (-bs) from 1 or lower  max sequence length (-tr) from 128
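
One way to narrow this down is to confirm whether the 3B weights are actually being split across both GPUs before any batch is processed. The sketch below uses plain PyTorch memory queries (not ParlAI internals); with working model parallelism, both devices should already hold several GiB of weights.

Memory check (sketch)

# Diagnostic sketch: report how much memory each visible GPU currently holds.
# If the model-parallel split is working, both devices should show several
# GiB of allocated weights before the first batch runs.
import torch

for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    total = torch.cuda.get_device_properties(i).total_memory / 1024**3
    print(f'GPU {i}: {allocated:.1f} GiB allocated, '
          f'{reserved:.1f} GiB reserved, {total:.1f} GiB total')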
@klshuster klshuster self-assigned this May 26, 2022
@klshuster
Contributor

From a cursory examination, it looks like this might be failing because we can't fit the model activations on the first GPU. I'll need to investigate a bit more.
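
To illustrate that hypothesis only (this is not the code path ParlAI's agent takes), Hugging Face's T5 exposes parallelize(device_map), and an uneven split that keeps fewer blocks on GPU 0 would leave it more headroom for activations. The 8/16 split below is a guess, not a tuned value.

# Illustration only: an uneven device map for t5-3b (24 encoder and 24
# decoder blocks) that keeps fewer blocks on GPU 0, leaving room there
# for activations. The split is a guess, not a tuned value.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained('t5-3b')
device_map = {
    0: list(range(0, 8)),    # lighter load on the device that also holds embeddings
    1: list(range(8, 24)),
}
model.parallelize(device_map)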

@github-actions

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

@github-actions github-actions bot added the stale label Jun 26, 2022
@klshuster klshuster added donotreap Avoid automatically marking as stale. and removed stale labels Jun 27, 2022