RuntimeError: CUDA out of memory, continuous training? #319

goldwater668 · 2023-03-14T01:46:12Z

I have trained on 3000 pairs of data and want to add another 2000 pairs and continue training, using the following command:
python train.py --name comics --dataroot ./datasets/comics3Kto5K --loadSize 512 --label_nc 0 --no_instance --netG local --load_pretrain checkpoints0310/comics/

But the error is as follows:
#RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 11.76 GiB total capacity; 8.86 GiB already allocated; 113.56 MiB free; 8.91 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
What is going on?


takuyaliu · 2023-03-27T01:29:47Z

Restart

goldwater668 · 2023-03-27T01:40:50Z

@takuyaliu If training is interrupted, can't I continue training from the breakpoint?

takuyaliu · 2023-04-03T12:40:45Z

Of course you can resume training from the latest breakpoint. The relevant settings are in the base options and train options.

goldwater668 · 2023-04-04T01:30:56Z

@takuyaliu
How should I set the parameters? Thank you for your reply!

takuyaliu · 2023-04-04T02:49:11Z

> @takuyaliu How should I set the parameters? Thank you for your reply!

You can find it in 'options/train_options.py'

for training

    self.parser.add_argument('--continue_train', action='store_true', help='continue training: load the latest model')
    self.parser.add_argument('--which_epoch', type=str, default='latest', help='which epoch to load? set to latest to use latest cached model')

Try adding '--continue_train --which_epoch latest' to the end of your training command.
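
To see how those two flags parse, here is a minimal standalone sketch using plain argparse. It is not the repo's option class: `build_parser` is a hypothetical helper, and only the two `add_argument` calls quoted above are reproduced. It only shows what values end up on the parsed options, not how the training script loads the model.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the repo's options class (an assumption for
    # illustration); only the two flags quoted in this thread are defined.
    parser = argparse.ArgumentParser()
    parser.add_argument('--continue_train', action='store_true',
                        help='continue training: load the latest model')
    parser.add_argument('--which_epoch', type=str, default='latest',
                        help='which epoch to load? set to latest to use latest cached model')
    return parser


# Simulate appending the two flags to the training command.
opt = build_parser().parse_args(['--continue_train', '--which_epoch', 'latest'])
print(opt.continue_train, opt.which_epoch)  # prints: True latest
```

Since '--continue_train' is a store_true flag, it takes no value, and '--which_epoch' already defaults to 'latest', so passing it explicitly is optional.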
