
Adafactor fails to run on a custom (rfs) resnet12 (with MAML) #405

Open
brando90 opened this issue Dec 3, 2021 · 3 comments

Comments


brando90 commented Dec 3, 2021

I was trying Adafactor, but I get the following error:

args.scheduler=None
--------------------- META-TRAIN ------------------------
Starting training!
Traceback (most recent call last):
  File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 441, in <module>
    main_resume_from_checkpoint(args)
  File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 403, in main_resume_from_checkpoint
    run_training(args)
  File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 413, in run_training
    meta_train_fixed_iterations(args)
  File "/home/miranda9/automl-meta-learning/automl-proj-src/meta_learning/training/meta_training.py", line 233, in meta_train_fixed_iterations
    args.outer_opt.step()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch_optimizer/adafactor.py", line 191, in step
    self._approx_sq_grad(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch_optimizer/adafactor.py", line 116, in _approx_sq_grad
    (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1))
RuntimeError: The size of tensor a (3) must match the size of tensor b (64) at non-singleton dimension 1

Training runs with PyTorch's default Adam, so why does this optimizer fail?

related:
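A minimal sketch that should reproduce the failure outside of MAML, assuming torch_optimizer's Adafactor and a single Conv2d layer whose weight shape [64, 3, 3, 3] matches the sizes in the error message (this is a reconstruction from the traceback, not code from the original run):

import torch
import torch.nn as nn
import torch_optimizer

# Hypothetical reproduction: a Conv2d weight has 4 dimensions, so Adafactor
# takes its factored second-moment path, where the row/column statistics
# no longer broadcast against each other.
model = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)  # weight: [64, 3, 3, 3]
opt = torch_optimizer.Adafactor(model.parameters())

x = torch.randn(2, 3, 32, 32)
model(x).sum().backward()
opt.step()  # expected: RuntimeError about mismatched tensor sizes in _approx_sq_grad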

@ionutmodo

Are there any updates on this? The issue is still present.


ionutmodo commented Jul 10, 2023

I had a look at this error, which I also faced when training a ResNet-50 model. I got a similar error to @brando90's, except that the dimensions of my tensors were different. Here is how I managed to fix it.

First of all, the exception is raised from here, where the tensor exp_avg_sq_row is divided by its mean over the last dimension. In my case, exp_avg_sq_row has size [64, 3, 7]. The mean over the last dimension, exp_avg_sq_row.mean(dim=-1), has size [64, 3], and since the two shapes cannot be broadcast together, the division raises the RuntimeError.

The solution is to unsqueeze the mean tensor: instead of (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1)), compute (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1).unsqueeze(-1)).
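As a standalone illustration of the shapes involved (a sketch using the [64, 3, 7] example above, not the library code itself):

import torch

exp_avg_sq_row = torch.rand(64, 3, 7)  # factored row statistics for a [64, 3, 7, 7] weight

# mean(dim=-1) has shape [64, 3]; dividing [64, 3, 7] by it fails because the
# trailing dimensions (7 vs. 3) cannot be broadcast together.
# exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1)  # RuntimeError

# Unsqueezing restores a trailing singleton dimension, [64, 3, 1], which broadcasts.
fixed = exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1).unsqueeze(-1)
print(fixed.shape)  # torch.Size([64, 3, 7])

An equivalent fix is exp_avg_sq_row.mean(dim=-1, keepdim=True), which keeps the reduced dimension as size 1.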

@Xynonners

This still happens; could someone open a pull request?
