RuntimeError during default execution #3

Open
AlexanderGri opened this issue Dec 16, 2017 · 11 comments · May be fixed by #5

Comments

@AlexanderGri

AlexanderGri commented Dec 16, 2017

Hello, thank you for your implementation!

I've just tried to run the default experiment with

python main.py --no-cuda --epochs 1

and ran into the following problem:

/opt/conda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
Prepare files
Define model
        Statistics
        Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "main.py", line 321, in <module>
    main()
  File "main.py", line 182, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "main.py", line 242, in train
    output = model(g, h, e)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 319, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/grishin/nmp_qc/models/MPNN.py", line 78, in forward
    m = self.m[0].forward(h[t], h_aux, e_aux)
  File "/data/grishin/nmp_qc/MessageFunction.py", line 43, in forward
    return self.m_function(h_v, h_w, e_vw, args)
  File "/data/grishin/nmp_qc/MessageFunction.py", line 175, in m_mpnn
    h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (25) must match the existing size (73) at non-singleton dimension 1

Am I doing something wrong? Thank you in advance.

@priba
Owner

priba commented Feb 1, 2018

Hi,

Sorry for the long delay in answering. In my opinion, the errors you reported come from the PyTorch version. I got similar errors in other code I am working on when I changed the PyTorch release, due to changes in the behaviour of "sum".

https://github.com/pytorch/pytorch/releases
"All reduce functions such as sum and mean now default to squeezing the reduced dimension."

I suggest adding keepdim=True to the sum operations as a quick and easy fix for this problem, since the new default (keepdim=False) is what drops the reduced dimension.
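
As a minimal sketch of the behaviour described in the release note quoted above: in current PyTorch releases, `sum` removes the reduced dimension by default, and `keepdim=True` retains it with size 1. The tensor shape here is made up for illustration and is not taken from this repository.

```python
import torch

# Illustrative tensor; the shape (4, 25, 73) is an assumption, not taken from the repo.
x = torch.ones(4, 25, 73)

squeezed = x.sum(1)             # default keepdim=False: dimension 1 is removed
kept = x.sum(1, keepdim=True)   # the reduced dimension is kept with size 1

print(squeezed.shape)  # torch.Size([4, 73])
print(kept.shape)      # torch.Size([4, 1, 73])
```

Code written for the old behaviour usually expects the second shape, so a later `expand` or `view` can fail when it receives the first one.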

In a few weeks, I will try to update the code for newer PyTorch versions.

@ay27 ay27 linked a pull request Feb 1, 2018 that will close this issue
@ay27

ay27 commented Feb 1, 2018

I tried to fix the problem and made some improvements, but I am not confident that they are correct. Could someone verify them?

@josejimenezluna

Hello, @priba

To make things easier, which version of PyTorch are we supposed to be running?

@adamxyang

Hello, @priba

Thanks for the implementation! I encountered the same issue here. I experimented with PyTorch versions 0.2.0, 0.3.0 and 1.0.0, and I also added keepdim=False to all sum operations in datasets.utils.py and models.MPNN.py, but none of these attempts worked.

(rdkit) Adams-MacBook-Pro-4:mpnn iron4dam$ python main.py --no-cuda
Prepare files
Define model
	Statistics
	Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "main.py", line 320, in <module>
    main()
  File "main.py", line 182, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "main.py", line 241, in train
    output = model(g, h, e)
  File "/Users/iron4dam/anaconda3/envs/rdkit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/models/MPNN.py", line 78, in forward
    m = self.m[0].forward(h[t], h_aux, e_aux)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/MessageFunction.py", line 43, in forward
    return self.m_function(h_v, h_w, e_vw, args)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/MessageFunction.py", line 174, in m_mpnn
    h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (25) must match the existing size (73) at non-singleton dimension 1. at /Users/soumith/minicondabuild3/conda-bld/pytorch_1512381214802/work/torch/lib/TH/generic/THTensor.c:309

@rmrmg

rmrmg commented Mar 3, 2019

@ay27 I've applied your patch and have another problem:

(nmpqc) rmrmg@kolos:/chematica/pka/nmpqc/nmp_qc$ LD_PRELOAD=$CONDA_PREFIX/lib/libstdc++.so python ./main.py --no-cuda
loaeed
Prepare files
Define model
	Statistics
	Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "./main.py", line 330, in <module>
    main()
  File "./main.py", line 191, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "./main.py", line 254, in train
    losses.update(train_loss.data[0], g.size(0))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
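
The error message itself points at the fix. A minimal sketch, assuming `train_loss` is a 0-dim tensor as the message implies (the value below is a stand-in, not output from this repository): replacing `train_loss.data[0]` with `train_loss.item()` gives a plain Python number.

```python
import torch

# Stand-in for the scalar loss; in main.py it presumably comes from the criterion (assumption).
train_loss = torch.tensor(0.5)

# On PyTorch >= 0.4 a scalar loss is 0-dim, so indexing it with [0] raises IndexError.
try:
    train_loss.data[0]
    indexing_failed = False
except IndexError:
    indexing_failed = True

loss_value = train_loss.item()  # plain Python float
print(indexing_failed, loss_value)
```

In the traceback above, that would mean calling `losses.update(train_loss.item(), g.size(0))` instead.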

@wmmxk

wmmxk commented Jun 9, 2019

I ran into the same error:
h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (24) must match the existing size (73) at non-singleton dimension 1

If this is due to a version update, could you tell me which version you are using? (I am using PyTorch 0.4.1.)

@priba
Owner

priba commented Jun 10, 2019

At that time I was using PyTorch 0.3.0.

@njwm

njwm commented Aug 8, 2021

I made a small change like this:
h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()
It seems to work, but I am not sure about the results.

@sthakurr

> I made a small change like this: h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1),h_v.size(1), ).contiguous() It seems to work. But i am not sure about the results.

@njwm I did the same to get past that error, and it worked (although another, similar error then came up regarding a .sum operation). Could you please verify whether it affected the results?

@njwm

njwm commented Apr 22, 2022

Perhaps it is better like this:
h_w_rows = h_w[:, None, :].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
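
To see how the two variants differ, here is a small sketch with toy sizes. It assumes `h_v` has shape (batch, nodes, features) and `h_w` is the same data flattened to (batch * nodes, features); that is what the error message suggests, but it is not confirmed in this thread. The `[:, None, :]` variant repeats each feature row once per node, while the `[..., None]` variant with swapped sizes repeats each scalar instead, so after flattening into rows of `feats` entries the rows no longer match the original node states.

```python
import torch

# Toy sizes (assumptions): 1 graph, 2 nodes, 3 features per node.
batch, nodes, feats = 1, 2, 3
h_v = torch.arange(batch * nodes * feats, dtype=torch.float32).view(batch, nodes, feats)
h_w = h_v.view(-1, feats)  # shape (batch * nodes, feats)

# Variant A: new axis in the middle, so each row of h_w is repeated `nodes` times.
rows_a = h_w[:, None, :].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()

# Variant B: new axis at the end with swapped sizes; it runs, but the layout differs.
rows_b = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()

flat_a = rows_a.view(-1, feats)
flat_b = rows_b.view(-1, feats)

print(flat_a)  # rows: [0,1,2], [0,1,2], [3,4,5], [3,4,5]
print(flat_b)  # rows: [0,0,1], [1,2,2], [3,3,4], [4,5,5]
```

This only shows that the two layouts differ even though both run without error. It does not confirm which layout the original code intended.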

@njwm

njwm commented Apr 22, 2022

> > I made a small change like this: h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1),h_v.size(1), ).contiguous() It seems to work. But i am not sure about the results.
>
> @njwm I did the same in order to get past that error and it worked (even though another similar error came regarding a .sum operation). But can you please verify if it affected the results?

I don't think it makes sense; it just gets past that error.
