Issue with log_prob values not exported to Cuda #110

Open
bigmb opened this issue May 5, 2022 · 2 comments

bigmb commented May 5, 2022

Issue Description

I'm not able to get all of the data onto the CUDA device. The failure happens at loss = -dist_y.log_prob(data).mean(); it looks like the data can't be transferred to the GPU. Do we need to register the data as a buffer and work around it?

Error: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:1! (when checking argument for argument mat1 in method wrapper_addmm)

Steps to Reproduce

dataset = torch.tensor(data_train, dtype=torch.float)
trainloader = torch.utils.data.DataLoader(dataset, batch_size=1024)

for step in range(t_steps):
    step_loss = 0
    for i, data in enumerate(trainloader):
        data = data.to(device)  # move the batch to the GPU
        if i == 0:
            print(data.shape)
        try:
            optimizer.zero_grad()
            loss = -dist_y.log_prob(data).mean()  # fails here with the device-mismatch error
            loss.backward()
            optimizer.step()
        except ValueError:
            # Note: the device-mismatch error is a RuntimeError, so this
            # handler does not catch it.
            print('Error')
            print('Skipping that batch')
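
For completeness, the loop above assumes dist_y, optimizer, device, data_train, and t_steps are defined beforehand. A minimal preamble consistent with the flowtorch API used later in this thread might look like the following sketch (z_dim, the optimizer, and the learning rate are assumptions, not taken from the report):

import torch
import flowtorch.bijectors as bij
import flowtorch.distributions as dist

device = torch.device('cuda:1')  # the device named in the error message

# Assumed standard-normal base distribution; z_dim is a placeholder.
z_dim = 2
dist_x = torch.distributions.Independent(
    torch.distributions.Normal(torch.zeros(z_dim), torch.ones(z_dim)), 1
)
bijector = bij.SplineAutoregressive()
dist_y = dist.Flow(dist_x, bijector)  # never moved to the GPU, reproducing the mismatch
optimizer = torch.optim.Adam(dist_y.parameters(), lr=1e-3)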

Expected Behavior

The matrix computation should run on the CUDA device rather than raising a conflict about tensors living in two different places.

System Info

  • PyTorch Version (run print(torch.__version__))
  • Python version

Additional Context

@stefanwebb
Contributor

Hi @bigmb, thanks for submitting this issue! I don't believe we've tested on GPUs yet - I can look into this if it's failing.

Could you please submit a full test file? Also, have you run dist_y.to(device) after creating the flow? It is an nn.Module and will need to be transferred to the GPU.
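
A minimal sketch of that suggestion, using the assumed setup from the reproduction steps above (since dist.Flow is an nn.Module, .to(device) moves its registered parameters onto the GPU):

# Hedged fix: move the flow's parameters to the GPU, then rebuild the
# optimizer over the moved parameters.
dist_y = dist.Flow(dist_x, bijector).to(device)
optimizer = torch.optim.Adam(dist_y.parameters(), lr=1e-3)

As the comment below points out, the base distribution's tensors may need the same treatment.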

@waltergerych

waltergerych commented Nov 30, 2022

In case you or anyone else is still having an issue with this: remember to put (the parameters of) dist_x on CUDA as well! I was having the same issue, and this seemed to solve it:

# Create the base distribution's tensors on the GPU too, not just the flow.
dist_x = torch.distributions.Independent(
    torch.distributions.Normal(torch.zeros(z_dim).to(device),
                               torch.ones(z_dim).to(device)),
    1,
)

bijector = bij.SplineAutoregressive()

flow = dist.Flow(dist_x, bijector).to(device)
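
With everything on the same device, the failing line from the original report should then run, e.g. (assuming the data batch and device from earlier in the thread):

loss = -flow.log_prob(data.to(device)).mean()  # all tensors now live on CUDA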
