Issue while Training on CPU #19

Open
Ralfons-06 opened this issue Nov 16, 2021 · 0 comments
Ralfons-06 commented Nov 16, 2021

Hi, I just tried to train the model on a CPU but ran into some problems.
During training I always get an output message saying that the loss at iteration x is 0, which seems odd:

NLL Loss @ epoch 0001 iteration 00000001 = 0.0000
NLL Loss @ epoch 0063 iteration 00000250 = 0.0000

After going through the code of gran_runner, I realized that the part of the code where the loss is calculated is never executed when no GPU is available, since batch_fwd stays empty in that case:

GRAN/runner/gran_runner.py

Lines 230 to 259 in 43cb443

avg_train_loss = .0
for ff in range(self.dataset_conf.num_fwd_pass):
  batch_fwd = []

  if self.use_gpu:
    for dd, gpu_id in enumerate(self.gpus):
      data = {}
      data['adj'] = batch_data[dd][ff]['adj'].pin_memory().to(gpu_id, non_blocking=True)
      data['edges'] = batch_data[dd][ff]['edges'].pin_memory().to(gpu_id, non_blocking=True)
      data['node_idx_gnn'] = batch_data[dd][ff]['node_idx_gnn'].pin_memory().to(gpu_id, non_blocking=True)
      data['node_idx_feat'] = batch_data[dd][ff]['node_idx_feat'].pin_memory().to(gpu_id, non_blocking=True)
      data['label'] = batch_data[dd][ff]['label'].pin_memory().to(gpu_id, non_blocking=True)
      data['att_idx'] = batch_data[dd][ff]['att_idx'].pin_memory().to(gpu_id, non_blocking=True)
      data['subgraph_idx'] = batch_data[dd][ff]['subgraph_idx'].pin_memory().to(gpu_id, non_blocking=True)
      data['subgraph_idx_base'] = batch_data[dd][ff]['subgraph_idx_base'].pin_memory().to(gpu_id, non_blocking=True)
      batch_fwd.append((data,))

  if batch_fwd:
    train_loss = model(*batch_fwd).mean()
    avg_train_loss += train_loss
    # assign gradient
    train_loss.backward()
    # clip_grad_norm_(model.parameters(), 5.0e-0)
    optimizer.step()

avg_train_loss /= float(self.dataset_conf.num_fwd_pass)

# reduce
train_loss = float(avg_train_loss.data.cpu().numpy())

Is this a bug, or did I miss something?
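
For reference, this is roughly the kind of CPU fallback I had in mind. The else branch below is just a sketch on my side, not code from the repository; it assumes batch_data keeps the same batch_data[dd][ff] dict layout when no GPU is used, and I collapsed the repeated assignments into a loop over the dict keys for brevity:

  keys = ['adj', 'edges', 'node_idx_gnn', 'node_idx_feat',
          'label', 'att_idx', 'subgraph_idx', 'subgraph_idx_base']
  batch_fwd = []

  if self.use_gpu:
    for dd, gpu_id in enumerate(self.gpus):
      # unchanged GPU path: pin memory and move each tensor to its GPU
      data = {kk: batch_data[dd][ff][kk].pin_memory().to(gpu_id, non_blocking=True)
              for kk in keys}
      batch_fwd.append((data,))
  else:
    # assumed CPU fallback: build the same (data,) tuples without the
    # pin_memory()/.to(gpu_id) transfers, so batch_fwd is non-empty and
    # the loss / backward / optimizer.step() block is still reached
    for dd in range(len(batch_data)):
      data = {kk: batch_data[dd][ff][kk] for kk in keys}  # tensors stay on CPU
      batch_fwd.append((data,))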
