Issue while Training on CPU #19

Open
Ralfons-06 opened this issue Nov 16, 2021 · 0 comments
Ralfons-06 commented Nov 16, 2021

Hi, I just tried to train the model on a CPU but ran into some problems.
During training I always get an output message saying that the loss at iteration x is 0, which seems odd:

NLL Loss @ epoch 0001 iteration 00000001 = 0.0000
NLL Loss @ epoch 0063 iteration 00000250 = 0.0000

After going through the code of gran_runner, I realized that the part of the code where the loss is calculated is never executed when no GPU is available, since batch_fwd stays empty in that case:

GRAN/runner/gran_runner.py

Lines 230 to 259 in 43cb443

avg_train_loss = .0
for ff in range(self.dataset_conf.num_fwd_pass):
  batch_fwd = []

  if self.use_gpu:
    for dd, gpu_id in enumerate(self.gpus):
      data = {}
      data['adj'] = batch_data[dd][ff]['adj'].pin_memory().to(gpu_id, non_blocking=True)
      data['edges'] = batch_data[dd][ff]['edges'].pin_memory().to(gpu_id, non_blocking=True)
      data['node_idx_gnn'] = batch_data[dd][ff]['node_idx_gnn'].pin_memory().to(gpu_id, non_blocking=True)
      data['node_idx_feat'] = batch_data[dd][ff]['node_idx_feat'].pin_memory().to(gpu_id, non_blocking=True)
      data['label'] = batch_data[dd][ff]['label'].pin_memory().to(gpu_id, non_blocking=True)
      data['att_idx'] = batch_data[dd][ff]['att_idx'].pin_memory().to(gpu_id, non_blocking=True)
      data['subgraph_idx'] = batch_data[dd][ff]['subgraph_idx'].pin_memory().to(gpu_id, non_blocking=True)
      data['subgraph_idx_base'] = batch_data[dd][ff]['subgraph_idx_base'].pin_memory().to(gpu_id, non_blocking=True)
      batch_fwd.append((data,))

  if batch_fwd:
    train_loss = model(*batch_fwd).mean()
    avg_train_loss += train_loss
    # assign gradient
    train_loss.backward()
    # clip_grad_norm_(model.parameters(), 5.0e-0)
    optimizer.step()

avg_train_loss /= float(self.dataset_conf.num_fwd_pass)

# reduce
train_loss = float(avg_train_loss.data.cpu().numpy())

Is this a bug, or did I miss something?
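
For reference, this is roughly the kind of CPU fallback I had in mind. The else branch below is just a sketch on my side, not code from the repository; it assumes batch_data keeps the same batch_data[dd][ff] dict layout when no GPU is used, and I collapsed the repeated assignments into a loop over the dict keys for brevity:

  keys = ['adj', 'edges', 'node_idx_gnn', 'node_idx_feat',
          'label', 'att_idx', 'subgraph_idx', 'subgraph_idx_base']
  batch_fwd = []

  if self.use_gpu:
    for dd, gpu_id in enumerate(self.gpus):
      # unchanged GPU path: pin memory and move each tensor to its GPU
      data = {kk: batch_data[dd][ff][kk].pin_memory().to(gpu_id, non_blocking=True)
              for kk in keys}
      batch_fwd.append((data,))
  else:
    # assumed CPU fallback: build the same (data,) tuples without the
    # pin_memory()/.to(gpu_id) transfers, so batch_fwd is non-empty and
    # the loss / backward / optimizer.step() block is still reached
    for dd in range(len(batch_data)):
      data = {kk: batch_data[dd][ff][kk] for kk in keys}  # tensors stay on CPU
      batch_fwd.append((data,))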
