8. sentiment-analysis-with-bert #6

Open
SimplyLucKey opened this issue Jan 24, 2021 · 5 comments
@SimplyLucKey

Hi there,
I was following along with your video guide on this project, but with my own dataset.
When I started training on my data, I ran into this error:

RuntimeError: stack expects each tensor to be equal size

Our code is basically identical, and the structure of the data seems identical as well. I'm not sure what's causing this issue or how to resolve it.

Full traceback:

RuntimeError                              Traceback (most recent call last)
<timed exec> in <module>

<ipython-input-26-8ba1e19dd195> in train_epoch(model, data_loader, loss_fn, optimizer, device, scheduler, n_examples)
      4     correct_predictions = 0
      5 
----> 6     for i in data_loader:
      7         input_ids = i['input_ids'].to(device)
      8         attention_mask = i['attention_mask'].to(device)

~\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    361 
    362     def __next__(self):
--> 363         data = self._next_data()
    364         self._num_yielded += 1
    365         if self._dataset_kind == _DatasetKind.Iterable and \

~\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
    401     def _next_data(self):
    402         index = self._next_index()  # may raise StopIteration
--> 403         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    404         if self._pin_memory:
    405             data = _utils.pin_memory.pin_memory(data)

~\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

~\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py in default_collate(batch)
     72         return batch
     73     elif isinstance(elem, container_abcs.Mapping):
---> 74         return {key: default_collate([d[key] for d in batch]) for key in elem}
     75     elif isinstance(elem, tuple) and hasattr(elem, '_fields'):  # namedtuple
     76         return elem_type(*(default_collate(samples) for samples in zip(*batch)))

~\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py in <dictcomp>(.0)
     72         return batch
     73     elif isinstance(elem, container_abcs.Mapping):
---> 74         return {key: default_collate([d[key] for d in batch]) for key in elem}
     75     elif isinstance(elem, tuple) and hasattr(elem, '_fields'):  # namedtuple
     76         return elem_type(*(default_collate(samples) for samples in zip(*batch)))

~\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py in default_collate(batch)
     53             storage = elem.storage()._new_shared(numel)
     54             out = elem.new(storage)
---> 55         return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: stack expects each tensor to be equal size, but got [160] at entry 0 and [161] at entry 5
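For context on where this fails: default_collate builds each batch key by calling torch.stack on the per-example tensors, so every 'input_ids' (and 'attention_mask') tensor in a batch must have the same length. The [160] vs [161] sizes mean at least one example is not being padded/truncated to the same fixed length as the rest. A minimal sanity check, assuming a dataset built like the GPReviewDataset posted below and a max length of 160 (both names are placeholders for whatever the notebook actually uses):

MAX_LEN = 160

# scan the dataset for items whose encoded length deviates from MAX_LEN
bad = []
for idx in range(len(train_dataset)):
    item = train_dataset[idx]
    if item['input_ids'].shape[0] != MAX_LEN:
        bad.append((idx, item['input_ids'].shape[0]))

print(f'{len(bad)} item(s) deviate from MAX_LEN={MAX_LEN}:', bad[:10])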
@kforcodeai

Please attach the encode_plus function too.

@SimplyLucKey
Author

Please attach the encode_plus function too.

import torch
from torch.utils.data import Dataset


class GPReviewDataset(Dataset):
    
    # sets the parameters of the map-style-dataset
    def __init__(self, reviews, targets, tokenizer, max_len):
        self.reviews = reviews
        self.targets = targets
        self.tokenizer = tokenizer
        self.max_len = max_len
        
    # returns the length of the dataset    
    def __len__(self):
        return len(self.reviews)
    
    # access the idx-th item from the input data
    def __getitem__(self, item):
        review = str(self.reviews[item])
        target = self.targets[item]
        
        # tokenizes the data
        encoding = self.tokenizer.encode_plus(text=review, 
                                              max_length=self.max_len,
                                              add_special_tokens=True, # adds [CLS] and [SEP]
                                              padding='max_length', # pad to max length
                                              truncation=True, # truncate to max length
                                              return_attention_mask=True, 
                                              return_token_type_ids=False, 
                                              return_tensors='pt') # return PyTorch tensors
        
        # return dictionary
        return {'review': review,
                'input_ids': encoding['input_ids'].flatten(), 
                'attention_mask': encoding['attention_mask'].flatten(),
                'targets': torch.tensor(target, dtype=torch.long)}
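One thing worth checking, since padding='max_length' and truncation=True should force every item to exactly max_len tokens: the installed transformers version. The padding=/truncation= keywords were introduced around transformers 3.0; older releases used pad_to_max_length=True instead and may not honour the newer spelling, which would leave some encodings unpadded or untruncated. A hedged, older-style drop-in for the encode_plus call inside __getitem__, only relevant if a version check shows a pre-3.0 release:

import transformers
print(transformers.__version__)  # padding=/truncation= arrived around v3.0

# older-style equivalent of the call above (pre-3.0 tokenizers)
encoding = self.tokenizer.encode_plus(text=review,
                                      max_length=self.max_len,
                                      add_special_tokens=True,
                                      pad_to_max_length=True,  # old spelling of padding='max_length'
                                      return_attention_mask=True,
                                      return_token_type_ids=False,
                                      return_tensors='pt')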

When I posted that question I was using all 5 ratings as the 'sentiment', as opposed to positive, negative, and neutral.

So my SentimentClassifier had an argument of 5. That was causing the error, but an argument of 6 ran fine ...?

model = SentimentClassifier(5)
model = model.to(device)
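On why 6 works where 5 fails: that is most likely a separate issue from the stack error. The loss used in train_epoch (presumably nn.CrossEntropyLoss, as in the tutorial) expects class indices in the range 0..n_class-1, so feeding the raw 1-5 star ratings as targets makes a rating of 5 out of range for SentimentClassifier(5) but valid for SentimentClassifier(6). A small sketch of the usual remapping, assuming a pandas DataFrame df with a 1-5 score column (the names are placeholders):

df['sentiment'] = df['score'] - 1      # star ratings 1..5 -> class indices 0..4

model = SentimentClassifier(5)         # 5 classes now match the remapped targets 0..4
model = model.to(device)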

@ZeusFSX

ZeusFSX commented Feb 12, 2021

Add the parameter return_dict=False to the BertModel constructor in SentimentClassifier; it's a recent change in transformers:

self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME, return_dict=False)

@SimplyLucKey
Author

SimplyLucKey commented Feb 15, 2021

Add the parameter return_dict=False to the BertModel constructor in SentimentClassifier; it's a recent change in transformers:

self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME, return_dict=False)

I have that in my SentimentClassifier class already.

import torch.nn as nn
import transformers


class SentimentClassifier(nn.Module):
    def __init__(self, n_class):
        super(SentimentClassifier, self).__init__()
        self.bert = transformers.BertModel.from_pretrained('bert-base-cased')
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(self.bert.config.hidden_size, n_class)
        
    def forward(self, input_ids, attention_mask):
        _, pooled_output = self.bert(input_ids=input_ids, attention_mask=attention_mask, return_dict=False)
        output = self.drop(pooled_output)
        return self.out(output)

By the way, where does p=0.3 come from? Is that one of the recommended parameters from the paper?

@ZeusFSX

ZeusFSX commented Feb 15, 2021

Replace the line self.bert = transformers.BertModel.from_pretrained('bert-base-cased') with
self.bert = transformers.BertModel.from_pretrained('bert-base-cased', return_dict=False)
As for p=0.3, it's the probability of randomly disabling neurons during training. Have a look at what dropout is in the official PyTorch documentation.
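
Roughly what return_dict controls, for anyone hitting the same thing: with return_dict=False the BERT forward returns a plain tuple (last_hidden_state, pooled_output), which is what the _, pooled_output = ... unpacking in the class above relies on; with the newer default return_dict=True it returns a model-output object, and the pooled [CLS] vector has to be read as an attribute instead. A sketch of the two equivalent forward bodies (either form should work inside SentimentClassifier.forward):

# (a) tuple-style output, matching the unpacking used in this thread
_, pooled_output = self.bert(input_ids=input_ids,
                             attention_mask=attention_mask,
                             return_dict=False)

# (b) newer default: a model-output object with named fields
outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
pooled_output = outputs.pooler_output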
