
Tensor size doesn't match. #58

Open
ghost opened this issue Jun 28, 2020 · 0 comments
ghost commented Jun 28, 2020

Hello!

We are working on pretraining BERT on a custom corpus. I followed the guide on data preparation (the texts should be fine now); however, running the notebook gives the following error:

2 items cleaning up...
Cleanup took 0.0017843246459960938 seconds
06/28/2020 11:53:45 - INFO - __main__ -   Exiting context: ProjectPythonPath
Traceback (most recent call last):
  File "train.py", line 482, in <module>
    eval_loss = train(index)
  File "train.py", line 132, in train
    batch = next(dataloaders[dataset_type])
  File "train.py", line 47, in <genexpr>
    return (x for x in DataLoader(dataset, batch_size=train_batch_size // 2 if eval_set else train_batch_size,
  File "/opt/miniconda/envs/amlbert/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/opt/miniconda/envs/amlbert/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/opt/miniconda/envs/amlbert/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/opt/miniconda/envs/amlbert/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 209, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 144 and 128 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1307

which I don't fully understand in this context.

We also tried running with the English Wikipedia corpus data and got the same error. We have tried both the large-cased and multilingual-cased vocabularies.
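For reference, this error usually means PyTorch's `default_collate` is asked to `torch.stack` per-example tensors of different lengths (here 144 and 128 tokens), which only works if every example in the batch has the same shape. A minimal sketch of the failure and a workaround via a custom `collate_fn` that right-pads to the longest example in the batch (the two-element dataset and `pad_id` below are illustrative, not from this repo):

```python
import torch
from torch.utils.data import DataLoader


def pad_collate(batch, pad_id=0):
    """Right-pad 1-D token-id tensors to the longest example in the batch.

    default_collate would call torch.stack directly and fail when
    lengths differ; here we allocate a padded matrix first.
    """
    max_len = max(x.size(0) for x in batch)
    padded = torch.full((len(batch), max_len), pad_id, dtype=batch[0].dtype)
    for i, x in enumerate(batch):
        padded[i, : x.size(0)] = x
    return padded


# Two examples with mismatched lengths, mirroring the 144-vs-128 error.
dataset = [torch.ones(144, dtype=torch.long), torch.ones(128, dtype=torch.long)]
loader = DataLoader(dataset, batch_size=2, collate_fn=pad_collate)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([2, 144])
```

In this repo's case the real fix is likely upstream: the preprocessing should truncate or pad every sequence to a fixed `max_seq_length` before the examples ever reach the `DataLoader`, so a length mismatch at collate time suggests the data preparation step produced sequences of different lengths.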
