
RuntimeError (Sizes of tensors must match) when training on 'WikiSQL' #17

Open

zjyFrank opened this issue Mar 3, 2021 · 3 comments
Labels: bug (Something isn't working)
zjyFrank commented Mar 3, 2021

Hi,

I followed the steps to train on Spider and WikiSQL on a Tesla M40 (24 GB memory) with 'train_batch_size=4' (no other changes were made to the model configuration):

# wikisql-bridge-bert-large.sh
num_steps=30000
curriculum_interval=0
num_peek_steps=400
num_accumulation_steps=3
save_best_model_only="True"
train_batch_size=4  # from 16 to 4

It works well on the Spider dataset, but on WikiSQL I ran into the following error:

--------------------------

wandb: Tracking run with wandb version 0.8.30
wandb: Wandb version 0.10.21 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20210303_163242-23o2pmxp
wandb: Syncing run wikisql.bridge.lstm.meta.ts.ppl-0.85.2.dn.no_from.feat.bert-large-uncased.xavier-1024-512-512-4-3-0.0003-inv-sqr-0.0003-3000-5e-05-inv-sqr-0.0-3000-0.3-0.3-0.0-0.0-1-8-0.1-0.0-res-0.2-0.0-ff-0.4-0.0.210304-003242.scz2
wandb: ⭐️ View project at https://app.wandb.ai/zjy/smore-wikisql-group--final
wandb: 🚀 View run at https://app.wandb.ai/zjy/smore-wikisql-group--final/runs/23o2pmxp
wandb: Run `wandb off` to turn off syncing.

  2%|█▉                                                              | 19/1200 [00:08<08:29,  2.32it/s]
Traceback (most recent call last):
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 407, in <module>
    run_experiment(args)
  File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
    train(sp)
  File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 63, in train
    sp.run_train(train_data, dev_data)
  File "/data/users/zjy/TabularSemanticParsing/src/common/learn_framework.py", line 208, in run_train
    loss = self.loss(formatted_batch)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 94, in loss
    outputs = self.forward(formatted_batch)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 129, in forward
    decoder_ptr_value_ids=decoder_ptr_value_ids)
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 59, in forward
    transformer_output_value_masks)
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 263, in forward
    schema_hiddens = self.schema_encoder(schema_hiddens, feature_ids)
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 169, in forward
    field_type_embeddings], dim=2))
RuntimeError: Sizes of tensors must match except in dimension 1. Got 9 and 11 (The offending index is 0)

wandb: Waiting for W&B process to finish, PID 957961
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: Run summary:
wandb:                   _runtime 83.59444427490234
wandb:      learning_rate/wikisql 0.0003
wandb:                      _step 1
wandb:                 _timestamp 1614789212.7551596
wandb:   fine_tuning_rate/wikisql 1.6666666666666667e-08
wandb: Syncing files in wandb/run-20210303_163242-23o2pmxp:
wandb:   code/src/experiments.py
wandb: plus 8 W&B file(s) and 1 media file(s)
wandb:                                                                                
wandb: Synced wikisql.bridge.lstm.meta.ts.ppl-0.85.2.dn.no_from.feat.bert-large-uncased.xavier-1024-512-512-4-3-0.0003-inv-sqr-0.0003-3000-5e-05-inv-sqr-0.0-3000-0.3-0.3-0.0-0.0-1-8-0.1-0.0-res-0.2-0.0-ff-0.4-0.0.210304-003242.scz2: https://app.wandb.ai/zjy/smore-wikisql-group--final/runs/23o2pmxp

I also tried a train_batch_size of 2, but that didn't help; the same error occurred when I switched to a GeForce GTX Titan Xp (12 GB) or a Tesla K80 (11 GB).
Any suggestions on the cause, or on what I can try to get rid of it? Thank you!

thelyad commented Mar 3, 2021

I get the same error too. I haven't made any changes to the configuration.

todpole3 (Collaborator) commented Mar 3, 2021

Sorry, I introduced this bug with the checkpoints release.

A temporary fix for Spider is to comment out line
https://github.com/salesforce/TabularSemanticParsing/blob/main/src/utils/trans/bert_utils.py#L31
and uncomment line
https://github.com/salesforce/TabularSemanticParsing/blob/main/src/utils/trans/bert_utils.py#L30.

The issue is that in the released pre-trained checkpoints we used "*" in the hybrid sequence as the wildcard representation (our implementation treats the wildcard as a special column in the database), but the WikiSQL data is noisy and some text in the dataset contains "*", which causes the model to mis-estimate the number of columns in the database.
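To make the failure mode concrete, here is a minimal sketch (with made-up helper names and toy data, not the repo's actual code) of how a stray "*" in the question text can inflate a wildcard-based column count:

```python
WILDCARD = "*"

def count_wildcard_markers(hybrid_seq):
    """Count "*" tokens in the hybrid question/schema sequence.
    If the model segments schema positions by locating wildcard markers,
    this count determines how many column slots it expects."""
    return sum(1 for tok in hybrid_seq if tok == WILDCARD)

# Clean question: the only "*" is the wildcard pseudo-column itself.
question_clean = ["what", "was", "the", "total", "attendance"]
# Noisy WikiSQL question: the text itself contains a literal "*".
question_noisy = ["what", "does", "*", "denote", "here"]

schema = [WILDCARD, "date", "opponent", "attendance"]

assert count_wildcard_markers(question_clean + schema) == 1
# The stray "*" in the question inflates the count, so the schema encoder
# produces a different number of positions than the field-type embeddings
# expect -- hence torch.cat complaining that sizes (e.g. 9 and 11) don't match.
assert count_wildcard_markers(question_noisy + schema) == 2
```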

todpole3 (Collaborator) commented Mar 3, 2021

I will push a more stable fix later.

todpole3 added the bug (Something isn't working) label Mar 4, 2021
3 participants