[🐛BUG] Problem when using the mBART model with WMT19 zh-en #346

Open
01vanilla opened this issue Apr 22, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@01vanilla

Describe the bug
I ran into the following problem when using the mBART model with the WMT19 zh-en dataset.

To reproduce
python run_textbox.py --model=mBART --model_path=facebook/mbart-large-cc25 --dataset=wmt19-zh-en --src_lang=zh_CN --tgt_lang=en_XX

Logs
23 Apr 00:43 INFO Pretrain type: pretrain disabled
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: 'str' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
Token indices sequence length is longer than the specified maximum sequence length for this model (1776 > 1024). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
File "run_textbox.py", line 15, in <module>
run_textbox(model=args.model, dataset=args.dataset, config_file_list=args.config_files, config_dict={})
File "/hy-tmp/TextBox/textbox/quick_start/quick_start.py", line 20, in run_textbox
experiment = Experiment(model, dataset, config_file_list, config_dict)
File "/hy-tmp/TextBox/textbox/quick_start/experiment.py", line 56, in __init__
self._init_data(self.get_config(), self.accelerator)
File "/hy-tmp/TextBox/textbox/quick_start/experiment.py", line 82, in _init_data
train_data, valid_data, test_data = data_preparation(config, tokenizer)
File "/hy-tmp/TextBox/textbox/data/utils.py", line 24, in data_preparation
train_dataset.tokenize(tokenizer)
File "/hy-tmp/TextBox/textbox/data/abstract_dataset.py", line 120, in tokenize
ids = tokenizer(
File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2538, in __call__
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2624, in _call_one
return self.batch_encode_plus(
File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2815, in batch_encode_plus
return self._batch_encode_plus(
File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 428, in _batch_encode_plus
encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

I am using transformers 4.28.1 and torch 2.0.0+cu117.
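A minimal diagnostic sketch, under an assumption that this thread does not confirm: the `TypeError` from `encode_batch` is typically raised when some item in a batch passed to a Hugging Face fast tokenizer is not a plain `str` (for example `None`, an `int`, or a `list`). The `'int' object is not callable` and `list indices must be integers` warnings above hint, but do not prove, that some text lines may be turned into Python objects during data loading. The helper `find_non_string_inputs` is hypothetical and is not part of TextBox or transformers; it only reports which entries would not reach the tokenizer as strings.

```python
from typing import Any, List, Sequence, Tuple


def find_non_string_inputs(texts: Sequence[Any]) -> List[Tuple[int, str]]:
    """Hypothetical helper: return (index, type name) for each entry that is not a str.

    Fast tokenizers expect every batch item to be a str (or a pair of str for
    sentence-pair input); any other type here is a candidate cause of the
    "TextEncodeInput must be Union[...]" TypeError.
    """
    return [
        (index, type(text).__name__)
        for index, text in enumerate(texts)
        if not isinstance(text, str)
    ]


if __name__ == "__main__":
    # Illustrative data only, not taken from WMT19.
    sources = ["你好,世界", 2019, None, "[1, 2]"]
    print(find_non_string_inputs(sources))  # [(1, 'int'), (2, 'NoneType')]
```

Running such a check on the source and target lists just before `train_dataset.tokenize(tokenizer)` would show whether any non-string entries are present; an empty result would point away from this hypothesis.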

@01vanilla 01vanilla added the bug Something isn't working label Apr 22, 2023
@StevenTang1998
Member

StevenTang1998 commented Apr 24, 2023

As a temporary workaround, you can comment out lines 27 to 34 of https://github.com/RUCAIBox/TextBox/blob/2.0.0/textbox/data/misc.py. We will fix this as soon as possible.

@StevenTang1998
Member

Feel free to ask if you have any further questions.
