AttributeError: 'BPE' object has no attribute 'glossaries_regex' #120

Open
zwshan opened this issue Dec 25, 2023 · 1 comment

zwshan commented Dec 25, 2023

I am running the GNMT PyTorch example from https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/GNMT.
When I run

python3 translate.py   --model /workspace/autoFL/nvidia_gnmt_torch/nvidia_gnmtpyt_fp32_20190806.pth   --input /workspace/autoFL/GNMT/scripts/data/wmt16_de_en/newstest2014.en   --reference /workspace/autoFL/GNMT/scripts/data/wmt16_de_en/newstest2014.de   --output /tmp/output   --math fp32    --batch-size 128   --beam-size 1 2 5   --tables

I get the following error:

0: thread affinity: {0}
0: Run arguments: Namespace(affinity='single_unique', batch_first=True, batch_size=[128], beam_size=[1, 2, 5], bleu=True, cov_penalty_factor=0.1, cuda=True, cudnn=True, dllog_file='eval_log.json', env=False, input='/workspace/autoFL/GNMT/scripts/data/wmt16_de_en/newstest2014.en', input_text=None, len_norm_const=5.0, len_norm_factor=0.6, local_rank=0, math=['fp32'], max_seq_len=80, model='/workspace/autoFL/nvidia_gnmt_torch/nvidia_gnmtpyt_fp32_20190806.pth', output='/tmp/output', percentiles=(90, 95, 99), print_freq=1, rank=0, reference='/workspace/autoFL/GNMT/scripts/data/wmt16_de_en/newstest2014.de', repeat={128: 1}, save_dir='gnmt', sort=False, synthetic=False, synthetic_batches=64, synthetic_len=50, synthetic_vocab=32320, tables=True, target_bleu=None, target_perf=None, warmup=0)
0: Restoring state of the tokenizer
0: math: fp32, batch size: 128, beam size: 1
0: Running evaluation on test set
Traceback (most recent call last):
  File "translate.py", line 371, in <module>
    passed = main()
  File "translate.py", line 315, in main
    reference_path=args.reference,
  File "/workspace/autoFL/GNMT/seq2seq/inference/translator.py", line 123, in run
    warmup, summary)
  File "/workspace/autoFL/GNMT/seq2seq/inference/translator.py", line 184, in evaluate
    for i, (src, indices) in enumerate(loader):
  File "/root/anaconda3/envs/bonito/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/root/anaconda3/envs/bonito/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/anaconda3/envs/bonito/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/bonito/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/workspace/autoFL/GNMT/seq2seq/data/dataset.py", line 158, in __getitem__
    tokenized = self.tokenizer.tokenize(raw)
  File "/workspace/autoFL/GNMT/seq2seq/data/tokenizer.py", line 136, in tokenize
    bpe = self.bpe.process_line(tokenized)
  File "/root/anaconda3/envs/bonito/lib/python3.7/site-packages/subword_nmt/apply_bpe.py", line 122, in process_line
    out += self.segment(line, dropout)
  File "/root/anaconda3/envs/bonito/lib/python3.7/site-packages/subword_nmt/apply_bpe.py", line 132, in segment
    segments = self.segment_tokens(sentence.strip('\r\n ').split(' '), dropout)
  File "/root/anaconda3/envs/bonito/lib/python3.7/site-packages/subword_nmt/apply_bpe.py", line 142, in segment_tokens
    new_word = [out for segment in self._isolate_glossaries(word)
  File "/root/anaconda3/envs/bonito/lib/python3.7/site-packages/subword_nmt/apply_bpe.py", line 150, in <listcomp>
    self.glossaries_regex,
AttributeError: 'BPE' object has no attribute 'glossaries_regex'

Could you please help me?

@rsennrich
Owner

This is likely a version conflict.

GNMT's requirements.txt ( https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Translation/GNMT/requirements.txt ) lists subword_nmt commit 48ba99e, which does not yet have glossaries_regex.

I think what is happening is that GNMT saves the model including tokenizer (using commit 48ba99e), and you're then trying to run inference with a newer version of subword_nmt which expects different attributes. Installing commit 48ba99e should solve this.
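
To illustrate the mechanism described above, here is a minimal sketch. It assumes the tokenizer is stored with pickle (which torch.save uses by default). Default unpickling allocates the object without calling __init__ and then copies the saved attributes back, so an attribute that only a newer __init__ creates is never set. OldBPE, NewBPE and restore_like_pickle are hypothetical stand-ins written for this example; they are not the actual subword_nmt code.

```python
import re


class OldBPE:
    """Hypothetical stand-in for the older class: no glossary regex attribute."""

    def __init__(self, glossaries=None):
        self.glossaries = glossaries or []


class NewBPE:
    """Hypothetical stand-in for a newer class whose __init__ adds an attribute."""

    def __init__(self, glossaries=None):
        self.glossaries = glossaries or []
        self.glossaries_regex = (
            re.compile("|".join(map(re.escape, self.glossaries)))
            if self.glossaries
            else None
        )

    def segment(self, word):
        # Relies on an attribute that only the newer __init__ creates.
        if self.glossaries_regex is not None and self.glossaries_regex.fullmatch(word):
            return [word]
        return list(word)


def restore_like_pickle(cls, saved_state):
    """Mimic default unpickling: allocate without __init__, then copy the state."""
    obj = cls.__new__(cls)
    obj.__dict__.update(saved_state)
    return obj


# State saved while the older class was installed.
saved_state = dict(vars(OldBPE()))

# Restored while the newer class is installed: __init__ never runs.
restored = restore_like_pickle(NewBPE, saved_state)

try:
    restored.segment("hello")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

A freshly constructed NewBPE() works, because its __init__ runs; only the restored object fails. This is why matching the installed library to the version that wrote the checkpoint avoids the error. Assuming pip and git are available, pinning would typically look like `pip install git+https://github.com/rsennrich/subword-nmt.git@48ba99e`; if the abbreviated hash is not accepted, use the full hash given in GNMT's requirements.txt.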
