prepare_dataset.py issue #1582

Fred-cell · 2024-05-12T04:27:55Z

When preparing a dataset for gptManagerBenchmark (v0.9.0), I encountered the following error:
]#python prepare_dataset.py --request-rate -1 --time-delay-dist constant --time-delay-dist constant --tokenizer /code/tensorrt-llm/chatglm3-6b/ token-norm-dist --num-requests 16 --input-mean 1024 --input-stdev 0 --output-mean 512 --output-stdev 0
The repository for /code/tensorrt-llm/chatglm3-6b/ contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//code/tensorrt-llm/chatglm3-6b/.
You can avoid this prompt in future by passing the argument trust_remote_code=True.

Do you wish to run the custom code? [y/N] y
Traceback (most recent call last):
  File "/code/tensorrt-llm/TensorRT-LLM/benchmarks/cpp/prepare_dataset.py", line 109, in <module>
    cli()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/code/tensorrt-llm/TensorRT-LLM/benchmarks/cpp/prepare_dataset.py", line 94, in cli
    ctx.obj = RootArgs(tokenizer=kwargs['tokenizer'],
  File "/usr/local/lib/python3.10/dist-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "/code/tensorrt-llm/TensorRT-LLM/benchmarks/cpp/prepare_dataset.py", line 44, in get_tokenizer
    tokenizer.pad_token = tokenizer.eos_token
AttributeError: can't set attribute 'pad_token'


byshiue · 2024-05-15T01:49:41Z

You could change

        tokenizer.pad_token = tokenizer.eos_token

to

        if tokenizer.pad_token is None:
            tokenizer.pad_token = tokenizer.eos_token

We will fix it in the next update.
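
For anyone wondering why the guard helps, here is a minimal sketch. It assumes the tokenizer exposes pad_token as a read-only property (a property with no setter), which is consistent with the AttributeError in the traceback but is not confirmed in this thread. The classes ReadOnlyPadTokenizer and PlainTokenizer and the helper ensure_pad_token are hypothetical stand-ins for illustration, not part of TensorRT-LLM or transformers.

```python
class ReadOnlyPadTokenizer:
    """Hypothetical stand-in: pad_token is a property without a setter."""

    eos_token = "</s>"

    @property
    def pad_token(self):
        # Already defined, but assigning to it raises AttributeError.
        return "<pad>"


class PlainTokenizer:
    """Hypothetical stand-in: no pad token configured, attribute is writable."""

    def __init__(self):
        self.eos_token = "</s>"
        self.pad_token = None


def ensure_pad_token(tokenizer):
    # Assign only when no pad token exists, so a tokenizer that already
    # provides one (possibly read-only) is never written to.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


if __name__ == "__main__":
    # The guarded helper leaves the read-only tokenizer untouched.
    print(ensure_pad_token(ReadOnlyPadTokenizer()).pad_token)
    # The plain tokenizer falls back to its EOS token.
    print(ensure_pad_token(PlainTokenizer()).pad_token)
```

The unguarded assignment fails on the first kind of tokenizer, while the guarded version skips the write there and still falls back to the EOS token when no pad token is set.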

byshiue self-assigned this May 15, 2024

byshiue added the bug and triaged labels May 15, 2024
