
Unable to run example humaneval code #27

Open
yaoyanglee opened this issue Jun 12, 2023 · 4 comments
@yaoyanglee
```python
!pip install sentencepiece
from codetf.models import load_model_pipeline
from codetf.data_utility.human_eval_dataset import HumanEvalDataset
from codetf.performance.model_evaluator import ModelEvaluator
import os

os.environ["HF_ALLOW_CODE_EVAL"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "true"

model_class = load_model_pipeline(model_name="causallm", task="pretrained",
                                  model_type="codegen-350M-mono", is_eval=True,
                                  load_in_8bit=True, weight_sharding=False)

dataset = HumanEvalDataset(tokenizer=model_class.get_tokenizer())
prompt_token_ids, prompt_attention_masks, references = dataset.load()

problems = TensorDataset(prompt_token_ids, prompt_attention_masks)

evaluator = ModelEvaluator(model_class)
avg_pass_at_k = evaluator.evaluate_pass_k(problems=problems, unit_tests=references)
print("Pass@k: ", avg_pass_at_k)
```

Above is the code I used. When I ran it in Google Colab, I received the following error:
```
in <cell line: 15>:15

/usr/local/lib/python3.10/dist-packages/codetf/data_utility/human_eval_dataset.py:29 in load

   26             unit_test = re.sub(r'METADATA = {[^}]*}', '', unit_test, flags=re.MULTILINE)
   27             references.append(unit_test)
   28
❱  29         prompt_token_ids, prompt_attention_masks = self.process_data(prompts, use_max_le
   30
   31         return prompt_token_ids, prompt_attention_masks, references
   32

TypeError: BaseDataset.process_data() got an unexpected keyword argument 'use_max_length'
```

After looking through the source code, I can't find this keyword argument anywhere; the closest one is `max_length`. Would anyone mind shedding some light on the issue?

@Luxios22

Same issue here.

@Luxios22

After removing that keyword, I got a second error:

```
NameError: name 'TensorDataset' is not defined
```

I think an import is missing. After fixing both problems, the example began to work.

I also looked into the package (1.0.1.1) installed on my local server and found that its code is out of sync with the main branch of the repo. The latest main branch seems to have fixed this issue already, so it can be resolved by reinstalling the package from the repo rather than from pip.
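For anyone who wants to try that, a sketch of the reinstall (the repository URL is an assumption based on the project name; adjust it if yours differs):

```shell
# Remove the out-of-date PyPI release, then install from the main branch.
# Repo URL assumed to be https://github.com/salesforce/CodeTF — verify before running.
pip uninstall -y codetf
pip install "git+https://github.com/salesforce/CodeTF.git"
```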

@yaoyanglee
Author

For the `TensorDataset` NameError, I found that adding this import solves the issue:

```python
from torch.utils.data import TensorDataset
```
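For context, here is a minimal sketch of what that import provides. The tensor shapes below are illustrative stand-ins, not CodeTF's actual tokenizer output:

```python
import torch
from torch.utils.data import TensorDataset

# Illustrative stand-ins for dataset.load() output; shapes are made up.
prompt_token_ids = torch.zeros((2, 8), dtype=torch.long)
prompt_attention_masks = torch.ones((2, 8), dtype=torch.long)

# TensorDataset zips the tensors row-wise, so each item is a
# (token_ids, attention_mask) pair for one prompt.
problems = TensorDataset(prompt_token_ids, prompt_attention_masks)
print(len(problems))  # 2
```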

@yaoyanglee
Author

I would recommend upgrading numpy as well.
