
Unable to run example humaneval code #27

Open
yaoyanglee opened this issue Jun 12, 2023 · 4 comments
@yaoyanglee
```python
!pip install sentencepiece
from codetf.models import load_model_pipeline
from codetf.data_utility.human_eval_dataset import HumanEvalDataset
from codetf.performance.model_evaluator import ModelEvaluator
import os

os.environ["HF_ALLOW_CODE_EVAL"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "true"

model_class = load_model_pipeline(model_name="causallm", task="pretrained",
                                  model_type="codegen-350M-mono", is_eval=True,
                                  load_in_8bit=True, weight_sharding=False)

dataset = HumanEvalDataset(tokenizer=model_class.get_tokenizer())
prompt_token_ids, prompt_attention_masks, references = dataset.load()

problems = TensorDataset(prompt_token_ids, prompt_attention_masks)

evaluator = ModelEvaluator(model_class)
avg_pass_at_k = evaluator.evaluate_pass_k(problems=problems, unit_tests=references)
print("Pass@k: ", avg_pass_at_k)
```

Above is the code I used. When I ran it in Google Colab, I received the following error:
```
in <cell line: 15>:15

/usr/local/lib/python3.10/dist-packages/codetf/data_utility/human_eval_dataset.py:29 in load

   26             unit_test = re.sub(r'METADATA = {[^}]*}', '', unit_test, flags=re.MULTILINE)
   27             references.append(unit_test)
   28
❱  29         prompt_token_ids, prompt_attention_masks = self.process_data(prompts, use_max_le
   30
   31         return prompt_token_ids, prompt_attention_masks, references
   32

TypeError: BaseDataset.process_data() got an unexpected keyword argument 'use_max_length'
```

After looking through the source code, I can't find this keyword argument anywhere; the closest one is `max_length`. Would anyone mind shedding some light on the issue?

@Luxios22

Same issue here.

@Luxios22

After removing that keyword, I got a second error:

```
NameError: name 'TensorDataset' is not defined
```

I think an import is missing. After fixing both problems, the example began to work.

I also looked into the package (1.0.1.1) installed on my local server and found that its code is out of sync with the main branch of the repo. The latest main branch seems to have fixed this issue already, so it can be resolved by reinstalling the package from the repo rather than from pip.
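For anyone who wants to try that, a sketch of the reinstall (the repository URL is an assumption based on the project name; adjust it if yours differs):

```shell
# Remove the out-of-date PyPI release, then install from the main branch.
# Repo URL assumed to be https://github.com/salesforce/CodeTF — verify before running.
pip uninstall -y codetf
pip install "git+https://github.com/salesforce/CodeTF.git"
```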

@yaoyanglee
Author

For the `TensorDataset` NameError, I found that adding this import solves the issue:

```python
from torch.utils.data import TensorDataset
```
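For context, here is a minimal sketch of what that import provides. The tensor shapes below are illustrative stand-ins, not CodeTF's actual tokenizer output:

```python
import torch
from torch.utils.data import TensorDataset

# Illustrative stand-ins for dataset.load() output; shapes are made up.
prompt_token_ids = torch.zeros((2, 8), dtype=torch.long)
prompt_attention_masks = torch.ones((2, 8), dtype=torch.long)

# TensorDataset zips the tensors row-wise, so each item is a
# (token_ids, attention_mask) pair for one prompt.
problems = TensorDataset(prompt_token_ids, prompt_attention_masks)
print(len(problems))  # 2
```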

@yaoyanglee
Author

I would recommend upgrading numpy as well.
