
Using code metric 'code_eval_octopack' instead of original 'code_eval' #16

Open
JunHyungKang opened this issue Sep 1, 2023 · 1 comment


@JunHyungKang

1

Is this metric solely for multi-language support?
When I run the same generations through 'code_eval' in the original humaneval.py, I only get a pass@1 of about 36% (a minimal sketch of that scoring call is shown after question 2 below).

2

Are there any other considerations?
Is it fair to add an import helper?
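
For context, here is a minimal sketch of how a pass@1 number like the 36% above is typically computed with the Hugging Face `evaluate` library's `code_eval` metric; the toy problem and test are illustrative placeholders, not actual HumanEval data:

```python
import os

# code_eval executes untrusted model code, so it must be explicitly enabled
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

from evaluate import load

code_eval = load("code_eval")

# one list of candidate completions per problem, plus that problem's test code
predictions = [["def add(a, b):\n    return a + b"]]
references = ["assert add(2, 3) == 5"]

pass_at_k, results = code_eval.compute(
    references=references, predictions=predictions, k=[1]
)
print(pass_at_k)  # e.g. {'pass@1': 1.0}
```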

@Muennighoff
Collaborator

1

Yes, it is solely for multi-language support.
The reason you only get ~36% is that the original HumanEval does not use our prompting format (no Question/Answer structure like during instruction tuning), so the model tries to add it itself.
In the original HumanEval this leads to a syntax error, e.g. this generation:

```
def truncate_number(number: float) -> float:
    """ Given a positive floating point number, it can be decomposed into
    and integer part (largest integer smaller than given number) and decimals
    (leftover part always smaller than 1).

    Return the decimal part of the number.
    >>> truncate_number(3.5)
    0.5
    """
    return number - int(number)


Answer: """
Write a function that takes a positive floating point number as input and
returns the decimal part of the number.

For example, given the number 3.5, the function should return 0.5.

Note: The input number can be a negative number or zero.

Answer: import math
```

In HumanEvalSynthesize, a) the prompting format is aligned and b) the postprocessing is cleaner, so in the example above the `Answer: ...` part would be cut off from the generation (everything after the first function is finished is removed) and there would be no syntax error.
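
As an illustration of that kind of postprocessing (a rough sketch only, not the harness's actual implementation), a truncation step could keep just the first top-level function of a generation:

```python
def keep_first_function(generation: str) -> str:
    """Illustrative sketch (not the harness's exact postprocessing): keep the
    first top-level function and its indented body, dropping everything that
    follows, e.g. the stray 'Answer: ...' text in the generation quoted above."""
    kept, seen_body = [], False
    for line in generation.split("\n"):
        indented = line[:1].isspace()
        if seen_body and line.strip() and not indented:
            break  # back at column 0 after the body -> the function is finished
        kept.append(line)
        if line.strip() and indented:
            seen_body = True
    return "\n".join(kept)
```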

I think both of these are fair, since when actually using the model it is a) trivial to use the correct prompting format and b) simple to strip trailing text that is not needed.
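
For a), the helper below is a hypothetical sketch of wrapping a HumanEval task in the Question/Answer template mentioned above; the exact wording and whitespace are assumptions here and should be taken from the model card / evaluation harness:

```python
def build_instruction_prompt(instruction: str, context: str) -> str:
    """Hypothetical sketch of the Question/Answer template (exact wording and
    whitespace are assumptions; check the model card): `instruction` is the task
    description, `context` is the function header the model continues from."""
    return f"Question: {instruction}\n\nAnswer:\n{context}"
```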

2

The import helpers do not really make a difference - I think for Python they might not change the score at all. The reason they are added is that the model is never given the chance to modify the imports at the top of the file; it is prompted directly with the function start. In Python it could still add any necessary imports inside the function body, but in Go and other languages that does not work.
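
A hypothetical illustration of what such an import helper could look like for Python (the actual list of imports used by the harness may differ):

```python
# Hypothetical example only; the harness's real import list may differ.
PYTHON_IMPORT_HELPER = "\n".join([
    "import math",
    "import re",
    "import itertools",
    "from typing import List, Optional, Tuple",
])

def with_import_helper(program: str) -> str:
    """Prepend common imports to a completed program before running its tests,
    since the model never sees or edits the top of the file."""
    return PYTHON_IMPORT_HELPER + "\n\n" + program
```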
