Issues in evaluation code. #15

Open
ashmalvayani opened this issue Feb 7, 2024 · 0 comments

The script "run_eval.py" does not appear to be consistent with the benchmark dataset you've provided. I've encountered a few issues and would like to know how to resolve them:

  1. args.geobenchmark is set to "npee", but there is no such benchmark file. I read in another issue that you suggested replacing it with "geobenchmark_npee.json". However, I am not sure how the code will run when we pass the apstudy.json benchmark, since the code is written only for npee.

  2. In the image below, for the code line "for the_answer_is in ['wa', 'woa']", can you please explain what 'wa' and 'woa' mean? They are not mentioned anywhere in the code base or in the npee dataset.

  3. In the same image, for the code line "source = source_target['source'][question_type][the_answer_is]": if you load the npee.json benchmark file as JSON, source_target['source'] raises a KeyError, because only six keys are available: ['noun', 'choice', 'completion', 'tf', 'qa', 'discussion']. So this key seems to be wrong.

  4. Moreover, even if "source_target[question_type][the_answer_is]" is the correct format, "the_answer_is" still raises a KeyError, because only ['question', 'answer'] exist in the "choice" element of the npee file. What is the right format?
    [image: screenshot of the code lines quoted in items 2 and 3]

  5. How do you evaluate and test the apstudy.json benchmark, given that the code is not written for it?
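
To make points 3 and 4 concrete, here is a minimal sketch of how the layout described above could be inspected and iterated. It is not the repository's evaluation code. The helper names (load_benchmark, describe, iter_pairs) are hypothetical, and it assumes that 'question' and 'answer' under each question type are parallel lists, which the issue does not confirm.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple


def load_benchmark(path: str) -> Dict[str, Any]:
    """Load a benchmark JSON file (e.g. geobenchmark_npee.json) as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe(benchmark: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each top-level question type to the sorted keys found under it."""
    return {
        qtype: sorted(section) if isinstance(section, dict) else []
        for qtype, section in benchmark.items()
    }


def iter_pairs(benchmark: Dict[str, Any], question_type: str) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) pairs for one question type.

    Assumption: 'question' and 'answer' are lists of equal length.
    """
    section = benchmark[question_type]
    questions, answers = section["question"], section["answer"]
    if len(questions) != len(answers):
        raise ValueError(f"{question_type}: question/answer length mismatch")
    yield from zip(questions, answers)


# Tiny in-memory stand-in for the real file, mirroring the keys listed above.
sample = {
    "choice": {"question": ["Q1", "Q2"], "answer": ["A", "B"]},
    "tf": {"question": ["Q3"], "answer": ["True"]},
}

print(describe(sample))
print(list(iter_pairs(sample, "choice")))
```

With this layout there is no 'source' level and no 'wa'/'woa' level to index into, which is why both lookups in run_eval.py fail on the npee file.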
