Bug in the MMLU dataset production process:
    from unitxt.standard import StandardRecipe
    from unitxt.templates import MultipleChoiceTemplate

    # self.template = MultipleChoiceTemplate(
    #     type='multiple_choice_template',
    #     artifact_identifier='template_0',
    #     _requirements_list=[],
    #     caching=None,
    #     apply_to_streams=None,
    #     dont_apply_to_streams=None,
    #     skip_rendered_instance=True,
    #     postprocessors=['processors.first_character'],
    #     instruction='',
    #     target_prefix='',
    #     title_fields=[],
    #     input_format='Question: [question] Choices: [choices] Answer: [answer]\nQuestion: {question} Choices: {choices} Answer:',
    #     choices_field='choices',
    #     target_field='answer',
    #     choices_seperator=' ',
    #     source_choice_format='{choice_numeral}. {choice_text}',
    #     target_choice_format='{choice_numeral}',
    #     enumerator='ABCDEFGHIJKLMNOP',
    #     shuffle_choices=False,
    # )
    template = MultipleChoiceTemplate(
        input_format=self.template.input_format,
        target_field=self.template.target_field,
        choices_seperator=self.template.choices_seperator,
        enumerator="numerals",
        postprocessors=self.template.postprocessors,
    )
    recipe = StandardRecipe(
        card='cards.mmlu.anatomy',
        template=template,
        format='formats.empty',
        num_demos=1,
        demos_pool_size=10,
        max_train_instances=1,
        max_validation_instances=1,
        max_test_instances=1,
    )
    dataset = recipe().to_dataset()
The output is:
    ...
    Generating train split: 1 examples [00:01, 1.83s/ examples]
    Generating train split: 1 examples [00:00, 2.79 examples/s]
    Generating train split: 5 examples [00:00, 8.95 examples/s]
    Generating train split: 0 examples [00:00, ? examples/s]
        raise DatasetGenerationError("An error occurred while generating the dataset") from e
    datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
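The traceback above shows only the wrapping `DatasetGenerationError`. Since it is raised with `from e`, the underlying exception should be reachable through `__cause__`. Below is a minimal sketch of how one might surface it. It uses stand-in exceptions rather than the real unitxt or datasets calls, and the helper name `root_cause` is hypothetical.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow the `raise ... from ...` chain to the innermost exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def failing_generation():
    # Stand-in for the failing call; the real underlying error is unknown here.
    try:
        raise ValueError("stand-in for the real underlying error")
    except ValueError as e:
        raise RuntimeError("An error occurred while generating the dataset") from e


try:
    failing_generation()
except RuntimeError as err:
    cause = root_cause(err)
    print(type(cause).__name__, cause)
```

Wrapping `recipe().to_dataset()` in the same kind of `try`/`except` and printing the root cause may reveal which step actually failed.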
When I set num_demos=0 and demos_pool_size=None, the process works as expected:
    from unitxt.standard import StandardRecipe
    from unitxt.templates import MultipleChoiceTemplate

    # self.template = MultipleChoiceTemplate(
    #     type='multiple_choice_template',
    #     artifact_identifier='template_0',
    #     _requirements_list=[],
    #     caching=None,
    #     apply_to_streams=None,
    #     dont_apply_to_streams=None,
    #     skip_rendered_instance=True,
    #     postprocessors=['processors.first_character'],
    #     instruction='',
    #     target_prefix='',
    #     title_fields=[],
    #     input_format='Question: [question] Choices: [choices] Answer: [answer]\nQuestion: {question} Choices: {choices} Answer:',
    #     choices_field='choices',
    #     target_field='answer',
    #     choices_seperator=' ',
    #     source_choice_format='{choice_numeral}. {choice_text}',
    #     target_choice_format='{choice_numeral}',
    #     enumerator='ABCDEFGHIJKLMNOP',
    #     shuffle_choices=False,
    # )
    template = MultipleChoiceTemplate(
        input_format=self.template.input_format,
        target_field=self.template.target_field,
        choices_seperator=self.template.choices_seperator,
        enumerator="numerals",
        postprocessors=self.template.postprocessors,
    )
    recipe = StandardRecipe(
        card='cards.mmlu.anatomy',
        template=template,
        format='formats.empty',
        num_demos=0,
        demos_pool_size=None,
        max_train_instances=1,
        max_validation_instances=1,
        max_test_instances=1,
    )
    dataset = recipe().to_dataset()
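For reference, the two snippets differ only in the demo settings. The small sketch below compares the keyword arguments copied from the two `StandardRecipe` calls. It uses plain dicts and no unitxt import, and it omits the `template` argument, which is identical in both.

```python
# Keyword arguments copied from the failing and working recipe calls above.
failing = dict(
    card="cards.mmlu.anatomy",
    format="formats.empty",
    num_demos=1,
    demos_pool_size=10,
    max_train_instances=1,
    max_validation_instances=1,
    max_test_instances=1,
)
working = dict(
    card="cards.mmlu.anatomy",
    format="formats.empty",
    num_demos=0,
    demos_pool_size=None,
    max_train_instances=1,
    max_validation_instances=1,
    max_test_instances=1,
)

# Map each differing parameter to its (failing, working) pair of values.
changed = {k: (failing[k], working[k]) for k in failing if failing[k] != working[k]}
print(changed)  # {'num_demos': (1, 0), 'demos_pool_size': (10, None)}
```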