Closes #57 #686

clancyoftheoverflow · 2022-06-05T18:07:31Z

Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset.

If the following information is NOT present in the issue, please populate:

Name: name of the dataset
Description: short description of the dataset (or link to social media or blog post)
Paper: link to the dataset paper if available
Data: link to the online home of the dataset

Checkbox

Confirm that this PR is linked to the dataset issue.
Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
Implement _info(), _split_generators() and _generate_examples() in dataloader script.
Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
Confirm dataloader script works with datasets.load_dataset function.
Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
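
The naming items in the checklist can be sketched as a small helper. This is only an illustration, not part of the repository: `is_valid_dataset_name` and `expected_layout` are hypothetical names, the strict reading of "only lowercase and underscore" as the characters a-z and `_` is an assumption, and the `<name>_source` / `<name>_bigbio_<schema>` config pattern is inferred from the config names that appear in the test log for this PR.

```python
import re

# Assumption: "only lowercase and underscore" is read strictly as a-z and "_".
_NAME_PATTERN = re.compile(r"[a-z_]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name contains only lowercase letters and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


def expected_layout(name: str, schema: str) -> dict:
    """Build the script path and config names implied by the checklist.

    The config-name pattern is inferred from the test log
    (e.g. "why_qa_source", "why_qa_bigbio_qa"); treat it as an assumption.
    """
    if not is_valid_dataset_name(name):
        raise ValueError(f"invalid dataset name: {name!r}")
    return {
        "script": f"biodatasets/{name}/{name}.py",
        "source_config": f"{name}_source",
        "bigbio_config": f"{name}_bigbio_{schema}",
    }


if __name__ == "__main__":
    # Print the layout expected for the dataset added in this PR.
    print(expected_layout("why_qa", "qa"))
```

For this PR, `expected_layout("why_qa", "qa")` gives the script path and the two config names that the test log reports.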

INFO:main:self.PATH: biodatasets/why_qa/why_qa.py
INFO:main:self.SUBSET_ID: why_qa
INFO:main:self.SCHEMA: None
INFO:main:self.DATA_DIR: /n2c2-community-annotations_2010-fan-why-QA.zip
INFO:main:Checking for _SUPPORTED_TASKS ...
INFO:main:Found _SUPPORTED_TASKS=[<Tasks.QUESTION_ANSWERING: 'QA'>]
INFO:main:_SUPPORTED_TASKS implies _MAPPED_SCHEMAS={'QA'}
INFO:main:schemas_to_check: {'QA'}
INFO:main:Checking load_dataset with config name why_qa_source
WARNING:datasets.builder:Using custom data configuration why_qa_source-e19a1d19cd3b66be
WARNING:datasets.builder:Reusing dataset why_qa_dataset (C:\Users\franc.cache\huggingface\datasets\why_qa_dataset\why_qa_source-e19a1d19cd3b66be\1.0.0\ec02e5a9a780df1b7936db1790c78fd0f075011437d01c3759501a2e5148c8a8)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
INFO:main:Checking load_dataset with config name why_qa_bigbio_qa
WARNING:datasets.builder:Using custom data configuration why_qa_bigbio_qa-e19a1d19cd3b66be
WARNING:datasets.builder:Reusing dataset why_qa_dataset (C:\Users\franc.cache\huggingface\datasets\why_qa_dataset\why_qa_bigbio_qa-e19a1d19cd3b66be\1.0.0\ec02e5a9a780df1b7936db1790c78fd0f075011437d01c3759501a2e5148c8a8)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 593.00it/s]
INFO:main:Checking global ID uniqueness
INFO:main:Found 0 unique IDs
INFO:main:Gathering schema statistics
INFO:main:Gathering schema statistics
train

INFO:main:QA ONLY: Checking multiple choice
.

Ran 1 test in 0.062s

OK

Closes bigscience-workshop#57

69c9d7b

clancyoftheoverflow requested review from hakunanatasha, jason-fries, sunnnymskang, ruisi-su, galtay, leonweber, sg-wbi and debajyotidatta as code owners June 5, 2022 18:07

clancyoftheoverflow added 3 commits June 6, 2022 09:49

Update why_qa.py

0e6ffb6

Update why_qa.py

377139f

Update why_qa.py

c91f098
