Closes #854 (pull request #855)

Open: Miking98 wants to merge 9 commits into main

Conversation

@Miking98 Miking98 commented Jan 5, 2023

Add the Paragraph-Level Simplification of Medical Texts dataset. Closes #854

Checkbox

  • Confirm that this PR is linked to the dataset issue.
  • Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
  • Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
  • Confirm dataloader script works with datasets.load_dataset function.
  • Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
  • If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
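As a rough illustration of what the checklist asks for, here is a minimal sketch of a pre-flight check a contributor could run before the real test suite. It is not part of the repository: `missing_checklist_items`, the stand-in `demo` module, and `MyDatasetBuilder` are hypothetical names, the variable and method names are copied from the checklist as written, and the "at least two configs" test is only a crude proxy for "one source schema config and one bigbio schema config".

```python
import types

# Names copied from the checklist above; nothing here comes from the real test suite.
REQUIRED_VARIABLES = [
    "_CITATION", "_DATASETNAME", "_DESCRIPTION", "_HOMEPAGE", "_LICENSE",
    "_URLs", "_SUPPORTED_TASKS", "_SOURCE_VERSION", "_BIGBIO_VERSION",
]
REQUIRED_METHODS = ["_info", "_split_generators", "_generate_examples"]


def missing_checklist_items(module, builder_class_name):
    """Return the checklist names that `module` does not define (hypothetical helper)."""
    missing = [name for name in REQUIRED_VARIABLES if not hasattr(module, name)]

    builder = getattr(module, builder_class_name, None)
    if builder is None:
        missing.append(builder_class_name)
        return missing

    for method in REQUIRED_METHODS:
        if not callable(getattr(builder, method, None)):
            missing.append(f"{builder_class_name}.{method}")

    # Crude proxy: expects a list with two or more entries; it does not inspect
    # which schema each config belongs to.
    configs = getattr(builder, "BUILDER_CONFIGS", None)
    if not isinstance(configs, list) or len(configs) < 2:
        missing.append(f"{builder_class_name}.BUILDER_CONFIGS")
    return missing


# Stand-in module and builder class, used only to demonstrate the helper.
demo = types.ModuleType("my_dataset")
for name in REQUIRED_VARIABLES:
    setattr(demo, name, "placeholder")


class MyDatasetBuilder:
    BUILDER_CONFIGS = ["source_config", "bigbio_config"]

    def _info(self): ...
    def _split_generators(self, dl_manager): ...
    def _generate_examples(self, filepath): ...


demo.MyDatasetBuilder = MyDatasetBuilder
print(missing_checklist_items(demo, "MyDatasetBuilder"))  # prints []
```

A check like this only confirms that the names exist; it does not replace running `python -m tests.test_bigbio` on the actual dataloader script.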

@galtay
Collaborator

galtay commented Jan 6, 2023

@Miking98 thanks for this contribution! we are in the middle of updating our contribution guidelines to support hub datasets. Can I ask that we hold off on merging this until the new guidelines are published and/or can you update your PR to include an implementation in the hub_repos directory?

this is the PR that has the new contribution guidelines #850

and this is an example of a PR contributing code to the hub_repos directory (but it won't be easily testable until the PR above is merged) #852

@Miking98
Author

Miking98 commented Jan 6, 2023

Thanks for the note @galtay, makes sense! Will hold off until the new guidelines are published in that case, then will revise and submit a new pull request once updated to abide by them. Thanks!

@galtay
Collaborator

galtay commented Jan 14, 2023

hello @Miking98 thanks for your patience! we have a new CONTRIBUTING.md file now (https://github.com/bigscience-workshop/biomedical/blob/main/CONTRIBUTING.md) and I was wondering if you'd help us try it out. Please ping me if there are any issues and I'll help get this dataset loader in.

@Miking98
Author

Thanks for the note @galtay! I just went through the revised Contributing doc and updated my pull request accordingly. Please let me know your thoughts.

Development

Successfully merging this pull request may close these issues.

Add implementation for the Paragraph-level Simplification of Medical Texts dataset
2 participants