AskDocs: A medical QA dataset

An adaptation of the ELI5 dataset to the Medical Questions (AskDocs) subreddit.

Getting Started

Language  Train  Valid  Test  External
en        24256  5198   5198  166804
pt        24256  5198   5198  166804
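
As a quick sanity check on the table above, the following sketch computes what share of the subreddit data falls into each split. The counts are copied from the table; the dictionary keys and the `split_fractions` helper are illustrative names, not part of the dataset.

```python
# Split sizes copied from the table above (identical for en and pt).
# The external data is left out because it comes from a different source.
SPLIT_SIZES = {"train": 24256, "validation": 5198, "test": 5198}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


fractions = split_fractions(SPLIT_SIZES)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The numbers work out to roughly a 70/15/15 train/validation/test split.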

The dataset questions and answers span a period from January 2013 to December 2019.

We additionally translated the dataset to Portuguese and added external data from here, a binary classification dataset described as "a QNLI medical-like". We adapted its binary labels to the values 5 or 0.
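
A minimal sketch of what adapting a binary label to the values 5 or 0 could look like. This is an assumption, not the author's conversion script: the README does not say which class maps to 5, so the sketch assumes the positive class (1) becomes 5 and the negative class (0) becomes 0. The `to_score` helper is a hypothetical name.

```python
def to_score(binary_label: int) -> int:
    """Map a binary label to the 5-or-0 scale.

    Assumption: 1 (positive) -> 5 and 0 (negative) -> 0.
    The README does not state the direction of the mapping.
    """
    if binary_label not in (0, 1):
        raise ValueError(f"expected 0 or 1, got {binary_label!r}")
    return 5 if binary_label == 1 else 0


# Hypothetical labels, for illustration only.
labels = [1, 0, 0, 1]
scores = [to_score(label) for label in labels]
print(scores)
```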

Usage

Datasets 🤗

from datasets import load_dataset

data = load_dataset("ju-resplande/askD", split="train_pt")
# Available splits:
# ['train_en', 'validation_en', 'test_en', 'external_en', 'train_pt', 'validation_pt', 'test_pt', 'external_pt']
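
The split names listed above follow a `<partition>_<language>` pattern, so all splits for one language can be loaded in a loop. The sketch below assumes that pattern holds for every split. `split_name` and `load_language` are hypothetical helpers, and `load_language` needs the `datasets` package and network access.

```python
PARTITIONS = ("train", "validation", "test", "external")
LANGUAGES = ("en", "pt")


def split_name(partition: str, language: str) -> str:
    """Build a split name such as 'train_pt' from its two parts."""
    if partition not in PARTITIONS:
        raise ValueError(f"unknown partition: {partition!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    return f"{partition}_{language}"


def load_language(language: str) -> dict:
    """Load every partition for one language, keyed by partition name."""
    # Imported here so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    return {
        partition: load_dataset(
            "ju-resplande/askD", split=split_name(partition, language)
        )
        for partition in PARTITIONS
    }


all_splits = [split_name(p, lang) for lang in LANGUAGES for p in PARTITIONS]
print(all_splits)
```

Calling `load_language("pt")` would then return the four Portuguese splits in one dictionary.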

Citing

@misc{Gomes20202,
  author = {GOMES, J. R. S.},
  title = {AskDocs: A medical QA dataset},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ju-resplande/askD}},
  commit = {42060c4402c460e174cbb75a868b429c554ba2b7}
}

Acknowledgments

Thanks to @viniciusplo and @ruanchaves for the idea. 😃