medAlpaca: Finetuned Large Language Models for Medical Question Answering

Project Overview

MedAlpaca expands upon both Stanford Alpaca and AlpacaLoRA to offer an advanced suite of large language models specifically fine-tuned for medical question-answering and dialogue applications. Our primary objective is to deliver an array of open-source language models, paving the way for seamless development of medical chatbot solutions.

These models have been trained using a variety of medical texts, encompassing resources such as medical flashcards, wikis, and dialogue datasets. For more details on the data utilized, please consult the data section.

Getting Started

Create a new virtual environment, e.g. with conda:

conda create -n medalpaca "python>=3.9"

Install the required packages:

pip install -r requirements.txt

Training of medAlpaca

Memory Requirements

We have benchmarked the needed GPU memory as well as the approximate duration per epoch for finetuning LLaMA 7b on the Medical Meadow small dataset (~6000 Q/A pairs) on a single GPU:

| Model | 8-bit training | LoRA | fp16 | bf16 | VRAM used | Gradient checkpointing | Duration/epoch |
|---|---|---|---|---|---|---|---|
| LLaMA 7b | True | True | True | False | 8.9 GB | False | 77:30 |
| LLaMA 7b | False | True | True | False | 18.8 GB | False | 14:30 |
| LLaMA 7b | False | False | True | False | OOM | False | - |
| LLaMA 7b | False | False | False | True | 79.5 GB | True | 35:30 |
| LLaMA 7b | False | False | False | False | OOM | True | - |
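
As a rough sanity check on the VRAM column, the size of the weights alone can be estimated from the number of bytes per parameter. The sketch below assumes about 6.7 billion parameters for LLaMA 7b (an approximation, not a figure from this README) and ignores activations, gradients, optimizer state, and LoRA adapters, so measured usage will be higher than these lower bounds.

```python
# Rough lower bound on GPU memory needed for the model weights only.
# ASSUMPTION: LLaMA 7b has about 6.7e9 parameters (approximate value).
N_PARAMS = 6.7e9

# Storage size of one parameter in each numeric format.
BYTES_PER_PARAM = {"int8": 1, "fp16": 2, "bf16": 2, "fp32": 4}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(N_PARAMS, dtype):.1f} GB")
```

Under these assumptions the 8-bit weights take about 6.7 GB and the 16-bit weights about 13.4 GB, which is in line with the 8.9 GB and 18.8 GB measured for the two LoRA runs.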

Train medAlpaca based on LLaMA

If you have access to the LLaMA or Alpaca weights, you can finetune the model with the following command. Just replace PATH_TO_LLAMA_WEIGHTS with the folder containing your LLaMA or Alpaca weights.

python medalpaca/train.py \
    --model PATH_TO_LLAMA_WEIGHTS \
    --data_path medical_meadow_small.json \
    --output_dir 'output' \
    --train_in_8bit True \
    --use_lora True \
    --bf16 True \
    --tf32 False \
    --fp16 False \
    --global_batch_size 128 \
    --per_device_batch_size 8

By default, the script performs mixed precision training.
You can toggle 8-bit training with the train_in_8bit flag. 8-bit training currently only works with use_lora True; however, you can use LoRA without 8-bit training. The script can also train other models, such as facebook/opt-6.7b.
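
The interplay of the flags above can be sketched in a few lines of Python. This is an illustration only: the helper `plan_training` is hypothetical and not part of medalpaca/train.py, and the idea that the global batch size is reached through gradient accumulation (global batch size divided by per-device batch size times number of GPUs) is an assumption rather than something stated above.

```python
def plan_training(global_batch_size: int, per_device_batch_size: int,
                  n_gpus: int = 1, train_in_8bit: bool = False,
                  use_lora: bool = False) -> int:
    """Validate the flag combination and return gradient accumulation steps.

    Hypothetical helper for illustration, not part of medalpaca/train.py.
    """
    if train_in_8bit and not use_lora:
        # Stated in the README: 8-bit training currently requires LoRA.
        raise ValueError("train_in_8bit=True requires use_lora=True")
    samples_per_step = per_device_batch_size * n_gpus
    if global_batch_size % samples_per_step != 0:
        raise ValueError(
            "global_batch_size must be a multiple of "
            "per_device_batch_size * n_gpus"
        )
    # ASSUMPTION: the remaining factor is covered by gradient accumulation.
    return global_batch_size // samples_per_step


if __name__ == "__main__":
    # Values from the command above, on a single GPU.
    print(plan_training(128, 8, n_gpus=1, train_in_8bit=True, use_lora=True))
```

With the values from the command (128 and 8 on one GPU), this sketch yields 16 accumulation steps.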

Data

To ensure your cherished llamas and alpacas are well-fed and thriving, we have diligently gathered high-quality biomedical open-source datasets and transformed them into instruction tuning formats. We have dubbed this endeavor Medical Meadow. Medical Meadow currently encompasses roughly 1.5 million data points across a diverse range of tasks, including openly curated medical data transformed into Q/A pairs with OpenAI's gpt-3.5-turbo and a collection of established NLP tasks in the medical domain. Please note that not all data is of the same quantity and quality, and you may need to subsample the data for training your own model. We will persistently update and refine the dataset, and we welcome everyone to contribute more 'grass' to Medical Meadow!
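
To make "instruction tuning format" concrete, here is a minimal sketch of wrapping a question/answer pair in an Alpaca-style record. The field names `instruction`, `input`, and `output` follow the Stanford Alpaca convention and are an assumption here, as are the helper name and the instruction wording; the example pair is made up for illustration and not taken from Medical Meadow.

```python
import json


def to_instruction_record(question: str, answer: str,
                          instruction: str = "Answer this question truthfully") -> dict:
    """Wrap one Q/A pair in an Alpaca-style record (assumed field names)."""
    return {
        "instruction": instruction,
        "input": question.strip(),
        "output": answer.strip(),
    }


if __name__ == "__main__":
    # Made-up example pair, not taken from the dataset.
    record = to_instruction_record(
        "Which vitamin deficiency causes scurvy? ",
        " Vitamin C deficiency.",
    )
    print(json.dumps(record, indent=2))
```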

Data Overview

| Name | Source | n | n included in training |
|---|---|---|---|
| Medical Flashcards | medalpaca/medical_meadow_medical_flashcards | 33955 | 33955 |
| Wikidoc | medalpaca/medical_meadow_wikidoc | 67704 | 10000 |
| Wikidoc Patient Information | medalpaca/medical_meadow_wikidoc_patient_information | 5942 | 5942 |
| Stackexchange academia | medalpaca/medical_meadow_stack_exchange | 40865 | 40865 |
| Stackexchange biology | medalpaca/medical_meadow_stack_exchange | 27887 | 27887 |
| Stackexchange fitness | medalpaca/medical_meadow_stack_exchange | 9833 | 9833 |
| Stackexchange health | medalpaca/medical_meadow_stack_exchange | 7721 | 7721 |
| Stackexchange bioinformatics | medalpaca/medical_meadow_stack_exchange | 5407 | 5407 |
| USMLE Self Assessment Step 1 | medalpaca/medical_meadow_usmle_self | 119 | 92 (test only) |
| USMLE Self Assessment Step 2 | medalpaca/medical_meadow_usmle_self | 120 | 110 (test only) |
| USMLE Self Assessment Step 3 | medalpaca/medical_meadow_usmle_self | 135 | 122 (test only) |
| MEDIQA | original, preprocessed | 2208 | 2208 |
| CORD-19 | original, preprocessed | 1056660 | 50000 |
| MMMLU | original, preprocessed | 3787 | 3787 |
| Pubmed Health Advice | original, preprocessed | 10178 | 10178 |
| Pubmed Causal | original, preprocessed | 2446 | 2446 |
| ChatDoctor | original | 215000 | 10000 |
| OpenAssistant | original | 9209 | 9209 |
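
The gap between the "n" and "n included in training" columns suggests that the larger datasets were capped before training. A minimal sketch of such a cap using only the Python standard library is shown below; the function name `subsample`, the uniform random draw, and the fixed seed are illustrative assumptions, since the actual selection procedure is not described here.

```python
import random


def subsample(records: list, max_n: int, seed: int = 42) -> list:
    """Return at most max_n records, drawn uniformly without replacement."""
    if len(records) <= max_n:
        return list(records)
    rng = random.Random(seed)  # local RNG keeps the draw reproducible
    return rng.sample(records, max_n)


# Stand-in for the 67704 Wikidoc entries, capped at 10000 as in the table.
wikidoc = [{"id": i} for i in range(67704)]
subset = subsample(wikidoc, 10000)
print(len(subset))  # 10000
```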

Data description

Please refer to DATA_DESCRIPTION.md.

Benchmarks

We are benchmarking all models on the USMLE self assessment, which is available at this link. Note that we removed all questions with images, as our models are not multimodal.

| Model | Step1 | Step2 | Step3 |
|---|---|---|---|
| LLaMA 7b | 0.198 | 0.202 | 0.203 |
| Alpaca 7b naive (weights) | 0.275 | 0.266 | 0.293 |
| Alpaca 7b LoRA | 0.220 | 0.138 | 0.252 |
| MedAlpaca 7b | 0.297 | 0.312 | 0.398 |
| MedAlpaca 7b LoRA | 0.231 | 0.202 | 0.179 |
| MedAlpaca 7b LoRA 8bit | 0.231 | 0.241 | 0.211 |
| ChatDoctor (7b) | 0.187 | 0.185 | 0.148 |
| LLaMA 13b | 0.222 | 0.248 | 0.276 |
| Alpaca 13b naive | 0.319 | 0.312 | 0.301 |
| MedAlpaca 13b | 0.473 | 0.477 | 0.602 |
| MedAlpaca 13b LoRA | 0.250 | 0.255 | 0.255 |
| MedAlpaca 13b LoRA 8bit | 0.189 | 0.303 | 0.289 |
| MedAlpaca 30b (still training) | TBA | TBA | TBA |
| MedAlpaca 30b LoRA 8bit | 0.315 | 0.327 | 0.361 |

We are continuously working on improving the training as well as our evaluation prompts. Expect this table to change quite a bit.
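
The scores above appear to be the fraction of questions answered correctly per step. A hedged sketch of such a metric for multiple-choice questions follows; the answer-extraction rule (first standalone capital letter A to E in the model output) is a simplifying assumption and not the evaluation code used for the table.

```python
import re
from typing import Optional


def extract_choice(generation: str) -> Optional[str]:
    """Return the first standalone option letter (A-E) in the text, if any.

    Simplification: a sentence starting with the article "A" would be
    misread as option A, so real evaluation prompts need stricter parsing.
    """
    match = re.search(r"\b([A-E])\b", generation)
    return match.group(1) if match else None


def accuracy(generations: list, gold: list) -> float:
    """Fraction of generations whose extracted letter equals the gold letter."""
    if len(generations) != len(gold):
        raise ValueError("generations and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(extract_choice(g) == a for g, a in zip(generations, gold))
    return correct / len(gold)
```

For example, `accuracy(["B: aspirin", "The answer is C", "no idea"], ["B", "C", "A"])` counts two of three answers as correct.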

Access the models

Visit the zoo and have a look at our alpacas here: https://huggingface.co/medalpaca

It should be obvious, but the models provided on this platform are shared for research purposes only and should not be used in any healthcare applications or settings. While we are excited to showcase our experimental models, please be aware that they have not undergone extensive testing or validation, and their reliability cannot be guaranteed. We kindly ask you to exercise caution when using these models, and we appreciate your understanding as we continue to explore and develop this innovative technology.
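
For research experiments, the published checkpoints can presumably be loaded like any other Hugging Face model. The sketch below uses the `transformers` text-generation pipeline; the model id `medalpaca/medalpaca-7b`, the "Context / Question / Answer" prompt layout, and the helper names are assumptions, so check the model cards on the hub before relying on them.

```python
def build_prompt(question: str, context: str = "") -> str:
    """Format a question (and optional context) as a single prompt string.

    ASSUMPTION: this layout is illustrative; see the model card for the
    prompt format the model was actually trained with.
    """
    return f"Context: {context}\n\nQuestion: {question}\n\nAnswer: "


def ask(question: str, context: str = "",
        model_id: str = "medalpaca/medalpaca-7b") -> str:
    """Generate an answer with a Hugging Face text-generation pipeline.

    Downloads the weights on first use and rebuilds the pipeline on every
    call, which is fine for a demo but slow for repeated queries.
    """
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id, tokenizer=model_id)
    result = generator(build_prompt(question, context), max_new_tokens=128)
    return result[0]["generated_text"]
```

Calling `ask("What are the symptoms of diabetes?")` would then return the prompt followed by the generated continuation.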

Chat with medAlpaca

A convenient interface to our models is coming soon.
