
Train transformer models on MedNLI #9

Open
ViktorAlm opened this issue Mar 30, 2020 · 3 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments


ViktorAlm commented Mar 30, 2020

Great work!

Here's a dataset that might be more domain-related. It's not openly available, but it might be helpful:
https://jgc128.github.io/mednli/

gsarti (Owner) commented Mar 31, 2020

Looks neat! I didn't know about datasets for medical NLI; that's perfect for our use case.

If someone is interested in fine-tuning the three pretrained models (SciBERT, BioBERT, and CovidBERT) on the MedNLI dataset using the finetune_nli.py script and uploading them to the HuggingFace model hub, I'll add them to the list!

I'm changing the issue title to make this visible to other contributors!
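
For reference, here is a rough, untested sketch of what fine-tuning BioBERT on MedNLI with sentence-transformers could look like. The MedNLI file name and jsonl field names are assumptions based on the SNLI-style format, the hyperparameters are only illustrative, and finetune_nli.py may differ in the details:

```python
# Sketch: fine-tune a BioBERT-based sentence encoder on MedNLI with a softmax NLI loss.
# Assumptions: a local SNLI-style jsonl file (sentence1, sentence2, gold_label);
# model id and hyperparameters are illustrative, not the exact finetune_nli.py settings.
import json

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

LABEL2ID = {"entailment": 0, "neutral": 1, "contradiction": 2}

def load_mednli(path):
    """Read an SNLI-style jsonl file into sentence-transformers InputExamples."""
    examples = []
    with open(path) as f:
        for line in f:
            row = json.loads(line)
            if row["gold_label"] in LABEL2ID:
                examples.append(InputExample(
                    texts=[row["sentence1"], row["sentence2"]],
                    label=LABEL2ID[row["gold_label"]],
                ))
    return examples

# Sentence encoder: BioBERT word embeddings + mean pooling.
word_embedding_model = models.Transformer("dmis-lab/biobert-base-cased-v1.1", max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# "mli_train_v1.jsonl" is the assumed local MedNLI training split (not publicly downloadable).
train_examples = load_mednli("mli_train_v1.jsonl")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)

# Softmax classification loss over the three NLI labels.
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=len(LABEL2ID),
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=2,
    warmup_steps=100,
    output_path="biobert-mednli",
)
```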

@gsarti gsarti changed the title Interesting dataset Train transformer models on MedNLI Mar 31, 2020
@gsarti gsarti added enhancement New feature or request help wanted Extra attention is needed labels Mar 31, 2020

ghost commented Jun 10, 2020

I'm interested in fine-tuning BioBERT on the MedNLI dataset. I need the following information:

a) Why did you choose a batch size of 64 instead of 16 to train all the NLI models (biobert-nli, scibert-nli, and covidbert-nli)?
b) How many epochs did you train these models for? (The default number of epochs in the sentence-transformers library is 1.)

Thanks in advance, @gsarti

gsarti (Owner) commented Jun 10, 2020

Hi @kalyanks0611,

The choice of a larger batch size was only due to the intuition that it would limit noise during training; I have no empirical proof that it leads to better downstream performance in practice.

The NLI models were trained for different numbers of steps (20,000, 23,000, and 30,000 respectively); this was also dictated by GPU time allowances rather than set empirically. 30,000 steps at batch size 64 correspond to 1,920,000 examples, which is a bit less than two full epochs on MultiNLI + SNLI, which together account for roughly 1M sentence pairs.

Hope this helps!
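
As a quick sanity check on that arithmetic (illustrative only):

```python
# Steps-to-epochs arithmetic for the longest run mentioned above.
steps = 30_000
batch_size = 64
examples_seen = steps * batch_size        # 1,920,000 sentence pairs
snli_multinli_pairs = 1_000_000           # rough combined size of SNLI + MultiNLI
epochs = examples_seen / snli_multinli_pairs
print(f"{examples_seen:,} examples ~= {epochs:.2f} epochs")  # ~1.92, a bit under two full passes
```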
