
Training script/configuration of T5 generator for GenQ #141

Open
jihyukkim-nlp opened this issue Apr 20, 2023 · 2 comments

Comments

@jihyukkim-nlp

Hello, thank you for sharing this repo and useful hugging face model cards!

I am interested in T5 generators for query generation, and I am trying to extend this to other datasets/tasks.
To do so, I would like to reproduce the T5 generators, specifically BeIR/query-gen-msmarco-t5-large-v1.

I am wondering if the training script and training configurations for the generators can be shared,
including

  • T5 initial checkpoint (Did you use google/t5-v1_1-large or google/flan-t5-large?),
  • maximum source/target length,
  • batch size,
  • optimizer,
  • learning rate,
  • learning rate scheduling,
  • warmup steps,
  • and total training steps.

Best regards,
Jihyuk Kim

@thakur-nandan (Member)

Hi @jihyukkim-nlp,

Sadly, I do not remember the exact training details for the question generator models. @nreimers could you help me here?

Regarding the first two points:

  • We used the T5-Large model as the initial checkpoint (not the Flan-T5 large model).
  • I believe the maximum length for the target question is 64 tokens and the source is 350 tokens.
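As a purely illustrative sketch of those length budgets: a doc2query-style generator is trained on (passage, query) pairs, truncating the source to roughly 350 tokens and the target query to 64. The snippet below uses whitespace tokenization as a stand-in for the real T5 tokenizer, and the helper names (`truncate`, `make_pair`) are hypothetical, not from the BEIR codebase:

```python
# Illustrative only: build (source, target) training pairs for a
# doc2query-style T5 generator, using the length budgets mentioned above.
# Whitespace splitting stands in for the actual T5 subword tokenizer.

MAX_SOURCE_TOKENS = 350  # passage budget (per the reply above)
MAX_TARGET_TOKENS = 64   # query budget (per the reply above)

def truncate(text: str, max_tokens: int) -> str:
    """Keep at most max_tokens whitespace-delimited tokens."""
    return " ".join(text.split()[:max_tokens])

def make_pair(passage: str, query: str) -> tuple:
    """Build one (source, target) training example."""
    return truncate(passage, MAX_SOURCE_TOKENS), truncate(query, MAX_TARGET_TOKENS)

src, tgt = make_pair("word " * 400, "what is python " * 30)
print(len(src.split()), len(tgt.split()))  # -> 350 64
```

With a real tokenizer the truncation would of course operate on subword IDs rather than whitespace tokens, but the source/target budgets apply the same way.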

@jihyukkim-nlp (Author)

Hi @thakur-nandan,

Thank you for the information!
I have tried a few different configurations, e.g., different learning rates (1e-5, 3e-5, 5e-5, 1e-4), with and without warmup steps, but I have failed to reproduce the original results.

It worked relatively well for MS MARCO, but not for BEIR.
At this point, I am also wondering if other datasets, such as NQ, were used for training the generator by any chance?

It would be really helpful if @nreimers could chime in.
