If there is any rule to modify the parameters #73

zkzhou126 · 2024-01-26T04:20:32Z

Hello! I trained the model on the WMT16 dataset and modified the parameters to the following values

The main changes were to dim and seq_len. I also changed learning_step to 120000 to try to improve the results.
But I still got very poor results.

When I change these parameters, do I also need to change other parameters along with them?
When I trained the model with your original parameters, the results were not good enough because of dim and seq_len, but they were better than the current results.

summmeer · 2024-02-23T06:01:34Z

Hi,
Many hyperparameters can affect the final results, including bsz, seq_len, dim, steps, and the tokenizer. Other techniques, such as self-conditioning and length prediction, may also help training.
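
As a rough illustration of how seq_len, bsz, and steps interact with the dataset, here is a minimal sketch. It is not code from this repository: the helper names `truncation_rate` and `epochs_covered` are hypothetical, and the token lengths, batch size, and dataset size are made-up numbers. It only assumes that examples longer than seq_len get cut off and that each step consumes one batch of bsz examples.

```python
from typing import Sequence


def truncation_rate(token_lengths: Sequence[int], seq_len: int) -> float:
    """Fraction of examples whose token count exceeds seq_len."""
    if not token_lengths:
        return 0.0
    too_long = sum(1 for n in token_lengths if n > seq_len)
    return too_long / len(token_lengths)


def epochs_covered(steps: int, bsz: int, num_examples: int) -> float:
    """Approximate number of passes over the data made by `steps` updates of `bsz` examples."""
    return steps * bsz / num_examples


# Hypothetical numbers, for illustration only.
lengths = [12, 40, 75, 130, 64, 200, 33, 90]
print(truncation_rate(lengths, seq_len=64))
print(epochs_covered(steps=120000, bsz=128, num_examples=4_000_000))
```

If a large share of examples is truncated at the chosen seq_len, or the step count covers only a few passes over the data, those are plausible places to look before changing dim further.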

zkzhou126 changed the title ~~training on wmt16~~ If there is any rule to modify the parameters Jan 26, 2024
