Publish number of parameters for each task #2
Comments
After running the JAX code and using the structure information in the research paper, I obtain the following number of parameters for each task: ListOps: 19.9M. Could you kindly confirm that this is indeed correct, so that a fair comparison can be made with alternative models? Thanks.
Hi, could you tell me how you count the parameters of the model on the CIFAR-10 task? After running it, my model has only 50k parameters. Thanks!
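For anyone asking how the counting is typically done: below is a minimal sketch in JAX/Flax, assuming a Flax linen model. The TinyMLP module is a stand-in, not the LRA architecture; the approach (initialize the model, then sum the sizes of all parameter leaves) is the same for any model.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class TinyMLP(nn.Module):
    # Stand-in module; the LRA models are larger, but counting works the same way.
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(64)(x)
        return nn.Dense(10)(x)

model = TinyMLP()
variables = model.init(jax.random.PRNGKey(0), jnp.ones((1, 32)))
n_params = sum(p.size for p in jax.tree_util.tree_leaves(variables))
print(f"# params: {n_params}")
```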
This constraint is not satisfied in the current code. Using the default parameters for the Image/CIFAR10 task, I found:

Transformer # params: 52,266

The Performer model thus doesn't satisfy the 10% constraint; it has more than 4x the parameters of the Transformer model. I suspect this is due to wrong hyperparameters. The Transformer config is: emb_dim: 32, mlp_dim: 64, num_heads: 1, qkv_dim: 32. Everything in the Performer config is larger, which obviously leads to more parameters. Again, I apologize to the authors if this is due to any misconception on my part.
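As a rough illustration of why uniformly larger dimensions blow the budget, here is a hedged back-of-the-envelope sketch. It is a reconstruction, not the LRA code: it counts only a single encoder block, ignores embeddings and the classifier head, and the exact layer layout in the repo may differ. It does show that doubling every width roughly quadruples a block's parameter count.

```python
def block_params(emb_dim: int, mlp_dim: int, qkv_dim: int) -> int:
    # Hypothetical per-block count; the real LRA layers may differ in detail.
    attn = 3 * (emb_dim * qkv_dim + qkv_dim)   # Q, K, V projections (weights + biases)
    attn += qkv_dim * emb_dim + emb_dim        # attention output projection
    mlp = emb_dim * mlp_dim + mlp_dim          # first feed-forward layer
    mlp += mlp_dim * emb_dim + emb_dim         # second feed-forward layer
    norms = 2 * (2 * emb_dim)                  # two LayerNorms (scale and bias)
    return attn + mlp + norms

base = block_params(emb_dim=32, mlp_dim=64, qkv_dim=32)      # config quoted above
doubled = block_params(emb_dim=64, mlp_dim=128, qkv_dim=64)  # every width doubled
print(base, doubled, doubled / base)  # ratio is close to 4x
```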
Hi @redna11, can you tell me which MLP dim you used when you calculated the size of the text classification model? It seems it was 512, while I see 1024 in the LRA config. The information in the paper is also misleading: basically, the hyperparameters in the paper seem to be doubled.
Hello,
you mention: "The new model should be within at best 10% larger in terms of parameters compared to the base Transformer model in the provided config file"
Do you publish what those baseline numbers of parameters are for each task?
Thanks
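As a footnote to the constraint quoted above, here is a hedged sketch of the check it implies; within_budget is an illustrative helper, not part of the repo, and 52,266 is the Transformer count reported earlier in this thread.

```python
def within_budget(candidate_params: int, base_params: int, slack: float = 0.10) -> bool:
    # A new model may be at most `slack` (10%) larger than the base Transformer.
    return candidate_params <= (1.0 + slack) * base_params

base = 52_266  # Transformer # params reported for Image/CIFAR10 above
print(within_budget(57_000, base))    # True: within the 10% budget
print(within_budget(4 * base, base))  # False: a ~4x model blows the budget
```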