cannot use scheduler for grad_factor #522

Open
violenil opened this issue Aug 19, 2021 · 1 comment
Labels
feat / config Configuration system and config files

Comments

@violenil

In my model implementation, I would like to freeze the transformer (using roberta-base in a Tok2VecTransformer.v1) for the first 2 epochs of training. From this spaCy documentation, it seems like it should be possible to set grad_factor to 0 in order to disable gradients from one of the listeners. According to the same documentation, setting this up per epoch should then be possible by using a scheduler. In my config, I have specified the constant_then.v1 schedule followed by a constant.v1 schedule in the following way:

[components.seq2labels.model.tok2vec]
@architectures = "spacy-transformers.Tok2VecTransformer.v1"
name = "roberta-base"
tokenizer_config = {"use_fast": true}

[components.seq2labels.model.tok2vec.grad_factor]
@schedules = "constant_then.v1"
rate = 0.0
steps = 2000

[components.seq2labels.model.tok2vec.grad_factor.schedule]
@schedules = "constant.v1"
rate = 1.0

When initializing, I get the following error:

=========================== Initializing pipeline ===========================
✘ Config validation error
seq2labels.model.tok2vec -> grad_factor   value is not a valid float

It seems to me that the scheduler may be returning an iterator instead of a float that can be used as a value here. Have I overlooked some aspect that should still be implemented/amended?
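For what it's worth, resolving the schedule by hand (a quick check, assuming thinc 8.x where registered schedules are plain generators) does seem to produce an iterator of floats rather than a single float:

from thinc.api import constant, constant_then

# thinc 8.x: registered schedules are generators of floats, not single floats,
# which would explain the "value is not a valid float" error above
schedule = constant_then(0.0, 2000, constant(1.0))
print(next(schedule))  # -> 0.0; the schedule yields 0.0 for 2000 steps, then 1.0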

Otherwise, if this scheduler does not work with grad_factor, is there another way to freeze the transformer only for the first 2 epochs of training?

Thanks for any help in advance :)

polm added the feat / config Configuration system and config files label Aug 29, 2021
@polm
Contributor

polm commented Aug 29, 2021

This is basically because grad_factor isn't designed to take a sequence of values, like an iterator, as you note. That's not just an oversight: the transformer model doesn't support a sequence there at the moment.

If you look at a place where the value can be a sequence or float, like the learn rate in Adam, you'll see that the type is annotated as FloatOrSeq. In contrast, grad_factor is just a float.
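For example, this kind of thing works for the learning rate (a rough sketch, assuming thinc 8.x's Adam and warmup_linear), but there's no equivalent for grad_factor:

from thinc.api import Adam, warmup_linear

# learn_rate is annotated as FloatOrSeq, so it accepts a plain float or a schedule
opt_fixed = Adam(learn_rate=0.001)
opt_scheduled = Adam(learn_rate=warmup_linear(0.00005, 250, 20000))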

This also isn't just a type issue: the implementation of the Transformer architecture would need to be changed to work with non-constant values. Looking at it, I don't think that would be complicated.
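Roughly, something like the following could normalize the value internally so the rest of the code always sees one factor per update step (a hypothetical helper for illustration only, nothing that exists in spacy-transformers today):

from typing import Iterator, Union

FloatOrSeq = Union[float, Iterator[float]]  # simplified form of thinc's annotation

def as_factor_stream(value: FloatOrSeq) -> Iterator[float]:
    # Hypothetical helper: yield one grad_factor per update step, whether the
    # config supplied a fixed float or a schedule
    if isinstance(value, (int, float)):
        while True:
            yield float(value)
    else:
        yield from value

Whatever currently multiplies the gradients by grad_factor would then call next() on that stream once per step instead of reading a fixed value.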

I've wanted this feature myself when training models before, so I think we could certainly consider adding it.
