Register weights as a non-persistent buffer of SinusoidalPositionalEmbedding #5213
+15 −49
Before submitting
Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
Discussed this in a GitHub issue of the pytorch repo.
Did you read the contributor guideline?
Did you make sure to update the docs?
Not applicable
Did you write any new necessary tests?
No, but I've tested `deeplearning/projects/fairseq-py:test_cpu` in Meta's fbcode repo, and this diff does not introduce any new test failures.

What does this PR do?
The module `SinusoidalPositionalEmbedding` has the problem that its `weights` attribute is not moved to CPU or CUDA when the module is moved. Registering `weights` as a buffer solves the problem. This also eliminates the need for the buffer `_float_tensor`, which is used only to keep track of whether the module is on CPU or CUDA.
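For illustration, here is a minimal sketch of the change, assuming a stripped-down module (the `_get_embedding` placeholder stands in for fairseq's actual sinusoidal table computation; it is not the PR's code):

```python
import torch
import torch.nn as nn


class SinusoidalPositionalEmbedding(nn.Module):
    """Stripped-down sketch; the real fairseq module builds a sinusoidal table."""

    def __init__(self, embedding_dim: int, num_embeddings: int):
        super().__init__()
        weights = self._get_embedding(num_embeddings, embedding_dim)
        # Before this PR, `self.weights = weights` was a plain attribute,
        # so `.to()`, `.cuda()`, etc. silently skipped it, and a dummy
        # `_float_tensor` buffer tracked the module's device instead.
        # A non-persistent buffer moves with the module and is excluded
        # from the state_dict.
        self.register_buffer("weights", weights, persistent=False)

    @staticmethod
    def _get_embedding(num_embeddings: int, embedding_dim: int) -> torch.Tensor:
        # Placeholder for the actual sinusoidal computation.
        return torch.zeros(num_embeddings, embedding_dim)
```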
Making `weights` a non-persistent buffer means it won't be saved to or loaded from a `state_dict`. With the changes in this diff, the `state_dict` of a `SinusoidalPositionalEmbedding` module should contain neither `weights` nor `_float_tensor`.
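Assuming the sketch above, a quick check that the buffer follows device moves but stays out of the `state_dict`:

```python
emb = SinusoidalPositionalEmbedding(embedding_dim=4, num_embeddings=8)

# The buffer follows device moves, unlike a plain attribute.
if torch.cuda.is_available():
    emb = emb.cuda()
    assert emb.weights.is_cuda

# Non-persistent buffers never appear in the state_dict,
# and _float_tensor is gone entirely.
assert "weights" not in emb.state_dict()
assert "_float_tensor" not in emb.state_dict()
```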
Checkpoints saved before this change may still contain those tensors. This diff ignores them by overriding the `_load_from_state_dict` method of the `SinusoidalPositionalEmbedding` module, instead of duplicating the code in many `upgrade_state_dict` functions.

TO DISCUSS: Is it OK for me to override `_load_from_state_dict`? It's a private method, but I see that people have overridden it in many places, including in fairseq:
https://github.com/search?q=super()._load_from_state_dict&type=code
https://github.com/search?q=repo%3Afacebookresearch%2Ffairseq%20super()._load_from_state_dict&type=code
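A sketch of what such an override could look like, extending the simplified module above rather than quoting the PR itself (the signature matches `nn.Module._load_from_state_dict`; the popped keys are the two this diff removes):

```python
class SinusoidalPositionalEmbedding(nn.Module):
    # ... __init__ and _get_embedding as in the sketch above ...

    def _load_from_state_dict(self, state_dict, prefix, local_metadata,
                              strict, missing_keys, unexpected_keys,
                              error_msgs):
        # Checkpoints written before this change may still carry these
        # keys; drop them so they are not reported as unexpected.
        for name in ("weights", "_float_tensor"):
            state_dict.pop(prefix + name, None)
        super()._load_from_state_dict(state_dict, prefix, local_metadata,
                                      strict, missing_keys, unexpected_keys,
                                      error_msgs)
```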
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃