Speed up model loading for generate #709

albanD · 2024-04-13T23:46:08Z

This has not been extensively tested (only on Mistral 7B) and is more of a proposal!

This change does the following:

Create the model on the meta device
Load the state dict with assign=True, which preserves the properties of the checkpoint tensors (mmap-ed CPU tensors in this case)
Initialize the non-persistent buffers that remain on the meta device
Move the finalized model to the requested device/dtype

This makes the model loading almost instant on my machine.

pytorch-bot · 2024-04-13T23:46:10Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/709

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit add24c6 with merge base ada5224:

NEW FAILURE - The following job has failed:

Lint / lint (3.10) (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

albanD · 2024-04-14T01:21:34Z

Is there any way to tell from the CI logs what my lint mistake is so I can fix it?

kartikayk · 2024-04-14T03:22:07Z

@albanD thanks so much for putting this up! I'll take a more detailed look tomorrow, but to answer your lint question - you can do the following:

pre-commit install
pre-commit run --all-files

This will fix all of the issues for you.

ebsmothers · 2024-04-14T18:44:29Z

Thanks for the PR @albanD! Tbh we have already had a fraught relationship with meta device initialization 😅 (see e.g. #317, #418, #514). Our latest status is that we deliberately sacrifice a bit on time-to-first-batch for the sake of keeping code in the model components agnostic to meta device. But generation is an interesting case since the total runtime is much lower than on a finetune with FSDP (which is what we were focusing on previously). Out of curiosity, what is the speedup of meta device vs just initializing directly on GPU in this case?

albanD · 2024-04-15T18:32:20Z

I would need to check once I'm back on the machine in question.
The more important bit tbh is that the CPU model was fully using the mmap-ed loaded Tensors and so was not blowing up my scarce RAM :D

@kartikayk I saw that but I don't have pre-commit in my environment :p

jerryzh168 · 2024-04-17T05:48:04Z

recipes/generate.py

            model = config.instantiate(model_cfg)

+        model.load_state_dict(model_state_dict, assign=True)


For quantized models, I think we'd need to load after we do quantization.

We should consider doing this https://docs-preview.pytorch.org/pytorch/tutorials/2824/recipes/recipes/swap_tensors.html?highlight=swap_tensor once we can use 2.3+ in AO/here.

albanD added 2 commits April 13, 2024 19:39

Speed up loading for generate (c122b27)
add reset method to KVCache (add24c6)

facebook-github-bot added the CLA Signed label Apr 13, 2024

jerryzh168 reviewed Apr 17, 2024
