[REQUEST] Add documentation on how to run fast inference of transformers models with ZeRO-3 #5498

Open
lewtun opened this issue May 3, 2024 · 0 comments
lewtun commented May 3, 2024

Is your feature request related to a problem? Please describe.

Hello DeepSpeed team, while looking into how to accelerate text generation in TRL with ZeRO-3, we learned from @pacman100 that the most efficient method is to remove/add the ZeRO-3 hooks within a context manager, as follows:

from contextlib import contextmanager
from typing import Union

import deepspeed


@contextmanager
def unwrap_model_for_generation(
    model: Union["DistributedDataParallel", "DeepSpeedEngine"], accelerator: "Accelerator", is_peft_model: bool = False
) -> Union["PreTrainedModelWrapper", "DeepSpeedEngine"]:
    """Context manager to unwrap a model for generation.

    For ZeRO-3 models, we gather the weights once to speed up generation.
    """
    unwrapped_model = accelerator.unwrap_model(model)
    if is_peft_model:
        unwrapped_model.pretrained_model.disable_adapter()
    if accelerator.state.deepspeed_plugin is not None and accelerator.state.deepspeed_plugin.zero_stage == 3:
        with deepspeed.zero.GatheredParameters(model.parameters()):
            remove_hooks(model)  # TRL helper: detaches the ZeRO-3 forward hooks
            yield model
            add_hooks(model)  # TRL helper: re-attaches the hooks after generation
    else:
        yield unwrapped_model

This works well for inference, but during DPO training we hit a rather cryptic error, and only when the number of gradient accumulation steps is greater than 1:

AssertionError: {'id': 0, 'status': 'AVAILABLE', 'numel': 25755648, 'ds_numel': 25755648, 'shape': (50304, 512), 'ds_shape': (50304, 512), 'requires_grad': True, 'grad_shape': None, 'persist': False, 'active_sub_modules': {182}, 'ds_tensor.shape': torch.Size([3219456])}

The solution @pacman100 found is that one needs to carefully clear all active parameters during hook removal, which led to this fix in TRL: huggingface/trl#1617
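For context, the fix amounts to clearing each parameter's set of active sub-modules before the ZeRO-3 forward hooks are detached, so that no parameter is still marked as in-flight (the `active_sub_modules: {182}` entry in the assertion above) when the gather context exits. The sketch below illustrates only the pattern, using plain-Python stand-ins rather than real DeepSpeed objects; `FakeZeroParam` and `remove_hooks_safely` are hypothetical names for illustration:

```python
class FakeZeroParam:
    """Hypothetical stand-in for a ZeRO-3 partitioned parameter."""

    def __init__(self):
        # Module ids that still "hold" this parameter, mirroring the
        # 'active_sub_modules': {182} field in the assertion error.
        self.ds_active_sub_modules = {182}


def remove_hooks_safely(params, hooks):
    # First clear the active sub-module bookkeeping on every parameter ...
    for p in params:
        p.ds_active_sub_modules.clear()
    # ... only then is it safe to detach the forward hooks.
    hooks.clear()


params = [FakeZeroParam(), FakeZeroParam()]
hooks = ["pre_forward_hook", "post_forward_hook"]
remove_hooks_safely(params, hooks)
assert all(not p.ds_active_sub_modules for p in params)
assert hooks == []
```

Skipping the clearing step leaves parameters flagged as active, which is what triggers the partitioning assertion on the next backward pass with gradient accumulation.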

Getting to the bottom of this issue was quite tricky, and the DeepSpeed documentation unfortunately does not contain any guidance on how to do this. I'm sharing the issue here for broader visibility, in case others are trying to speed up ZeRO-3 generation during training.

Describe the solution you'd like
An example in the documentation which shows how to run fast text generation with ZeRO-3 within a training loop. This is very useful for online methods like PPO.
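Such a documentation example could be as small as the skeleton below. Everything here is a placeholder (the model, the context manager body, and the `generate` call stand in for the real `DeepSpeedEngine`/`Accelerator` objects from a trainer); it only shows where the unwrapping context sits inside an online training loop:

```python
from contextlib import contextmanager


class DummyModel:
    """Placeholder for the real (wrapped) model in a PPO/DPO trainer."""

    def generate(self, prompt):
        return prompt + " <generated>"


@contextmanager
def unwrap_model_for_generation(model):
    # In the real helper this gathers the ZeRO-3 shards and removes the
    # forward hooks before yielding, then restores them afterwards.
    yield model


model = DummyModel()
outputs = []
for prompt in ["query 1", "query 2"]:
    # Generation happens inside the context, with full (gathered) weights.
    with unwrap_model_for_generation(model) as unwrapped:
        outputs.append(unwrapped.generate(prompt))
    # The optimization step on the generated samples would follow here,
    # back on the sharded ZeRO-3 model.
```

The key point for the docs is that generation runs inside the context while the backward/optimizer step runs outside it, on the re-partitioned model.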

Describe alternatives you've considered
N/A
Additional context
Add any other context or screenshots about the feature request here.

@lewtun lewtun added the enhancement New feature or request label May 3, 2024
@jomayeri jomayeri self-assigned this May 8, 2024