
[P2] Multi-GPU model sharding with intervening evaluation and training #54

Open
frankaging opened this issue Jan 15, 2024 · 0 comments

@frankaging
Collaborator

Description:

The library has not been tested with multi-GPU use cases: we currently assume the intervening model fits on a single GPU. That is not workable for interventions on 70B-parameter models, for instance, so we want to be able to shard the model across multiple GPUs.
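For context, sharded loading at inference time can be done with Hugging Face transformers plus accelerate's `device_map="auto"`. The sketch below is only illustrative; the checkpoint name and dtype are placeholders, not something the library is claimed to support today:

```python
# Sketch: shard a large causal LM across all visible GPUs via accelerate's
# device_map="auto". Checkpoint name and dtype are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # hypothetical example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # accelerate spreads layers across available GPUs
)

# With sharding, different decoder layers can live on different devices,
# e.g. {"model.layers.0": 0, ..., "model.layers.79": 3}.
print(model.hf_device_map)
```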

Static interventions need to be attached to the right component on the right device when the model is sharded. Trainable interventions likewise need to be mapped onto the device where the corresponding model component lives.

This could be a large task. The first step is clear: try out static interventions (e.g., vanilla interventions) at inference time when the model is sharded across multiple GPUs; a rough sketch of that step is below.
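As an illustration of that first step (this is not pyvene's actual API, just a generic PyTorch forward-hook sketch), the main thing a static intervention has to get right under sharding is moving its source activation onto whatever device the hooked component's output actually lives on:

```python
# Sketch (not pyvene's API): a "vanilla" static intervention as a plain
# PyTorch forward hook, written to be safe under model sharding.
import torch

def make_static_intervention(source_activation: torch.Tensor, positions):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Move the intervention to the shard that produced this activation.
        src = source_activation.to(device=hidden.device, dtype=hidden.dtype)
        hidden[:, positions] = src
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

# Hypothetical usage: intervene on decoder layer 40, wherever accelerate
# happened to place it.
# layer = model.model.layers[40]
# handle = layer.register_forward_hook(make_static_intervention(src_act, [0]))
# ... run a forward pass or model.generate(...) ...
# handle.remove()
```

Trainable interventions would additionally need their parameters (and optimizer state) placed on the same device as the component they attach to.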

@frankaging frankaging changed the title [P2] Multi-GPU intervening evaluation and training [P2] Multi-GPU model sharing with intervening evaluation and training Jan 17, 2024
@frankaging frankaging changed the title [P2] Multi-GPU model sharing with intervening evaluation and training [P2] Multi-GPU model sharding with intervening evaluation and training Jan 17, 2024