Support saving and loading 8-bit block weights #273

Open
mryab wants to merge 4 commits into main
Conversation

mryab (Member) commented Feb 25, 2023

This PR relies on TimDettmers/bitsandbytes#159 and makes it possible to call convert_model with the int8 data type and later download the 8-bit checkpoint instead of the 16-bit one when serving the model with load_in_8bit=True. This can save up to 2x bandwidth when starting a server, as shown by this comparison of model sizes for bloom-560m:

~/petals$ du -sh converted_model*
802M    converted_model
515M    converted_model_int8

The command used for conversion:

python -m petals.cli.convert_model --model bigscience/bloom-560m --output_path ./converted_model_int8 --torch_dtype int8 --resize_token_embeddings 50000 --block_branch_prefix int8_block

To test that the checkpoint loads correctly, install bitsandbytes from the branch in the PR above and run:

python -m petals.cli.run_server bigscience/test-bloomd --new_swarm --skip_reachability_check --throughput 100 --device cuda

(Note that I had to change BLOCK_BRANCH_PREFIX in this branch for the sake of testing.)
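For context, the conversion boils down to swapping each nn.Linear inside a block for a bitsandbytes Linear8bitLt layer, letting the move to GPU perform the actual quantization, and saving the resulting int8 state dict. Below is a minimal sketch of that pattern. It assumes a CUDA device and a bitsandbytes build with the serialization support from TimDettmers/bitsandbytes#159; the helper name quantize_linears, the toy block, and the block.pth path are illustrative, not the actual convert_model code:

import torch
import torch.nn as nn
import bitsandbytes as bnb


def quantize_linears(module: nn.Module, threshold: float = 6.0) -> nn.Module:
    """Recursively replace nn.Linear submodules with 8-bit Linear8bitLt layers."""
    for name, child in module.named_children():
        if len(list(child.children())) > 0:
            quantize_linears(child, threshold)
        if isinstance(child, nn.Linear):
            int8_linear = bnb.nn.Linear8bitLt(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                has_fp16_weights=False,  # keep only the int8 weights after quantization
                threshold=threshold,
            )
            int8_linear.weight = bnb.nn.Int8Params(
                child.weight.data, requires_grad=False, has_fp16_weights=False
            )
            if child.bias is not None:
                int8_linear.bias = child.bias
            module._modules[name] = int8_linear
    return module


# Toy stand-in for a transformer block loaded in fp16.
block = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).half()

block = quantize_linears(block)   # swap nn.Linear -> Linear8bitLt (data still fp16, on CPU)
block = block.cuda()              # moving to the GPU performs the int8 quantization
torch.save(block.state_dict(), "block.pth")  # int8 weights + scales, roughly half the fp16 size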


 logger = get_logger(__name__)

 CLIENT_BRANCH = "main"
-BLOCK_BRANCH_PREFIX = "block_"
+BLOCK_BRANCH_PREFIX = "int8_block"
mryab (Member, Author):

We'll roll that back before merging

Comment on lines +51 to +57

+if load_in_8bit:
+    block = replace_8bit_linear(block)
+block = block.to(device)
mryab (Member, Author):

I moved replace_8bit_linear here because it's not possible to correctly load the quantized Linear8bitLt checkpoint into the model before it's converted and quantized
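Put differently, the int8 checkpoint can only be loaded after the block's nn.Linear layers have been swapped out and quantized, because the quantized weights and their scaling factors need matching parameters to land in. A rough sketch of the resulting order, reusing the illustrative quantize_linears helper and block.pth checkpoint from the sketch in the description above (the real code uses petals' replace_8bit_linear and the downloaded block state dict):

import torch
import torch.nn as nn

load_in_8bit = True
device = "cuda"

# Fresh block skeleton with regular nn.Linear layers, fp16 on CPU.
block = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).half()

if load_in_8bit:
    block = quantize_linears(block)  # 1) convert nn.Linear -> Linear8bitLt
block = block.to(device)             # 2) quantize by moving to the GPU
block.load_state_dict(torch.load("block.pth"))  # 3) only now do the int8 weights and scales fit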

Comment on lines 80 to 81
-from petals.utils.linear8bitlt_patch import CustomLinear8bitLt

 for n, module in model.named_children():
     if len(list(module.children())) > 0:
         replace_8bit_linear(module, threshold)

     if isinstance(module, torch.nn.Linear) and n not in ["lm_head", "score"]:
         assert module.weight.device.type == "cpu", f"expected linear layers on CPU, got {module.weight.device}"
-        model._modules[n] = CustomLinear8bitLt(
+        model._modules[n] = bnb.nn.Linear8bitLt(
mryab (Member, Author):

Not strictly necessary, but it'd be good to get rid of all bitsandbytes-related code that got into upstream before merging this

Collaborator:

Done in #297.

justheuristic (Collaborator) left a comment:

Gentle reminder: please update BNB before merging. This is not covered by tests
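If tests do get added for this, a round-trip check along these lines could cover it. This is only a sketch: it assumes a CUDA device and a bitsandbytes build with the serialization support from TimDettmers/bitsandbytes#159, the layer sizes and tolerance are arbitrary, and the load order mirrors this PR (convert and quantize before load_state_dict):

import torch
import bitsandbytes as bnb


def test_linear8bitlt_roundtrip(tmp_path):
    # Quantize a single bias-free layer on the GPU and save its int8 state dict.
    layer = bnb.nn.Linear8bitLt(64, 64, bias=False, has_fp16_weights=False, threshold=6.0).cuda()
    x = torch.randn(4, 64, dtype=torch.float16, device="cuda")
    ref = layer(x)
    torch.save(layer.state_dict(), tmp_path / "layer.pth")

    # Re-create and quantize the layer, then load the saved int8 weights on top.
    restored = bnb.nn.Linear8bitLt(64, 64, bias=False, has_fp16_weights=False, threshold=6.0).cuda()
    restored.load_state_dict(torch.load(tmp_path / "layer.pth"))

    # The restored layer should reproduce the original outputs.
    assert torch.allclose(ref, restored(x), atol=1e-2)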

justheuristic mentioned this pull request Feb 28, 2023
@@ -38,6 +39,8 @@ def load_pretrained_block(
     use_auth_token: Optional[str] = None,
     cache_dir: Optional[str] = None,
     max_disk_space: Optional[int] = None,
+    load_in_8bit=False,
Collaborator:

Suggested change
-    load_in_8bit=False,
+    load_in_8bit: bool = False,

borzunov (Collaborator) left a comment:

Please defer this until #323 is merged, since it changes block loading code.

borzunov (Collaborator) commented Aug 3, 2023

We discussed that we may revive this feature for loading NF4-pre-quantized weights for Llama 2 and Stable Beluga 2.
