
Refactor Quantization Modifier and Reloading #2246

Merged
bfineran merged 50 commits into main from sa/quant_mod_refactor on May 6, 2024

Conversation

@Satrat (Contributor) commented Apr 17, 2024

  • Adds a new vLLMQuantizationModifier that supports the new quantization framework in compressed-tensors (see the recipe sketch after this list)
  • Adds support for loading a model quantized in the compressed-tensors format
  • Adds testing scripts for comparing performance against the old quantization setup
  • Adds SparseGPT quantization support with the new modifier

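As a rough illustration, a one-shot recipe using the new modifier might look like the sketch below. The recipe field names (config_groups, num_bits, symmetric, strategy, ignore), the example model, and the calibration dataset are assumptions based on the compressed-tensors quantization config format and SparseML's one-shot entrypoint, not the exact schema introduced by this PR.

```python
# Hypothetical sketch only: field names and values are assumptions based on the
# compressed-tensors quantization config format, not the exact schema from this PR.
from sparseml.transformers import oneshot

recipe = """
quant_stage:
    quant_modifiers:
        vLLMQuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    targets: ["Linear"]
                    weights:
                        num_bits: 8
                        type: "int"
                        symmetric: true
                        strategy: "tensor"
                    input_activations:
                        num_bits: 8
                        type: "int"
                        symmetric: true
                        strategy: "tensor"
"""

# Example model and calibration dataset; swap in any HF causal LM and dataset.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="./tinyllama-int8-compressed",
    max_seq_length=512,
    num_calibration_samples=512,
)
```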
Testing

Added tests for:

  • comparing scales and zero points to the old framework
  • comparing perplexities to the old framework
  • verifying that a reloaded model matches the original

Perplexity matches the baseline within 2%, and is not consistently worse.
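A minimal sketch of the reload check follows, assuming the one-shot entrypoint accepts an already loaded model and that quantization parameters are serialized under names ending in _scale / _zero_point; both are assumptions, and the PR's actual test code may differ.

```python
# Hypothetical sketch of the "reloaded model matches original" check.
# Assumes: oneshot() accepts an in-memory model and writes the calibrated model
# to output_dir, and quantization params are stored as *_scale / *_zero_point.
import torch
from sparseml.transformers import SparseAutoModelForCausalLM, oneshot

model = SparseAutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype="auto"
)
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,  # e.g. the recipe string from the sketch above
    output_dir="./tinyllama-int8-compressed",
    num_calibration_samples=512,
)

# Reload from disk and compare quantization parameters to the in-memory model.
reloaded = SparseAutoModelForCausalLM.from_pretrained("./tinyllama-int8-compressed")
reloaded_state = reloaded.state_dict()
for name, param in model.state_dict().items():
    if name.endswith(("_scale", "_zero_point")):
        assert torch.equal(param.cpu(), reloaded_state[name].cpu()), f"mismatch: {name}"
```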

@Satrat Satrat changed the title [WIP] Refactor Quantization Modifier [WIP] Refactor OneShot Quantization Apr 19, 2024
@bfineran bfineran changed the title [WIP] Refactor OneShot Quantization Refactor OneShot Quantization Apr 22, 2024
@Satrat Satrat changed the base branch from main to feature/damian/sparsetensors April 22, 2024 20:44
@Satrat Satrat requested a review from dbogunowicz April 29, 2024 13:57
@Satrat Satrat changed the title Refactor OneShot Quantization Refactor Quantization Modifier and Reloading Apr 29, 2024
Base automatically changed from feature/damian/sparsetensors to main May 1, 2024 15:49
@Satrat Satrat requested a review from bfineran May 1, 2024 16:21
bfineran previously approved these changes May 1, 2024
horheynm previously approved these changes May 2, 2024
rahul-tuli previously approved these changes May 2, 2024
Review comment on src/sparseml/modifiers/quantization_vllm/pytorch.py (outdated, resolved)
@Satrat Satrat dismissed stale reviews from rahul-tuli and horheynm via 2432cf4 May 2, 2024 18:50
@bfineran bfineran merged commit f7cb678 into main May 6, 2024
13 of 17 checks passed
@bfineran bfineran deleted the sa/quant_mod_refactor branch May 6, 2024 20:02
5 participants