
Refactor Quantization Modifier and Reloading #2246

Merged
bfineran merged 50 commits into main from sa/quant_mod_refactor on May 6, 2024

Conversation

@Satrat (Contributor) commented Apr 17, 2024

  • Adds a new vLLMQuantizationModifier that supports the new quantization framework in compressed-tensors (see the recipe sketch after this list)
  • Adds support for loading a model quantized in the compressed-tensors format
  • Adds testing scripts for comparing performance against the old quantization setup
  • Adds SparseGPT quantization support with the new modifier

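As a rough illustration, a one-shot recipe using the new modifier might look like the sketch below. The recipe field names (config_groups, num_bits, symmetric, strategy, ignore), the example model, and the calibration dataset are assumptions based on the compressed-tensors quantization config format and SparseML's one-shot entrypoint, not the exact schema introduced by this PR.

```python
# Hypothetical sketch only: field names and values are assumptions based on the
# compressed-tensors quantization config format, not the exact schema from this PR.
from sparseml.transformers import oneshot

recipe = """
quant_stage:
    quant_modifiers:
        vLLMQuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    targets: ["Linear"]
                    weights:
                        num_bits: 8
                        type: "int"
                        symmetric: true
                        strategy: "tensor"
                    input_activations:
                        num_bits: 8
                        type: "int"
                        symmetric: true
                        strategy: "tensor"
"""

# Example model and calibration dataset; swap in any HF causal LM and dataset.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="./tinyllama-int8-compressed",
    max_seq_length=512,
    num_calibration_samples=512,
)
```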
Testing

Added tests for:

  • comparing scales and zero points to the old framework
  • comparing perplexities to the old framework
  • verifying that a reloaded model matches the original

Perplexity matches the baseline within 2%, and is not consistently worse.
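A minimal sketch of the reload check follows, assuming the one-shot entrypoint accepts an already loaded model and that quantization parameters are serialized under names ending in _scale / _zero_point; both are assumptions, and the PR's actual test code may differ.

```python
# Hypothetical sketch of the "reloaded model matches original" check.
# Assumes: oneshot() accepts an in-memory model and writes the calibrated model
# to output_dir, and quantization params are stored as *_scale / *_zero_point.
import torch
from sparseml.transformers import SparseAutoModelForCausalLM, oneshot

model = SparseAutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype="auto"
)
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,  # e.g. the recipe string from the sketch above
    output_dir="./tinyllama-int8-compressed",
    num_calibration_samples=512,
)

# Reload from disk and compare quantization parameters to the in-memory model.
reloaded = SparseAutoModelForCausalLM.from_pretrained("./tinyllama-int8-compressed")
reloaded_state = reloaded.state_dict()
for name, param in model.state_dict().items():
    if name.endswith(("_scale", "_zero_point")):
        assert torch.equal(param.cpu(), reloaded_state[name].cpu()), f"mismatch: {name}"
```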

@Satrat Satrat changed the title [WIP] Refactor Quantization Modifier [WIP] Refactor OneShot Quantization Apr 19, 2024
@bfineran bfineran changed the title [WIP] Refactor OneShot Quantization Refactor OneShot Quantization Apr 22, 2024
@Satrat Satrat changed the base branch from main to feature/damian/sparsetensors April 22, 2024 20:44
@Satrat Satrat requested a review from dbogunowicz April 29, 2024 13:57
@Satrat Satrat changed the title Refactor OneShot Quantization Refactor Quantization Modifier and Reloading Apr 29, 2024
Base automatically changed from feature/damian/sparsetensors to main May 1, 2024 15:49
@Satrat Satrat requested a review from bfineran May 1, 2024 16:21
bfineran previously approved these changes May 1, 2024
horheynm previously approved these changes May 2, 2024
rahul-tuli previously approved these changes May 2, 2024
Review comment on src/sparseml/modifiers/quantization_vllm/pytorch.py (outdated, resolved)
@Satrat Satrat dismissed stale reviews from rahul-tuli and horheynm via 2432cf4 May 2, 2024 18:50
@bfineran bfineran merged commit f7cb678 into main May 6, 2024
13 of 17 checks passed
@bfineran bfineran deleted the sa/quant_mod_refactor branch May 6, 2024 20:02
5 participants