[Text Generation][V2] NonKVCachePipeline #1417

dbogunowicz · 2023-11-17T15:45:50Z

Feature Description

Added the TextGenerationPipelineNoKVCache. This pipeline processes the prompt and returns the new token. That's it.
Its main purpose is to map prompt tokens to logits, which is what is needed to compute the perplexity of a model on a given dataset.
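
To illustrate why prompt logits are enough for perplexity, here is a minimal sketch. It is not the code from this PR. It assumes the usual causal-LM convention that the logits row at position i scores the token at position i + 1; the names `log_softmax` and `perplexity` are hypothetical helpers, not part of deepsparse.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    row = [float(x) for x in row]
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def perplexity(logits, token_ids):
    """Perplexity of one prompt from its per-position logits.

    logits: rows of vocabulary scores, one row per prompt position
            (nested lists or any row-iterable array).
    token_ids: the prompt's token ids.
    Assumption: row i predicts token i + 1, so only the first
    len(token_ids) - 1 rows are used; trailing rows are ignored.
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to score a prediction")
    count = len(token_ids) - 1
    nll = 0.0
    for position in range(count):
        log_probs = log_softmax(logits[position])
        nll -= log_probs[token_ids[position + 1]]
    return math.exp(nll / count)


# Uniform logits over a 4-token vocabulary: every token is equally likely.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
ppl = perplexity(uniform_logits, [1, 2, 3])
```

With uniform logits the perplexity equals the vocabulary size, which makes for a quick sanity check.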

Testing

Updated the integration tests to cover the case of non-kv-cache inference.

Example Use

from deepsparse.v2.text_generation import TextGenerationPipelineNoCache

prompt = ["Some funny prompt", "Why are you so"]

pipeline = TextGenerationPipelineNoCache(model_path="hf:mgoin/TinyStories-1M-ds",
                                         onnx_model_name="model-orig.onnx",
                                         sequence_length=20)

out = pipeline(prompt=prompt,
               include_prompt_logits=True,
               generation_kwargs=dict(output_scores=True))

for gen in out.generations:
    print(gen)

text='.' score=array([[ 2.9344807 , -0.03345669, -4.11256   , ..., -6.9316325 ,
        -4.6005425 ,  1.1827914 ],
       [ 7.008805  , -0.11603884, -7.1837015 , ..., -7.0405912 ,
        -2.386351  , -2.2007818 ],
       [ 6.348213  , -2.2960157 , -6.433192  , ..., -6.5930486 ,
        -5.8315077 , -0.58804405],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]], dtype=float32) finished=True finished_reason='length'
# Note: the logits are zero-padded at the end because all logits need to have the same shape (the length of the longest prompt in the input + 1).
text=' sad' score=array([[ 2.560934 ,  1.1993233, -6.670935 , ..., -7.3002615, -3.823823 ,
         1.8125833],
       [-1.1050931, -2.4256568, -7.3015127, ..., -6.1500154, -4.074909 ,
         1.8155754],
       [ 6.172593 , -2.2252593, -9.146653 , ..., -7.70834  , -4.810748 ,
         0.3985293],
       [ 1.4988875,  1.0973434, -4.4714937, ..., -4.8026247, -1.1791464,
         1.6924176]], dtype=float32) finished=True finished_reason='length'
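
The zero padding seen in the first generation can be sketched as follows. This is only an illustration of the idea under stated assumptions, not the pipeline's implementation: `pad_logits` is a hypothetical helper, and the common length is simply taken as the longest entry in the batch.

```python
def pad_logits(per_prompt_logits, vocab_size):
    """Right-pad each prompt's logits with all-zero rows to a common length.

    per_prompt_logits: one entry per prompt, each a list of logits rows.
    vocab_size: width of every row.
    Assumption: the target length is the longest entry in the batch.
    """
    target_len = max(len(rows) for rows in per_prompt_logits)
    padded = []
    for rows in per_prompt_logits:
        missing = target_len - len(rows)
        zero_rows = [[0.0] * vocab_size for _ in range(missing)]
        padded.append([list(row) for row in rows] + zero_rows)
    return padded


# Two prompts with one and two logits rows, over a toy 2-token vocabulary.
batch = [
    [[1.0, 2.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
padded = pad_logits(batch, vocab_size=2)
```

When consuming padded logits (for example for perplexity), it is safer to take the number of real rows from the prompt's token count than to detect all-zero rows.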

Next steps

Create a parent TextGenerationPipeline operator that chooses between the kv-cache and non-kv-cache versions of the pipeline, depending on the topology of the ONNX model
Move the overwriting of the transformer inputs to some high-level function
Use the V2 pipeline for perplexity calculation
Swap GraphRouter for LinearRouter in TextGenerationPipelineNoKVCache
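
For the first item, the choice based on model topology could look roughly like the sketch below. This is an assumption-laden illustration, not a design from this PR: it assumes kv-cache models expose inputs whose names start with `past_key_values` (a common export convention), and `uses_kv_cache` and `select_pipeline_variant` are hypothetical names. The input names would come from the ONNX graph; here they are passed in as plain strings.

```python
# Assumption: cache inputs of a kv-cache ONNX export share this name prefix.
KV_CACHE_INPUT_PREFIX = "past_key_values"


def uses_kv_cache(onnx_input_names):
    """Return True if any model input looks like a kv-cache input."""
    return any(name.startswith(KV_CACHE_INPUT_PREFIX) for name in onnx_input_names)


def select_pipeline_variant(onnx_input_names):
    """Pick a pipeline variant label based on the model's input names.

    The returned labels are placeholders for the two pipeline classes.
    """
    return "kv_cache" if uses_kv_cache(onnx_input_names) else "no_kv_cache"


with_cache = select_pipeline_variant(
    ["input_ids", "attention_mask", "past_key_values.0.key", "past_key_values.0.value"]
)
without_cache = select_pipeline_variant(["input_ids", "attention_mask"])
```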

src/deepsparse/v2/text_generation/pipeline.py

src/deepsparse/v2/text_generation/nl_engine_operator.py

src/deepsparse/utils/onnx.py

… router and image classification pipeline/operators/example (#1325) * initial functionality and working example with image classification * remove testing image * update args * initial functionality and working example with image classification * remove testing image * pr comments * defines schemas for operators and test * add image classification test, PR comments * fix input/output handling in pipeline and operator base classes to be more generic; remove context * add additional operator input message * typo fix

* [v2] EngineOperator updates to make continuous batching easier * test fixes

…ity (#1348) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings

…generation functionality (#1356) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * initial functionality and working example with image classification * updates func * prompt inference, initial functionality * finish generation operators and update routes * further breakdown operators * add operators * fix can_operate condition * update can_operate to not rely on the inference_state * rebase + update * fix condition * fix capacity settting again * typo fixes

dbogunowicz changed the base branch from main to v2 November 17, 2023 15:46

dbogunowicz changed the base branch from v2 to feature/damian/v2/factor_out_transformation_utils November 20, 2023 13:31

dbogunowicz changed the base branch from feature/damian/v2/factor_out_transformation_utils to v2 November 20, 2023 13:33

dbogunowicz force-pushed the feature/damian/no_kv_cache branch from a95d55a to fa96efb Compare November 20, 2023 13:58

dsikka requested changes Nov 21, 2023

src/deepsparse/v2/text_generation/pipeline.py

src/deepsparse/v2/text_generation/pipeline.py

src/deepsparse/v2/text_generation/nl_engine_operator.py

dbogunowicz changed the title ~~[Text Generation][V2] NonKVCachePipeline~~ [WiP][Text Generation][V2] NonKVCachePipeline Nov 27, 2023

dbogunowicz marked this pull request as ready for review November 27, 2023 14:11

dbogunowicz changed the title ~~[WiP][Text Generation][V2] NonKVCachePipeline~~ [Text Generation][V2] NonKVCachePipeline Nov 28, 2023

dbogunowicz requested review from dsikka, bfineran and rahul-tuli November 28, 2023 07:47

dbogunowicz commented Dec 6, 2023

src/deepsparse/utils/onnx.py

dbogunowicz force-pushed the feature/damian/no_kv_cache branch from 83d6cf1 to dcab3f9 Compare December 6, 2023 12:38

Base automatically changed from v2 to main December 6, 2023 15:37

bfineran and others added 6 commits December 18, 2023 16:08

Pipelines Refactor - Initial Impl (#1287)

aa18bac

[v2] EngineOperator updates to make continuous batching easier (#1371)

59fb587

initial commit

7f3eb12

dbogunowicz force-pushed the feature/damian/no_kv_cache branch from e0a9dee to 7f3eb12 Compare December 18, 2023 16:10

ready for reviews

0901a01

dbogunowicz closed this Dec 18, 2023
