[Bug] User side cancellation #179

Open
sunggg opened this issue Jan 30, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@sunggg
Member

sunggg commented Jan 30, 2024

User-side cancellation does not take effect. We also need to log properly when a request has been cancelled.

@sunggg sunggg added the bug Something isn't working label Jan 30, 2024
Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this issue Jan 30, 2024
This PR changes the GPT-NeoX KV cache creation function to create to
full size at the beginning, so no memory allocation will be required
when running on the fly.
@yelite

yelite commented Jan 31, 2024

How is the user-side cancellation triggered? When I pressed Ctrl-C on a running curl command, I could see the cancellation being processed.

script:

payload='{
  "model": "llama-2",
  "messages": [
      {
        "role": "user",
        "content": "Hello! what is the answer to life, the universe, and everything? give me a long answer"
      }
    ],
  "max_tokens": 1000,
  "stream": true,
  "temperature": 1.0,
  "top_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0
}'

echo "======="
echo "Request"
echo "======="
echo "$payload" | jq

echo "========"
echo "Response"
echo "========"

curl -s -X 'POST' \
  'http://127.0.0.1:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer abc" \
  -d "$payload"

log:

2024-01-31 20:58:40 [info     ] StagingInferenceEngine.add     [mlc_serve.engine.staging_engine] func_name=add lineno=106 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/staging_engine.py process=2803754 requests=[Request(request_id='cmpl-71e9e27ce9f842108e3e820b1b6d63c8', messages=[ChatMessage(role='user', content='Hello! what is the answer to life, the universe, and everything? give me a long answer')], num_sequences=1, best_of=1, sampling_params=SamplingParams(presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, logit_bias=None, appeared_tokens_freq={}, logit_bias_index=None, logit_bias_value=None, logprobs=False, top_logprobs=0), stopping_criteria=StoppingCriteria(max_tokens=1000, stop_sequences=[]), debug_options=DebugOptions(ignore_eos=False, prompt=None, prompt_token_ids=None), validate_tokens=None, contextvars={})]
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate iterator cancelled. [mlc_serve.engine.async_connector] func_name=generate lineno=90 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] StagingInferenceEngine.cancel  [mlc_serve.engine.staging_engine] func_name=cancel lineno=133 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/staging_engine.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate request sucessfully cancelled. [mlc_serve.engine.async_connector] func_name=generate lineno=93 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate removing request from result queue. [mlc_serve.engine.async_connector] func_name=generate lineno=98 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
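
The sequence in this log (iterator cancelled, engine cancel, request removed from the result queue) can be sketched with a toy asyncio model. Everything here is hypothetical: `ToyEngine`, `generate`, and the event strings are illustrative stand-ins chosen to mirror the log messages, not the actual mlc_serve classes or their signatures. The sketch only shows how cancelling the consumer of an async generator can propagate into an engine-side cancel, after which no new tokens are produced.

```python
import asyncio


class ToyEngine:
    """Hypothetical stand-in for an inference engine (not mlc_serve's API)."""

    def __init__(self):
        self.result_queues = {}
        self.cancelled = set()
        self.events = []
        self.tokens_generated = 0

    def add(self, request_id):
        self.result_queues[request_id] = asyncio.Queue()
        self.events.append("add")

    def cancel(self, request_id):
        self.cancelled.add(request_id)
        self.events.append("cancel")

    async def run(self, request_id):
        # Produce one fake token per step until the request is cancelled.
        while request_id not in self.cancelled:
            self.tokens_generated += 1
            self.result_queues[request_id].put_nowait(self.tokens_generated)
            await asyncio.sleep(0.01)


async def generate(engine, request_id):
    """Stream tokens; forward a consumer-side cancellation to the engine."""
    engine.add(request_id)
    queue = engine.result_queues[request_id]
    try:
        while True:
            yield await queue.get()
    except asyncio.CancelledError:
        engine.events.append("iterator cancelled")
        engine.cancel(request_id)
        raise
    finally:
        engine.result_queues.pop(request_id, None)
        engine.events.append("removed from result queue")


async def demo():
    engine = ToyEngine()
    received = []

    async def client():
        async for token in generate(engine, "req-1"):
            received.append(token)

    client_task = asyncio.create_task(client())
    await asyncio.sleep(0)  # let the client register the request first
    engine_task = asyncio.create_task(engine.run("req-1"))
    await asyncio.sleep(0.05)

    client_task.cancel()  # plays the role of Ctrl-C on the curl command
    try:
        await client_task
    except asyncio.CancelledError:
        pass

    count_at_cancel = engine.tokens_generated
    await asyncio.sleep(0.05)  # give the engine a chance to misbehave
    await engine_task
    return engine.events, count_at_cancel, engine.tokens_generated


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy model the token count right after cancellation equals the final count, which is the behaviour a correct cancellation should show on the engine side.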

@sunggg
Member Author

sunggg commented Jan 31, 2024

Hmm, interesting. That is pretty much what I did. I was printing all the token_ids and saw that it kept printing new tokens even after cancellation. Is it possible that the request is cancelled correctly but somehow keeps printing from the buffer?

@yelite

yelite commented Feb 1, 2024

Is it possible that the request is cancelled correctly but somehow keeps printing from the buffer?

No, it's not. If it's cancelled correctly, it shouldn't be able to print new tokens.

Can you show me your steps to trigger the problem? Then I can try to reproduce it on my side.
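
One way to narrow this down while reproducing is to record, at the point where the model emits token_ids, whether the request had already been marked as cancelled. A minimal sketch follows, assuming you can call a hook from both the cancel path and the token-producing path. `TokenTrace`, `mark_cancelled`, and `record` are hypothetical debugging names, not part of mlc_serve.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenTrace:
    """Hypothetical debugging helper; not part of mlc_serve."""

    cancelled_at: dict = field(default_factory=dict)
    late_tokens: dict = field(default_factory=dict)

    def mark_cancelled(self, request_id, now=None):
        # Call this from the cancel path.
        self.cancelled_at[request_id] = time.monotonic() if now is None else now

    def record(self, request_id, token_ids, now=None):
        # Call this where tokens are generated, not where they are printed.
        # Returns True if these tokens were produced after cancellation.
        now = time.monotonic() if now is None else now
        cancel_time = self.cancelled_at.get(request_id)
        if cancel_time is not None and now >= cancel_time:
            self.late_tokens.setdefault(request_id, []).extend(token_ids)
            return True
        return False
```

If `late_tokens` stays empty for the cancelled request while output still appears, the extra output is coming from somewhere after generation; if it fills up, the engine kept generating after the cancel.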
