Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance OTEL testing to capture and verify Cancellation Requests and Non-Decoupled model inference. #7132

Merged
merged 18 commits into from May 24, 2024

Conversation

indrajit96
Copy link
Contributor

@indrajit96 indrajit96 commented Apr 18, 2024

Added tests for

  1. For request cancellations, make sure spans are ended properly in various scenarios
  2. Verify tracing behavior, after this PR: Allow non-decoupled model to send response and FINAL flag separately #6017

Tests added

  1. test_non_decoupled() : Verify tracing for a non-decoupled inference
  2. test_grpc_trace_all_input_required_model_cancel() : Verify trace after an inference request is cancelled in COMPUTE Phase
  3. test_grpc_trace_model_cancel_in_queue() : Verify trace after an inference request is cancelled in QUEUE before COMPUTE Phase

@indrajit96 indrajit96 changed the title Tests for Enhance OTEL testing DLIS-6308 Apr 18, 2024
@oandreeva-nv
Copy link
Contributor

Could you please add a description, clarifying what tests were added?

@oandreeva-nv
Copy link
Contributor

Could you please attach a picture of a trace as displayed in jaeger with cancelled request.

Another question, have you considered cases when request was cancelled, when it was in a queue and when it was already in a compute stage?

@indrajit96
Copy link
Contributor Author

Could you please attach a picture of a trace as displayed in jaeger with cancelled request.

Another question, have you considered cases when request was cancelled, when it was in a queue and when it was already in a compute stage?

Could you please attach a picture of a trace as displayed in jaeger with cancelled request.

Another question, have you considered cases when request was cancelled, when it was in a queue and when it was already in a compute stage?

Screenshot 2024-05-02 at 10 51 00 AM

Fixed this, added a model which waits for a delay before executing hence cancellation request is recieved before the execution starts

@rmccorm4
Copy link
Collaborator

rmccorm4 commented May 8, 2024

Can you update the PR title to be more descriptive? (cancellation, decoupled responses, etc. rather than JIRA ticket number)

@@ -339,6 +389,22 @@ def _verify_headers_propagated_from_client_if_any(self, root_span, headers):
),
)

def _test_trace_cancel(self, is_queued):
# TO accommodate for delay in the model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify what delay in the model? and if it is a model related, why are we using COLLECTOR_TIMEOUT variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to capture a cancellation request traces WHILE the inference is in the COMPUTE stage.
Because the model "input_all_required" has a delay/wait in the compute phase so the cancellation request can be send while the request is waiting in the compute phase.
The idea here is to wait before we try and read the traces from the file.

If we do not wait, the file in empty without traces.
Updated the comments in the test to reflect his better.

@oandreeva-nv
Copy link
Contributor

I think it's in a good state. Let's address Ryan's comment and provide a clear description for 3 tests, added in this PR, to the PR description

@indrajit96 indrajit96 changed the title Enhance OTEL testing DLIS-6308 Enhance OTEL testing to capture and verify Cancellation Requests and Non-Decoupled model inference. May 21, 2024
Copy link
Contributor

@oandreeva-nv oandreeva-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for this hard work!

@indrajit96 indrajit96 merged commit 76e53c9 into main May 24, 2024
3 checks passed
@indrajit96 indrajit96 deleted the indrajit_otel_testing branch May 24, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants