Tests for Top Level Request Caching for Ensemble Models #7074

Open
lkomali wants to merge 24 commits into base: main

Conversation

@lkomali commented Apr 5, 2024:

Related PR: triton-inference-server/core#338

Added 4 new tests in L0_response_cache to test top-level request caching for ensemble models:
Test 1: Cache and decoupled mode both enabled in the ensemble model config: expect an error.
Test 2: Cache enabled in the ensemble model config and decoupled mode enabled in a composing model: expect an error.
Test 3: Cache enabled only in the ensemble model: fail if the cache hit count is zero or missing. Expected: a non-zero cache hit count.
Test 4: Cache enabled in all models: fail if the cache hit count is zero or greater than the inference count. Expected: a non-zero cache hit count that is less than the successful inference count (see the sketch below).
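As a rough illustration of what the checks in Tests 3 and 4 boil down to, here is a minimal sketch (the helper name and its argument handling are illustrative, not the actual test code; the cache_hit and success fields follow Triton's inference statistics JSON):

def check_cache_hits(inference_stats, require_below_success_count=False):
    # Tests 3 and 4: the cache hit count must exist and be non-zero.
    # Counts in the statistics JSON may be serialized as strings, hence int().
    cache_hits = int(inference_stats.get("cache_hit", {}).get("count", 0))
    assert cache_hits > 0, "expected a non-zero cache hit count"
    if require_below_success_count:
        # Test 4 additionally expects fewer cache hits than successful inferences.
        successes = int(inference_stats["success"]["count"])
        assert cache_hits < successes, "cache hit count should be below the successful inference count"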

No modifications to the L0_perf_analyzer_report test. Removed wait $SERVER_PID since it was not returning/exiting with 0 because the Triton server takes a long time to shut down.

@lkomali changed the title from "Lkomali dlis 4626 tests" to "Tests for Top Level Request Caching for Ensemble Models" on Apr 5, 2024
MODEL_LOAD_TYPE="${3}"
EXTRA_ARGS="--model-control-mode=${MODEL_CONTROL_MODE} --load-model=${MODEL_LOAD_TYPE}"
SERVER_ARGS="--model-repository=${MODEL_REPOSITORY} --cache-config local,size=1048576000 ${EXTRA_ARGS}"
source ../common/util.sh
@rmccorm4 (Collaborator) commented Apr 15, 2024:

This should be moved higher up and outside of the function - shouldn't need to keep re-sourcing the file in each function call.

Collaborator:

Actually looks like it's already included higher up, so this can just be removed.

@lkomali (Author):

I removed the SERVER_ARGS defined outside the function since it is not used. This one should stay inside the function because ${MODEL_REPOSITORY} changes between the decoupled_cache test case and the ensemble_cache test case, and the other test cases use different SERVER_ARGS.

@lkomali requested a review from rmccorm4 on April 25, 2024 08:32
@rmccorm4 requested reviews from Tabrizian and oandreeva-nv and removed the request for dyastremsky on April 29, 2024 18:06
@lkomali (Author) commented Apr 30, 2024:

Working on refactoring the tests according to an offline discussion with @oandreeva-nv.

@lkomali requested a review from oandreeva-nv on May 1, 2024 21:13

def _run_inference_and_validate(self, model):
    # Utility function to call run_ensemble for inference and validate expected baseline output and stats
    self.triton_client.load_model(self.ensemble_model)
Contributor:

Should the passed argument be model and not self.ensemble_model?

@lkomali (Author) commented May 2, 2024:

The parameter for _run_inference_and_validate should be model because it can be either the ensemble model or a composing model; the passed model's stats are validated according to the test case. In the third test case, which has the response cache enabled only in the composing model, the ensemble model's stats have empty fields for the cache-related metrics. That is why I pass the model parameter separately, to define which model's stats should be verified.
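Roughly, the flow being described looks like this (a hypothetical outline rather than the actual test code; _run_ensemble, _get_inference_stats, and _validate_stats are placeholder helper names):

def _run_inference_and_validate(self, model):
    # Inference always goes through the ensemble model...
    self.triton_client.load_model(self.ensemble_model)
    self._run_ensemble()
    # ...but statistics are validated for whichever model the test case targets:
    # the composing model in the third test case, the ensemble model otherwise.
    stats = self._get_inference_stats(model)
    self._validate_stats(stats)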

Contributor:

My question was mostly about the load_model argument. It was slightly confusing and not clear from the start why we do that, which is why I asked to clarify the test plan.

@oandreeva-nv (Contributor) left a comment:

I think this PR is in a very good state. Nice work on refactoring since the original commit! Just a couple of small additions and we'll be able to merge it.

model_stats = self.triton_client.get_inference_statistics(
    model_name=model, as_json=True
)
return model_stats["model_stats"][1]["inference_stats"]
Collaborator:

Add a comment on why [1] here -> which model? If it's for a specific model, I would add a small assert that model_stats["model_stats"][1]["name"] equals the expected model you're checking for.

@lkomali (Author) commented May 6, 2024:

The model has two versions, version 1 and version 3. Version 1 stats are at index 0 and version 3 stats are at index 1. Version 3 is the one loaded, so its stats are accessed with index 1. Added this comment in the test file too.
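Along the lines of the earlier suggestion, a small guard could look like this (a sketch with an illustrative function name, assuming the statistics entry reports the version as the string "3"):

def _get_inference_stats(self, model):
    model_stats = self.triton_client.get_inference_statistics(
        model_name=model, as_json=True
    )
    # Version 1 stats are at index 0 and version 3 stats at index 1; assert the
    # entry really belongs to the loaded version 3 before using it.
    entry = model_stats["model_stats"][1]
    assert entry["name"] == model and entry["version"] == "3"
    return entry["inference_stats"]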

self._update_config(
    self.composing_config_file, RESPONSE_CACHE_PATTERN, RESPONSE_CACHE_CONFIG
)
self._run_inference_and_validate(self.composing_model)
Collaborator:

Should we be running inference on the ensemble model here, validating that the ensemble did inference but has no cache stats, and that the composing model does have correct cache stats? It looks like we're doing inference on the composing model directly here, so we're not actually testing the ensemble flow.

@lkomali (Author):

We are running inference on the ensemble model only. The model parameter is only used to decide whose baseline stats are verified in _run_inference_and_validate. For this test case the ensemble model's stats are going to be empty, so I pass model as a parameter to verify the corresponding model's stats correctly; here that is the composing model.

TEST_RESULT_FILE='test_results.txt'
SERVER_LOG=./inference_server.log
RESET_CONFIG_FUNCTION="_reset_config_files"
CACHE_SIZE=10840
Collaborator:

Add a comment on the size choice - why this number? What's the requirement/goal?

@lkomali (Author):

I used a fairly arbitrary number, something that worked for me. For the test_cache_insertion test case I used CACHE_SIZE=200 because the data to be inserted is more than 200; I found that value through calculation and testing.

@rmccorm4 (Collaborator) left a comment:

Looking really good! Only minor comments

Helper function that takes model as a parameter to verify the corresponding model's stats.
The passed model is the composing model for test case `test_ensemble_composing_model_cache_enabled`;
for other test cases, the top-level ensemble model stats are verified.
* loads the simple_graphdef_float32_float32_float32 and graphdef_float32_float32_float32
Contributor:

nit: clarify which of these models is an ensemble and which is not

@rmccorm4 (Collaborator) left a comment:

Great work Harshini! 🚀

Can you post a link on Slack to a CI pipeline running with all your latest changes, to make sure everything looks good?
