Deploying Models from Azure Blob #43

Open
ockaro opened this issue Feb 6, 2023 · 5 comments

ockaro commented Feb 6, 2023

Models that are stored on the ClearML servers (created with Task.init(..., output_uri=True)) run perfectly, while models stored on Azure Blob Storage produce different problems in different scenarios:

  1. Start the Docker container, add a model from the ClearML server, and afterwards add a model located on Azure (on the same endpoint) -> no error, HTTP requests are answered properly (but probably the model that was added first is being used).
  2. Start the Docker container with no model added and first add a model from Azure -> error: test_model_pytorch': failed to open text file for read /models/test_model_pytorch/config.pbtxt: No such file or directory.
  3. Start the Docker container where a model from Azure was already added before -> error:
clearml-serving-triton        | Error retrieving model ID ca186e8440b84049971a0b623df36783 []
clearml-serving-triton        | Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
clearml-serving-triton        | Traceback (most recent call last):
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 540, in <module>
clearml-serving-triton        |     main()
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 532, in main
clearml-serving-triton        |     helper.maintenance_daemon(
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 274, in maintenance_daemon
clearml-serving-triton        |     raise ValueError("triton-server process ended with error code {}".format(error_code))
clearml-serving-triton        | ValueError: triton-server process ended with error code 1

Side note: the same problem occurs when hosting the containers on Windows and on Linux. All Azure credentials are successfully set up as environment variables in the 'clearml-serving-inference', 'clearml-serving-triton' and 'clearml-serving-statistics' containers.

@thepycoder (Contributor)

Hi There!

Thanks again for the detailed write-up. Would you mind testing whether the following fix works? It seems the clearml.conf file is not mounted inside the necessary containers. Make sure your Azure credentials are added to this config file :)

So you'd add:

    volumes:
      - $HOME/clearml.conf:/root/clearml.conf

to the clearml-serving-inference service in the docker-compose file, and likewise to the clearml-serving-triton service, which from the logs above is the one pulling the model files (see the sketch below).
If you can confirm this is working, we can make a PR and get this issue sorted out. Thanks a lot for your patience and cooperation!!
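
For reference, a minimal sketch of what the added mounts could look like in docker-compose-triton.yml (only the new volumes lines are shown; the real service definitions contain more settings, and the exact layout may differ):

    services:
      clearml-serving-inference:
        volumes:
          # mount the local clearml.conf, which holds the Azure credentials
          - $HOME/clearml.conf:/root/clearml.conf
      clearml-serving-triton:
        volumes:
          # the triton sidecar downloads the model files, so it needs the credentials too
          - $HOME/clearml.conf:/root/clearml.conf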


ockaro commented Feb 17, 2023

Hi @thepycoder ,
thanks for your answer and sorry for my late reply. I finally managed to try your recommendations today and had the following findings on my local Windows machine (by the way, I am using docker-compose-triton.yml, not the GPU version):

  1. When I just added the volume as you suggested, I got the error msg="The \"HOME\" variable is not set. Defaulting to a blank string." right after calling docker-compose. Setting the HOME environment variable did not work, so I added it to the .env file that is passed to docker-compose, which got rid of the error (see the sketch at the end of this comment).
  2. I then needed to manually confirm in a popup that the Docker container is allowed to access the clearml.conf file. This was not really an issue for now, but it could be when running solely via the terminal?
  3. Fortunately, I got a promising additional error message; everything else remained as before:
clearml-serving-triton        | E0217 10:21:25.908301 34 model_repository_manager.cc:2064] Poll failed for model directory 'test_model_pytorch': failed to open text file for read /models/test_model_pytorch/config.pbtxt: No such file or directory
clearml-serving-triton        | Info: syncing models from main serving service
clearml-serving-triton        | Updating local model folder: /models
clearml-serving-triton        | 2023-02-17 10:21:26,079 - clearml.storage - ERROR - Azure blob storage driver not found. Please install driver using: 'pip install clearml[azure]' or pip install '"azure.storage.blob>=12.0.0"'
clearml-serving-triton        | Error retrieving model ID 9075dbebef6d4467801da808a6e39570 []
clearml-serving-triton        | Info: Models updated from main serving service
clearml-serving-triton        | reporting metrics: relative time 123 sec
clearml-serving-inference     | Instance [3cf8c573a03e4341aa6f422465d5521b, pid=8]: New configuration updated
clearml-serving-inference     | ClearML results page: https://app.clear.ml/projects/c8794acd9c594f4e9f9a9a55b9b76632/experiments/3cf8c573a03e4341aa6f422465d5521b/output/log
clearml-serving-inference     | ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
clearml-serving-inference     | ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start

So it seems like the Azure Blob Storage driver is not set up properly inside the Docker container? In the environment from which I call docker-compose, the requirement is already satisfied.
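
For completeness, the workaround from point 1 amounts to something like this (the path is only an example; the .env file is the one that docker-compose already picks up):

    # in the .env file next to docker-compose-triton.yml (example value):
    #   HOME=C:\Users\<your-user>
    #
    # with HOME defined, the suggested mount resolves on Windows as well:
    clearml-serving-inference:
      volumes:
        - $HOME/clearml.conf:/root/clearml.conf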

@thepycoder (Contributor)

Hey @ockaro!

Thanks for checking back in!

  1. Interesting, we'll take a look at this!
  2. Could this be Docker-on-Windows behaviour? Or was it a popup from ClearML? Since the whole serving stack doesn't have a UI, I would assume the popup comes from Docker itself, which we can do little about (we should fix it regardless, by not requiring you to mount the file manually in the first place).
  3. Could you try adding the following in your docker-compose config under the triton container (see the sketch below)?
    CLEARML_EXTRA_PYTHON_PACKAGES="azure-storage-blob"
    This should install the blob storage driver for you. If this works, we'll add it to the default requirements :)
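
If it helps, a minimal sketch of where this could go in docker-compose-triton.yml (the service already defines other environment variables; only the added line is shown):

    clearml-serving-triton:
      environment:
        # extra python packages installed inside the container at start-up
        - CLEARML_EXTRA_PYTHON_PACKAGES=azure-storage-blob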


ockaro commented Feb 20, 2023

Hi @thepycoder,
thanks again for your reply.

  1. Yes, it was a popup from Docker, so proper mounting will probably fix this.
  2. I tried your tip and it worked! Thanks for the kind and smooth handling of my issue. :)

Do you need any further information?

@thepycoder (Contributor)

@ockaro Awesome, thanks a lot for your patience here! We don't need anything else and are working to make the process more painless in the future. Thank you so much for your contributions!
