Thanks for stopping by to let us know something could be better!
PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.
Is your feature request related to a problem? Please describe.
When deploying a model using aiplatform.Model.upload and aiplatform.Model.deploy, storage is limited to the container's default. So when using vLLM (for example) with a large model that uses Ray for cluster management, memory spills onto the /tmp/ directory, which fills up to its maximum and the model crashes. There is no way to work around it.
This happened to me when trying to deploy Llama 70B on 8 L4 GPUs with vLLM.
Describe the solution you'd like
I'm not sure what's possible, but ultimately I'd like another argument to the upload function, something like "serving_container_tmp_dir_capacity_mb", "serving_container_volume_mapping: list[dict[host_path, container_path]]", or "serving_container_mount_root_external_gcs_bucket: str" (a GCS bucket that / is mounted on, if possible).
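To make the request concrete, here is a rough sketch of what the proposed volume-mapping argument could look like. None of these parameters exist in the Vertex AI SDK today, so the upload call itself is shown commented out; only the proposed data shape is real Python, and the image URI and host path are placeholders.

```python
# Hypothetical shape of the proposed volume-mapping argument.
# Neither parameter below exists in google-cloud-aiplatform today;
# this is only a sketch of the requested API surface.

# Each entry maps a host path to a mount point inside the serving
# container, e.g. so Ray/vLLM spill traffic lands on a larger disk.
serving_container_volume_mapping = [
    {"host_path": "/mnt/large-disk", "container_path": "/tmp"},
]

# Proposed usage (commented out because the parameters are imaginary):
# from google.cloud import aiplatform
# model = aiplatform.Model.upload(
#     serving_container_image_uri="us-docker.pkg.dev/my-project/my-repo/vllm:latest",
#     serving_container_tmp_dir_capacity_mb=512_000,  # proposed
#     serving_container_volume_mapping=serving_container_volume_mapping,  # proposed
# )

print(serving_container_volume_mapping[0]["container_path"])  # -> /tmp
```

Any one of the three proposed shapes would solve the problem; the volume mapping is the most general, since it covers /tmp as well as any other path the serving stack spills to.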
Describe alternatives you've considered
I've tried approaching the error from the Ray/vLLM side by setting the spill directory to an external location, but that did not work.
Additional context
Add any other context or screenshots about the feature request here.
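For reference, the Ray-side workaround I attempted looks roughly like the following, based on Ray's object-spilling configuration; the /mnt/spill path is a placeholder. On Vertex AI any candidate directory still lives on the container's limited disk, which is why this did not help.

```python
import json

# Sketch of redirecting Ray's object-spilling directory.
# The directory path is a placeholder; on Vertex AI there is no
# larger mount to point it at, which is the core of this request.
spilling_config = json.dumps({
    "type": "filesystem",
    "params": {"directory_path": "/mnt/spill"},  # placeholder path
})

# Actual call (requires ray; commented so the sketch stays self-contained):
# import ray
# ray.init(_system_config={"object_spilling_config": spilling_config})

print(spilling_config)
```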