Resource exhausted error #1740

Closed
lminer opened this issue Dec 24, 2020 · 24 comments · Fixed by #1769

lminer commented Dec 24, 2020

I'm trying to send audio files, which are fairly large, to the server and am getting a resource exhausted error. Is there any way to configure the server in order to increase the maximum allowed message size?

Here's the stack trace:

2020-12-24 23:30:14.941839:cortex:pid-2247:INFO:500 Internal Server Error POST /
2020-12-24 23:30:14.942071:cortex:pid-2247:ERROR:Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line
390, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/applications.py", line 181, in __call__
    await super().__call__(scope, receive, send)  # pragma: no cover
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 187, in parse_payload
    return await call_next(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 134, in register_request
    response = await call_next(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
 File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
    await route.handle(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 183, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 135, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/opt/conda/envs/env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 200, in predict
    prediction = predictor_impl.predict(**kwargs)
  File "/mnt/project/serving/cortex_server.py", line 10, in predict
    return self.client.predict({"waveform": np.array(payload["audio"]).astype("float32")})
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line
114, in predict
    return self._run_inference(model_input, consts.SINGLE_MODEL_NAME, model_version)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line
164, in _run_inference
    return self._client.predict(model_input, model_name, model_version)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/model/tfs.py", line 376, in
predict
    response_proto = self._pred.Predict(prediction_request, timeout=timeout)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.RESOURCE_EXHAUSTED
        details = "Received message larger than max (102484524 vs. 4194304)"
        debug_error_string = "{"created":"@1608852614.937822193","description":"Received message larger than max (102484524 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
lminer added the bug label on Dec 24, 2020

RobertLucian commented Dec 25, 2020

@lminer this looks like a problem with grpc.

Looking at this, it appears that this limit can be configured. I raised the limit to 128 MB on the 0.25-fix-tfs-lengths branch specifically for you.

To test the change from that branch, you only need to modify your API spec (assuming you use cortex.yaml) to use the quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs image: set the predictor.image field to quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs.

For more context, this is the commit we made for the above change.

Needless to say, this is a temporary solution. If it works for you, we'll address this properly in one of the next releases. Please let us know how it goes.
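
For anyone hitting the same wall: the 4194304 in the error is gRPC's default 4 MB per-message cap, which is raised through channel options on the gRPC client that talks to TFS (the server side may need a matching setting too). The snippet below is only a minimal sketch of that mechanism, not the actual Cortex change; the TFS address, model name, and input key are placeholders:

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

MAX_MESSAGE_LENGTH = 128 * 1024 * 1024  # 128 MB, matching the raised limit

# Open a channel to TFS with both per-message limits raised above gRPC's 4 MB default.
channel = grpc.insecure_channel(
    "localhost:9000",  # placeholder TFS address
    options=[
        ("grpc.max_send_message_length", MAX_MESSAGE_LENGTH),
        ("grpc.max_receive_message_length", MAX_MESSAGE_LENGTH),
    ],
)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a PredictRequest carrying a large float32 waveform (~100 MB).
request = predict_pb2.PredictRequest()
request.model_spec.name = "foo"  # placeholder model name
request.model_spec.signature_name = "serving_default"
request.inputs["waveform"].CopyFrom(
    tf.make_tensor_proto(np.zeros(25_000_000, dtype=np.float32))
)

response = stub.Predict(request, timeout=60.0)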


lminer commented Dec 28, 2020

@RobertLucian thanks for this! Do you have a version with CUDA 11.0 and cuDNN 8? I'm testing this out for deployment on a GPU machine.


RobertLucian commented Dec 28, 2020

@lminer yes, we have a version with CUDA 11.0 and cuDNN 8, but it's for the Python Predictor image.

For the TensorFlow Predictor, there are 2 images required: one for the TFS server (which needs access to the GPU device) and one for the API server. At the moment, the TFS server is configured to use CUDA 10.1 and (presumably) cuDNN 7. Here's its Dockerfile for the GPU version:

FROM tensorflow/serving:2.3.0-gpu

RUN apt-get update -qq && apt-get install -y --no-install-recommends -q \
        libnvinfer6=6.0.1-1+cuda10.1 \
        libnvinfer-plugin6=6.0.1-1+cuda10.1 \
        curl \
    && apt-get clean -qq && rm -rf /var/lib/apt/lists/*

COPY images/tensorflow-serving-gpu/run.sh /src/
RUN chmod +x /src/run.sh

ENTRYPOINT ["/src/run.sh"]

Let us know if you want to follow this path, in which case we can help you along the way!


lminer commented Dec 28, 2020

Up to now, I've been trying the TensorFlow Predictor path. Do you have any TFS and TensorFlow Predictor images for TensorFlow 2.4.0 (CUDA 11.0, cuDNN 8) that work with a higher limit? Here's my cortex.yaml:

- name: foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: serving/cortex_server.py
    models:
      path: ../foo/
      signature_key: serving_default
    image: quay.io/cortexlabs/python-predictor-gpu-slim:0.25.0-cuda11.0-cudnn8
    tensorflow_serving_image: quay.io/cortexlabs/tensorflow-serving-gpu:0.25.0
  compute:
    gpu: 1

@RobertLucian

@lminer one immediate problem I notice with your cortex.yaml is that for the TensorFlow Predictor, the image field must be set to a tensorflow-predictor image and the tensorflow_serving_image field to a tensorflow-serving-* image.

In your case, image is set to python-predictor-gpu-slim, which is not the right image for this predictor type. Our image naming scheme follows the predictor type specified in the API spec, so python-predictor-gpu implies an image meant for the Python Predictor type.


Now, let's re-reference this Dockerfile:

FROM tensorflow/serving:2.3.0-gpu

RUN apt-get update -qq && apt-get install -y --no-install-recommends -q \
        libnvinfer6=6.0.1-1+cuda10.1 \
        libnvinfer-plugin6=6.0.1-1+cuda10.1 \
        curl \
    && apt-get clean -qq && rm -rf /var/lib/apt/lists/*

COPY images/tensorflow-serving-gpu/run.sh /src/
RUN chmod +x /src/run.sh

ENTRYPOINT ["/src/run.sh"]

This is the Dockerfile for the tensorflow-serving-gpu image, which is set for you automatically in the API spec (when deploying). The current one runs TF 2.3.0, which, according to TensorFlow's docs, runs on CUDA 10.1 & cuDNN 7. Upgrading this to TF 2.4.0 will give you CUDA 11.0 and cuDNN 8.

The upgraded Dockerfile would look like this:

FROM tensorflow/serving:2.4.0-gpu

RUN apt-get update -qq && apt-get install -y --no-install-recommends -q \
        libnvinfer7=7.1.3-1+cuda11.0 \
        libnvinfer-plugin7=7.1.3-1+cuda11.0 \
        curl \
    && apt-get clean -qq && rm -rf /var/lib/apt/lists/*

COPY images/tensorflow-serving-gpu/run.sh /src/
RUN chmod +x /src/run.sh

ENTRYPOINT ["/src/run.sh"]

I already built this image for you - you can pull it from quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0. You then need to update the tensorflow_serving_image field in your API spec. In case you want to rebuild it yourself, here's how:

  1. Clone the cortex repo and checkout the 0.25 branch (with git clone https://github.com/cortexlabs/cortex.git, cd cortex, and git checkout 0.25)
  2. Build the image by running ./build/build-image.sh tensorflow-serving-gpu.
  3. Choose a docker registry (such as Docker Hub or Quay) and create a public image repository on it (for me, I created quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4)
  4. Log into your docker registry using docker login with the appropriate credentials
  5. Run docker tag quay.io/cortexlabs/tensorflow-serving-gpu:0.25.0 quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0 (replace the registry URL with yours)
  6. Run docker push quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0 (replace the registry URL with yours)
  7. Update your cortex.yaml to point to your newly pushed image.

Let me know if this worked for you!


lminer commented Dec 29, 2020

@RobertLucian Thanks so much for this! What do I use for the image field? I tried the standard tensorflow-predictor image and the GPU doesn't work. If I keep the python-predictor-gpu image, I get the same issue as before with the file size being too large.

@RobertLucian

@lminer you would not need to set the image field at all, as the appropriate image is picked up by default. Just to be clear: the tensorflow-predictor image (the one that goes in the image field) contains the API-serving side, whereas the tensorflow-serving-* images (the ones that go in the tensorflow_serving_image field) hold the inference engine.

That means inferences take place in the container specified by tensorflow_serving_image, and the way to run them is through the tensorflow_client that's passed into the predictor's constructor - not in any other way. If you try to run GPU inferences inside the TensorFlow Predictor implementation itself, that will not work, because that container is not the inference engine.

Could you also share your TensorFlow Predictor implementation? I might be able to give you a few pointers if required.


lminer commented Dec 30, 2020

Ah, I didn't quite understand the breakdown of responsibilities there. I guess the cuDNN-not-found warning I saw was a red herring, since the API container doesn't use the GPU anyway. That said, I'm still seeing the resource exhausted error. It does look like the GPU is used before the error, so maybe the exhaustion occurs when the server returns large values.

Here's my current cortex.yaml

- name: foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: serving/cortex_server.py
    models:
      path: ../foo/
      signature_key: serving_default
    tensorflow_serving_image: quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0
  compute:
    gpu: 1

Here's my TensorFlow Predictor implementation:

import numpy as np


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client
        self.config = config

    def predict(self, payload, query_params, headers):
        target, residual = self.client.predict(
            {"waveform": np.array(payload["audio"]).astype("float32")}
        )
        return {"target": target.numpy().tolist(), "residual": residual.numpy().tolist()}

And here's the error again:

2020-12-30 22:49:04.428932:cortex:pid-1558:INFO:500 Internal Server Error POST /
2020-12-30 22:49:04.429214:cortex:pid-1558:ERROR:Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line
390, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/applications.py", line 181, in __call__
    await super().__call__(scope, receive, send)  # pragma: no cover
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 187, in parse_payload
    return await call_next(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 134, in register_request
    response = await call_next(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
    await route.handle(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 183, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 135, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/opt/conda/envs/env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 200, in predict
    prediction = predictor_impl.predict(**kwargs)
  File "/mnt/project/serving/cortex_server.py", line 11, in predict
    {"waveform": np.array(payload["audio"]).astype("float32")}
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line
114, in predict
    return self._run_inference(model_input, consts.SINGLE_MODEL_NAME, model_version)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line
164, in _run_inference
    return self._client.predict(model_input, model_name, model_version)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/model/tfs.py", line 376, in
predict
    response_proto = self._pred.Predict(prediction_request, timeout=timeout)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.RESOURCE_EXHAUSTED
        details = "Received message larger than max (102484524 vs. 4194304)"
        debug_error_string = "{"created":"@1609368544.425429771","description":"Received message larger than max (102484524 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
>


lminer commented Jan 4, 2021

@RobertLucian Any idea what's going on? Would love to get this working.


RobertLucian commented Jan 5, 2021

@lminer your predictor.image also has to be set to quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs, so your cortex.yaml will look like:

- name: foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    ...
    image: quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs
    tensorflow_serving_image: quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0
  compute:
    gpu: 1

Remember that this predictor.image is what fixes the original "Received message larger than max (102484524 vs. 4194304)" error; it isn't required if all you want is TensorFlow 2.4.0 on your TFS.

If you've already done this and it still doesn't work, then maybe you can also share the model with me (you can email me at robert@cortexlabs.com to keep this private) so I can test that as well.


lminer commented Jan 5, 2021

I'm still getting the same error message when I add the new predictor.image. I'd rather not share the model if possible. Any other ideas? It seems as if the max message size isn't being raised somewhere.

<_InactiveRpcError of RPC that terminated with:
        status = StatusCode.RESOURCE_EXHAUSTED
        details = "Received message larger than max (102484524 vs. 4194304)"
        debug_error_string = "{"created":"@1609868017.138526542","description":"Received message larger than max (102484524 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
>
2021-01-05 17:33:37.145639:cortex:pid-1558:INFO:200 OK POST /

@RobertLucian

@lminer I see. We'll look into this, and I'll keep you posted. Thanks for bringing it up to us.

@RobertLucian

@lminer we have fixed this error in #1769. The fix is present in a customized version of 0.25 made specifically for you - the change is now available in the quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs image. Your API spec should have this layout:

- name: foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    ...
    image: quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs
    ...

This fix will get into 0.27, or if you need this in 0.26, we can make a patch release for you.


For reference, I tested this on a sound recognizer model with large inputs/outputs (up to 256 MB). The grpc limit has also been increased to 256 MB. Do you expect your payloads to get larger than that? We may also consider making this limit configurable in the API spec down the road, or sooner if it's really required.


lminer commented Jan 7, 2021

Thanks, it works! I don't anticipate getting higher than 256 MB. Do you have a tentative timeline for when 0.27 is coming out?

@RobertLucian

@lminer that's great news! We plan to release 0.27 on the 19th of January (about 2 weeks from now).


lminer commented Jan 10, 2021

@RobertLucian Have you looked into whether there's any slowdown incurred by passing such a large input to TensorFlow Serving? I'm currently trying to figure out why inference is so slow. Right now inference takes 3 minutes, and for most of that time both the CPU and GPU are idle; the GPU runs for at most 6 seconds. It makes me wonder whether that limit was put there for a reason.


RobertLucian commented Jan 11, 2021

@lminer I did not. But I remember trying 140 MB payloads for something like #1770, and inference took only a few seconds - roughly 10 seconds, as I recall. That was on t3.mediums, which are pretty slow anyway.

Assuming there's something wrong with grpc + TFS, I wonder if this would be in any way related to tensorflow/serving#1725.

I think the way to rule out a problem with grpc + TFS is to run the model as a PythonPredictor - that is, implement the loading procedure in the predictor's constructor and run the prediction in the predict method (see the sketch below). I would only go with processes_per_replica set to 1 in this case, because of https://docs.cortex.dev/v/0.24/running-on-aws/gpu#tips.
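
For reference, a minimal sketch of what that PythonPredictor could look like - this assumes Cortex's PythonPredictor interface (constructor taking config, predict taking payload), and the config key, signature, and tensor names are placeholders rather than anything from the actual project:

import numpy as np
import tensorflow as tf


class PythonPredictor:
    def __init__(self, config):
        # Load the SavedModel once, when the replica starts (placeholder config key).
        self.model = tf.saved_model.load(config["model_path"])
        self.infer = self.model.signatures["serving_default"]

    def predict(self, payload):
        # Run inference directly in this container, bypassing TFS and gRPC entirely.
        waveform = tf.constant(np.array(payload["audio"]).astype("float32"))
        outputs = self.infer(waveform=waveform)
        return {name: tensor.numpy().tolist() for name, tensor in outputs.items()}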


lminer commented Jan 11, 2021

@RobertLucian I've been trying to get this working with the Python Predictor, but I'm running into an issue where the session information from loading the model in the constructor seems to be lost at predict time. The solution seems to be to convert everything to graph mode, but unfortunately that isn't easy given my code. Do you know any other way to get around this error?
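
One classic workaround for this class of symptom - if the model is TF1-style/session-based - is to hold onto the graph and session created in the constructor and re-enter them in predict. Whether that applies here depends on the model, so treat the following as a sketch only; the export path key, tags, and tensor names are placeholders:

import tensorflow as tf


class PythonPredictor:
    def __init__(self, config):
        # Keep explicit references to the graph and session used at load time,
        # so predict-time threads can re-enter the same ones.
        self.graph = tf.compat.v1.Graph()
        with self.graph.as_default():
            self.session = tf.compat.v1.Session(graph=self.graph)
            tf.compat.v1.saved_model.load(
                self.session, ["serve"], config["model_path"]  # placeholder path key
            )

    def predict(self, payload):
        # Re-enter the same graph/session that the model was loaded into.
        with self.graph.as_default(), self.session.as_default():
            return self.session.run(
                "target:0",  # placeholder output tensor name
                feed_dict={"waveform:0": payload["audio"]},  # placeholder input name
            )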

@RobertLucian

@lminer can you tell us if you're running multiple threads per process (threads_per_process)? You should set it to 1 when the TensorFlow framework is used with the PythonPredictor, because of the way the framework works. A long time ago it didn't work at all, so we added a patch that made TensorFlow work with the Python Predictor when threads_per_process is set to 1 - and, if I recall correctly, the same goes for processes_per_replica.

If setting the threads_per_process field to 1 doesn't fix it, then downgrading the TensorFlow version might help - assuming the model can be loaded with that version.


lminer commented Jan 12, 2021

Everything is set to the default of 1. Do you know what version I would need to downgrade to? I'm currently on 2.4.0.


lminer commented Jan 12, 2021

@RobertLucian so I managed to solve this by using the tensorflow-predictor, but reading from S3 directly in the model itself. Now the model is significantly faster - basically equivalent to local tests. So I think the problem is passing large payloads via grpc.
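
The exact mechanism isn't shown here, but the general pattern is to send only a small reference (e.g. an S3 URI) over HTTP/gRPC and let the model fetch the audio itself. A hypothetical sketch of what the predictor reduces to in that setup - the audio_uri input key and the assumption that the SavedModel accepts a string path and decodes the audio inside the graph are placeholders, not @lminer's actual code:

class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client
        self.config = config

    def predict(self, payload):
        # Only a short S3 URI crosses HTTP and gRPC; the SavedModel is assumed to
        # take a string path input and read/decode the audio inside the graph
        # (tf.io can read s3:// paths when TF is built with S3 filesystem support).
        return self.client.predict({"audio_uri": payload["audio_uri"]})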


RobertLucian commented Jan 12, 2021

@lminer interesting. So there may be something in our codebase, or something in TFS itself (grpc-related). We will investigate this. Also, what payload size did you experiment with when you observed the slowdown - is it the 40 MB specified in #1774?


lminer commented Jan 13, 2021

Yeah, it's 40 MB.


deliahu commented Jan 14, 2021

I'll go ahead and close this issue, since the gRPC resource exhausted error has been resolved and the fix will be released in 0.27 (next week). #1774 will remain open while we investigate whether an avoidable issue is causing the slowdown.

deliahu closed this as completed on Jan 14, 2021
deliahu added this to the v0.27 milestone on Jan 19, 2021