Resource exhausted error #1740

I'm trying to send audio files, which are fairly large, to the server and am getting a resource exhausted error. Is there any way to configure the server to increase the maximum allowed message size?

Here's the stack trace:

Comments
@lminer this looks like a problem with gRPC. Looking at this, it appears that this limit can be configured. I raised the limit to 128 MB on a separate branch. To test the change from that branch, you only need to modify your API spec. For more context, this is the commit we made for the above solution. Needless to say, this is a temporary fix; if it works for you, we'll address it properly in one of the next releases. Please let us know how it goes.
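(For context, gRPC's default maximum message length is 4 MB. The snippet below is only an illustrative sketch of how that cap is typically raised on a plain Python gRPC channel via channel options; it is not the exact change made in Cortex, and the target address is a placeholder.)

```python
# Illustrative sketch only: raising the gRPC message-size cap on a plain Python
# gRPC channel via channel options. Not the exact Cortex change; the target
# address below is a placeholder for a TFS gRPC endpoint.
import grpc

MAX_MESSAGE_BYTES = 128 * 1024 * 1024  # 128 MB, matching the value mentioned above

channel = grpc.insecure_channel(
    "localhost:9000",
    options=[
        ("grpc.max_send_message_length", MAX_MESSAGE_BYTES),
        ("grpc.max_receive_message_length", MAX_MESSAGE_BYTES),
    ],
)
```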
@RobertLucian thanks for this! Do you have a version with CUDA 11.0 and cuDNN 8? I'm testing this out for deployment on a GPU machine.
@lminer yes, we have a version with CUDA 11.0 and cuDNN 8, but it's for the Python Predictor image. For the TensorFlow Predictor, two images are required: one for the TFS server (which needs access to the GPU device) and one for the API server. At the moment, the TFS server is configured to use CUDA 10.1 (and presumably cuDNN 7). Here's its Dockerfile for the GPU version: cortex/images/tensorflow-serving-gpu/Dockerfile, lines 1 to 12, at c8da085.

Let us know if you want to follow this path, in which case we can help you along the way!
Up to now, I've been trying to use the TensorFlow Predictor path. Do you have any TFS and TensorFlow Predictor images for TensorFlow 2.4.0 (CUDA 11.0, cuDNN 8) that work with a higher limit? Here's my cortex.yaml:

```yaml
- name: foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: serving/cortex_server.py
    models:
      path: ../foo/
      signature_key: serving_default
    image: quay.io/cortexlabs/python-predictor-gpu-slim:0.25.0-cuda11.0-cudnn8
    tensorflow_serving_image: quay.io/cortexlabs/tensorflow-serving-gpu:0.25.0
  compute:
    gpu: 1
```
@lminer one immediate problem I notice with your cortex.yaml is that for the TensorFlow Predictor, the `image` field must point to a TensorFlow Predictor image. In your case, it seems to be set to the Python Predictor image instead. Now, let's re-reference this Dockerfile: cortex/images/tensorflow-serving-gpu/Dockerfile, lines 1 to 12, at c8da085.
This is the Dockerfile for the tensorflow-serving-gpu image. The upgraded Dockerfile would look like this:

```dockerfile
FROM tensorflow/serving:2.4.0-gpu

RUN apt-get update -qq && apt-get install -y --no-install-recommends -q \
    libnvinfer7=7.1.3-1+cuda11.0 \
    libnvinfer-plugin7=7.1.3-1+cuda11.0 \
    curl \
    && apt-get clean -qq && rm -rf /var/lib/apt/lists/*

COPY images/tensorflow-serving-gpu/run.sh /src/
RUN chmod +x /src/run.sh
ENTRYPOINT ["/src/run.sh"]
```

I already built this image for you; you can pull it from quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0.

Let me know if this worked for you!
@RobertLucian Thanks so much for this! What do I use for the `image` field?
@lminer you would not need a GPU-enabled image for the `image` field. That means that inferences will take place in the container represented by the `tensorflow_serving_image`. Could you also share your TensorFlow Predictor implementation? I might be able to give you a few pointers if needed.
Ah. I didn't quite understand the breakdown of responsibilities there. I guess the cuDNN-not-found warning I saw was a red herring, since the API container doesn't use the GPU anyway. That said, I'm still seeing the resource exhaustion error. The GPU does appear to be used before the error, so maybe the exhaustion occurs when the server returns large values. Here's my current cortex.yaml:

```yaml
- name: foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: serving/cortex_server.py
    models:
      path: ../foo/
      signature_key: serving_default
    tensorflow_serving_image: quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0
  compute:
    gpu: 1
```

Here's my TensorFlow Predictor implementation:

```python
import numpy as np


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client
        self.config = config

    def predict(self, payload, query_params, headers):
        target, residual = self.client.predict(
            {"waveform": np.array(payload["audio"]).astype("float32")}
        )
        return {"target": target.numpy().tolist(), "residual": residual.numpy().tolist()}
```

And here's the error again:

```
2020-12-30 22:49:04.428932:cortex:pid-1558:INFO:500 Internal Server Error POST /
2020-12-30 22:49:04.429214:cortex:pid-1558:ERROR:Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line
390, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
return await self.app(scope, receive, send)
File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/applications.py", line 181, in __call__
await super().__call__(scope, receive, send) # pragma: no cover
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
raise exc from None
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
await self.app(scope, receive, _send)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
response = await self.dispatch_func(request, self.call_next)
File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 187, in parse_payload
return await call_next(request)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
task.result()
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
await self.app(scope, receive, send)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
response = await self.dispatch_func(request, self.call_next)
File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 134, in register_request
response = await call_next(request)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
task.result()
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
await self.app(scope, receive, send)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
raise exc from None
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
await self.app(scope, receive, sender)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
await route.handle(scope, receive, send)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
await self.app(scope, receive, send)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
response = await func(request)
File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 183, in app
dependant=dependant, values=values, is_coroutine=is_coroutine
File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 135, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
return await loop.run_in_executor(None, func, *args)
File "/opt/conda/envs/env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 200, in predict
prediction = predictor_impl.predict(**kwargs)
File "/mnt/project/serving/cortex_server.py", line 11, in predict
{"waveform": np.array(payload["audio"]).astype("float32")}
File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line
114, in predict
return self._run_inference(model_input, consts.SINGLE_MODEL_NAME, model_version)
File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line
164, in _run_inference
return self._client.predict(model_input, model_name, model_version)
File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/model/tfs.py", line 376, in
predict
response_proto = self._pred.Predict(prediction_request, timeout=timeout)
File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (102484524 vs. 4194304)"
debug_error_string = "{"created":"@1609368544.425429771","description":"Received message larger than max (102484524 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
>
```
@RobertLucian Any idea what's going on? Would love to get this working.
@lminer your cortex.yaml should look like this:

```yaml
- name: foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    ...
    image: quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs
    tensorflow_serving_image: quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0
  compute:
    gpu: 1
```

If you've already done this and it still doesn't work, then maybe you can also share the model with me (you can email me at robert@cortexlabs.com to keep this private) so I can test that as well.
I'm still getting the same error message when I add the `image` field.
@lminer I see. We'll look into this then, and I'll keep you posted. Thanks for bringing this up to us.
@lminer we have fixed this error in #1769. The fix is present in a customized version of 0.25 made specifically for you - the change is now available in the image below:

```yaml
- name: foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    ...
    image: quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs
```

This fix will get into 0.27, or if you need it in 0.26, we can make a patch release for you. For reference, I tested this on a sound recognition model with big inputs/outputs (up to 256 MB). Also, the gRPC limit has been increased to 256 MB. Do you think your payloads will get larger than that? We may also consider making this limit configurable in the API spec down the road, or sooner if it's really needed.
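(As a rough way to judge whether a given audio payload stays under that cap, assuming float32 tensors dominate the gRPC message size and ignoring protobuf framing overhead:)

```python
# Back-of-the-envelope size estimate for the gRPC response, assuming two
# float32 output tensors ("target" and "residual") of the same length as the
# input waveform; protobuf overhead is ignored.
import numpy as np


def approx_response_size_mb(num_samples: int, num_outputs: int = 2) -> float:
    bytes_per_value = np.dtype("float32").itemsize  # 4 bytes
    return num_samples * num_outputs * bytes_per_value / (1024 ** 2)


# e.g. ~10 minutes of 44.1 kHz mono audio returned as two tensors:
print(approx_response_size_mb(10 * 60 * 44_100))  # ~202 MB, still under 256 MB
```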
Thanks, it works! I don't anticipate going higher than 256 MB. Do you have a tentative timeline for when 0.27 is coming out?
@lminer that's great news! We plan to release 0.27 on the 19th of January (about 2 weeks from now).
@RobertLucian Have you looked into whether there's any slowdown incurred by passing such a large input to tensorflow-serving? I'm currently trying to figure out why inference is so slow. Right now inference takes 3 minutes, and for most of that time both the CPU and GPU are idle; the GPU is busy for at most 6 seconds. It makes me wonder whether that limit was put there for a reason.
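(One way to narrow this down, sketched under the assumption that the predictor looks like the one posted above, is to time the payload decoding separately from the TFS round trip:)

```python
# Sketch: same predictor as above, with timing around the two expensive steps
# (JSON list -> ndarray conversion vs. the gRPC/TFS round trip). Logging via
# print is arbitrary; adjust as needed.
import time

import numpy as np


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client
        self.config = config

    def predict(self, payload, query_params, headers):
        t0 = time.time()
        waveform = np.array(payload["audio"]).astype("float32")
        t1 = time.time()
        target, residual = self.client.predict({"waveform": waveform})
        t2 = time.time()
        print(f"decode: {t1 - t0:.2f}s, TFS round trip: {t2 - t1:.2f}s")
        return {"target": target.numpy().tolist(), "residual": residual.numpy().tolist()}
```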
@lminer I did not. But I remember trying with 140 MB payloads for something like #1770, and it would take a few seconds to run the inference - about 10 seconds from what I recall. Assuming there's something wrong with gRPC + TFS, I wonder if this is in any way related to tensorflow/serving#1725. I think the way to rule out a problem with gRPC + TFS is to run the model as a Python Predictor.
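(For reference, a minimal sketch of what the Python Predictor route could look like; the config key, model directory, and the assumption that the SavedModel's serving_default signature takes a single `waveform` input are placeholders based on the predictor shown earlier:)

```python
# Minimal sketch of a Python Predictor that runs the SavedModel in-process,
# bypassing gRPC/TFS entirely. Assumes config["model_dir"] points to a local
# copy of the SavedModel and that its serving_default signature takes a single
# "waveform" input, as in the TensorFlow Predictor above.
import numpy as np
import tensorflow as tf


class PythonPredictor:
    def __init__(self, config):
        loaded = tf.saved_model.load(config["model_dir"])
        self._infer = loaded.signatures["serving_default"]

    def predict(self, payload):
        waveform = np.array(payload["audio"]).astype("float32")
        outputs = self._infer(waveform=tf.constant(waveform))
        return {name: tensor.numpy().tolist() for name, tensor in outputs.items()}
```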
@RobertLucian I've been trying to get this working with the Python Predictor, but I'm running into an issue where the session information from when I load the model in the constructor is lost at predict time. The solution seems to be to convert everything to graph mode, but unfortunately that isn't so easy given my code. Do you know any other way to get around this error?
@lminer can you tell us if you're running multiple threads per process (the `threads_per_process` field)? If setting the threads to 1 doesn't help, it may also be worth trying an older TensorFlow version.
Everything is set to the default: 1. Do you know what version I would need to downgrade to? I'm currently on 2.4.0.
@RobertLucian so I managed to solve this by using the tensorflow-predictor, but reading from S3 directly in the model itself. Now the model is significantly faster, basically equivalent to local tests. So I think the problem is passing large payloads via gRPC.
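(A sketch of what that approach can look like, assuming a TensorFlow build with S3 filesystem support and an existing separation model that returns a target/residual pair; all names here are placeholders rather than the exact implementation:)

```python
# Sketch: export a serving signature that takes a short S3 URI instead of the
# raw waveform, so the gRPC request stays tiny and the audio is read inside the
# graph. Assumes the TF build has S3 filesystem support and `separation_model`
# returns (target, residual); every name here is a placeholder.
import tensorflow as tf


class S3AudioModel(tf.Module):
    def __init__(self, separation_model):
        super().__init__()
        self.model = separation_model

    @tf.function(input_signature=[tf.TensorSpec([], tf.string, name="s3_path")])
    def serve(self, s3_path):
        audio_bytes = tf.io.read_file(s3_path)          # supports s3:// URIs
        waveform, _ = tf.audio.decode_wav(audio_bytes)  # [samples, channels] float32
        target, residual = self.model(tf.squeeze(waveform, axis=-1))
        return {"target": target, "residual": residual}


# wrapped = S3AudioModel(separation_model)
# tf.saved_model.save(wrapped, "export/1", signatures={"serving_default": wrapped.serve})
```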
Yeah, it's 40 MB.
I'll go ahead and close this issue, since the gRPC resource exhausted error has been resolved and will be released in 0.27 (next week). #1774 will remain open as we investigate whether there is an avoidable issue causing the slowdown.