Timeout errors (with GPT2 and simple 'return' model) #80

Open
m4gr4th34 opened this issue Aug 2, 2020 · 7 comments

m4gr4th34 commented Aug 2, 2020

Hi, thanks for this very interesting library. I'm trying to use it to handle multiple requests to a GPT-2 chatbot on a server, currently running on a single GPU, in a Flask app on an Apache server, on Win10, using TensorFlow 1.x for GPU. I am using Python 3.7 in a virtual environment. With either ThreadedStreamer or Streamer I only get a timeout response. I'm debugging with the smallest GPT-2 model, which takes ~5 seconds from launch to response, so I'm very confused about where the code is getting hung up. To debug, I created a short function that simply returns any input text: this works with ThreadedStreamer, but gives a timeout with Streamer. I don't know what else I can try to debug further. (I know I won't get much of a performance boost with GPT-2 using service-streamer on a single GPU right now, but I would like it to handle request queues for now, and perhaps multi-GPU in the future.)

Thanks in advance for any advice!

Sample calls in the Flask app:

from flask import Flask, request, jsonify, abort
from flask_cors import cross_origin
from service_streamer import ThreadedStreamer, Streamer

app = Flask(__name__)

result_5 = interact_model_debug  # the GPT-2 model predictor function, defined elsewhere
streamer_5 = ThreadedStreamer(result_5, batch_size=1, max_latency=10)
streamer_52 = Streamer(result_5, batch_size=1, max_latency=5, worker_num=1)


@app.route("/gentest5", methods=['POST'])
@cross_origin()
def get_gentest5():
    print("PRINT --> starting gentest5 function")
    data = request.get_json()

    if 'text' not in data or len(data['text']) == 0 or 'key' not in data or data['key'] != apikey:  # apikey is defined elsewhere
        abort(400)
    else:
        text = data['text']
        inputs = [text]
        print("PRINT --> just before running streamer_52.predict")
        outputs = streamer_52.predict(inputs)
        return jsonify({'result': outputs})

Here is the error log:

[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] [2020-08-01 17:26:09,025] ERROR in app: Exception on /gentest5 [POST]
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] Traceback (most recent call last):
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 2447, in wsgi_app
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] response = self.full_dispatch_request()
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 1952, in full_dispatch_request
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] rv = self.handle_user_exception(e)
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask_cors\extension.py", line 161, in wrapped_function
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] return cors_after_request(app.make_response(f(*args, **kwargs)))
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 1821, in handle_user_exception
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] reraise(exc_type, exc_value, tb)
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\_compat.py", line 39, in reraise
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] raise value
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] rv = self.dispatch_request()
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask\app.py", line 1936, in dispatch_request
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] return self.view_functions[rule.endpoint](**req.view_args)
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\flask_cors\decorator.py", line 128, in wrapped_function
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] resp = make_response(f(*args, **kwargs))
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:/Users/irfan/Python_coding_folder/ChatBots/GPT2Local/gpt-2\WebAPI.py", line 218, in get_gentest5
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] outputs = streamer_52.predict(inputs)
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\service_streamer\service_streamer.py", line 132, in predict
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] ret = self._output(task_id)
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\service_streamer\service_streamer.py", line 122, in _output
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] batch_result = future.result(WORKER_TIMEOUT)
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] File "c:\apienv3\lib\site-packages\service_streamer\service_streamer.py", line 41, in result
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] raise TimeoutError("Task: %d Timeout" % self._id)
[Sat Aug 01 17:26:09.027854 2020] [wsgi:error] [pid 20964:tid 1432] [client 185.159.158.51:14192] TimeoutError: Task: 0 Timeout

m4gr4th34 commented Aug 3, 2020

Solved for ThreadedStreamer, but I still cannot get Streamer to work, with or without ManagedModel.

For ThreadedStreamer, in case it's of use to anyone else: it was an input/output data mismatch. The worker passes your function a list [x], so you must take element input[0] to obtain the input string for your model, otherwise your model will error out. Likewise for the output: package the string as a list for the queue to handle. After doing this, the package works beautifully in this mode.
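
A minimal sketch of that list-in/list-out contract, with a hypothetical generate_reply() standing in for the real GPT-2 predictor:

from service_streamer import ThreadedStreamer

def generate_reply(text):
    return "echo: " + text  # placeholder for the real GPT-2 predictor

def predict_batch(batch):
    # the streamer always passes a list of inputs, e.g. ["hello"], not "hello"
    return [generate_reply(t) for t in batch]  # ...and expects a list of outputs back

streamer = ThreadedStreamer(predict_batch, batch_size=1, max_latency=10)
print(streamer.predict(["hello"]))  # -> ["echo: hello"]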

However, I am still having timeout issues with Streamer. Since I cleaned up the data handling within Flask, I'm very confused as to why ThreadedStreamer works but Streamer does not. I have only a single GPU, but it seems it should still work (I want to set this up now, so that if I do slide in another GPU it's already ready).

My call for Streamer; predict_X is a simple function that returns whatever text is passed in:

streamer_3_managed = Streamer(totaltest.predict_X, batch_size=1, max_latency=20, worker_num=1)

@app.route("/gentest3", methods=['POST'])
@cross_origin()
def get_gentest3():
    data = request.get_json()
    text = data['text']
    inputs = [text]
    outputs = streamer_3_managed.predict(inputs)[0]
    return jsonify({'result': outputs})
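
For reference, the ManagedModel variant mentioned above looks roughly like this. This is only a sketch following the pattern in the service-streamer README; GPT2Predictor is a hypothetical wrapper around the actual model:

from service_streamer import ManagedModel, Streamer

class ManagedGPT2(ManagedModel):
    def init_model(self):
        # runs inside each spawned worker, so the heavy model load happens
        # in the worker process rather than in the web process
        self.model = GPT2Predictor()  # hypothetical model wrapper

    def predict(self, batch):
        return [self.model.generate(text) for text in batch]  # list in, list out

streamer_managed = Streamer(ManagedGPT2, batch_size=1, max_latency=5, worker_num=1)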

m4gr4th34 commented Aug 4, 2020

I got the small GPT-2 model working smoothly with 4 workers on Streamer on a single GPU; however, the x-large model with 1 worker gives an OOM error (even though it works well with ThreadedStreamer?).

To make it work on Windows, I followed the same layout as:

from gevent import monkey; monkey.patch_all()

This made the 'freeze_support' errors disappear on Win10. My GPU is an NVIDIA GeForce GTX 1080 with 6+ GB of RAM, so technically it should support the x-large GPT-2 model with 1 worker on Streamer. Any ideas what might still be missing from my setup? Thanks in advance for any tips and tricks. :) I am very pleased with the progress so far, and with this excellent code you've provided.
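
For reference, a sketch of the entry-point layout implied above: gevent patching first, and the Streamer/app startup guarded by __main__, since Windows' 'spawn' start method re-imports the module in every worker and would otherwise re-run it (names and ports are illustrative):

from gevent import monkey; monkey.patch_all()

from flask import Flask
from service_streamer import Streamer

app = Flask(__name__)

def echo_batch(batch):
    return list(batch)  # placeholder predictor

if __name__ == "__main__":
    # create the multi-process streamer only in the main process
    streamer = Streamer(echo_batch, batch_size=1, max_latency=5, worker_num=1)
    app.run(host="0.0.0.0", port=5000)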

To be more specific, it looks like the model is already loaded and I'm trying to reload it, even though it is just 1 worker. Let me try rebooting my system, in case there is some ghost model loaded in the cache somehow. -- Nope, it still doesn't load. It almost looks like the GPU is being double-loaded on startup.

2020-08-04 04:40:20.599118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2020-08-04 04:40:20.608089: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-08-04 04:40:20.614324: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-08-04 04:40:20.620525: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-08-04 04:40:20.627049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-08-04 04:40:20.632703: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-08-04 04:40:20.638900: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-08-04 04:40:20.645803: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-04 04:40:20.651790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-04 04:40:20.656754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2020-08-04 04:40:20.664897: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-08-04 04:40:20.670425: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-08-04 04:40:20.677152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-08-04 04:40:20.683039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-08-04 04:40:20.688625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-08-04 04:40:20.695036: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-08-04 04:40:20.701342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-04 04:40:20.707072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-04 04:40:20.711966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-04 04:40:20.718051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-08-04 04:40:20.721500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-08-04 04:40:20.725219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6354 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
...
library cublas64_100.dll
2020-08-04 04:41:11.683554: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 83.20MiB (rounded to 87244800). Current allocation summary follows.
2020-08-04 04:41:11.691644: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 18, Chunks in use: 18. 4.5KiB allocated for chunks. 4.5KiB in use in bin. 66B client-requested in use in bin.
2020-08-04 04:41:11.700873: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-04 04:41:11.709753: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 2, Chunks in use: 1. 3.0KiB allocated for chunks. 1.3KiB in use in bin. 1.0KiB client-requested in use in bin.
2020-08-04 04:41:11.719103: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-04 04:41:11.727557: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): Total Chunks: 291, Chunks in use: 291. 1.78MiB allocated for chunks. 1.78MiB in use in bin. 1.78MiB client-requested in use in bin.
2020-08-04 04:41:11.736731: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): Total Chunks: 1, Chunks in use: 1. 14.0KiB allocated for chunks. 14.0KiB in use in bin. 13.9KiB client-requested in use in bin.
2020-08-04 04:41:11.745413: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): Total Chunks: 96, Chunks in use: 95. 2.04MiB allocated for chunks. 2.03MiB in use in bin. 2.03MiB client-requested in use in bin.
2020-08-04 04:41:11.754319: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): Total Chunks: 1, Chunks in use: 1. 47.8KiB allocated for chunks. 47.8KiB in use in bin. 25.0KiB client-requested in use in bin.
2020-08-04 04:41:11.763161: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-04 04:41:11.772188: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): Total Chunks: 3, Chunks in use: 3. 589.5KiB allocated for chunks. 589.5KiB in use in bin. 588.9KiB client-requested in use in bin.
2020-08-04 04:41:11.781437: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-04 04:41:11.789880: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): Total Chunks: 2, Chunks in use: 2. 1.74MiB allocated for chunks. 1.74MiB in use in bin. 1.45MiB client-requested in use in bin.
2020-08-04 04:41:11.798773: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-04 04:41:11.807629: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-04 04:41:11.816629: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): Total Chunks: 1, Chunks in use: 1. 6.25MiB allocated for chunks. 6.25MiB in use in bin. 6.25MiB client-requested in use in bin.
2020-08-04 04:41:11.825722: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): Total Chunks: 50, Chunks in use: 48. 498.97MiB allocated for chunks. 468.75MiB in use in bin. 468.75MiB client-requested in use in bin.
2020-08-04 04:41:11.837309: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): Total Chunks: 48, Chunks in use: 48. 1.37GiB allocated for chunks. 1.37GiB in use in bin. 1.37GiB client-requested in use in bin.
2020-08-04 04:41:11.846960: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): Total Chunks: 95, Chunks in use: 94. 3.69GiB allocated for chunks. 3.65GiB in use in bin. 3.58GiB client-requested in use in bin.
2020-08-04 04:41:11.856235: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): Total Chunks: 3, Chunks in use: 2. 210.96MiB allocated for chunks. 140.39MiB in use in bin. 78.13MiB client-requested in use in bin.
2020-08-04 04:41:11.865589: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): Total Chunks: 1, Chunks in use: 1. 142.22MiB allocated for chunks. 142.22MiB in use in bin. 82.62MiB client-requested in use in bin.
2020-08-04 04:41:11.875085: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): Total Chunks: 1, Chunks in use: 1. 306.74MiB allocated for chunks. 306.74MiB in use in bin. 306.74MiB client-requested in use in bin.
2020-08-04 04:41:11.884757: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 83.20MiB was 64.00MiB, Chunk State:
2020-08-04 04:41:11.890021: I tensorflow/core/common_runtime/bfc_allocator.cc:891] Size: 70.56MiB | Requested Size: 12.5KiB | in_use: 0 | bin_num: 18, prev: Size: 6.3KiB | Requested Size: 6.3KiB | in_use: 1 | bin_num: -1
2020-08-04 04:41:11.899264: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 1048576
2020-08-04 04:41:11.904561: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000B06400000 next 1 of size 1280

...
2020-08-04 04:41:14.062987: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000BD52CDA00 next 350 of size 6400
2020-08-04 04:41:14.068871: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000BD52CF300 next 351 of size 40960000
2020-08-04 04:41:14.075052: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000BD79DF300 next 352 of size 6400
2020-08-04 04:41:14.080802: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000BD79E0C00 next 353 of size 10240000
2020-08-04 04:41:14.086587: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000BD83A4C00 next 354 of size 6400
2020-08-04 04

m4gr4th34 commented Aug 4, 2020

Solved for single worker, single GPU, but now I cannot get Streamer to run under apache/mod_wsgi: the `if __name__ == "__main__"` block doesn't execute when the app is imported from the wsgi file.

For naked Flask: I had added the following below 'import tensorflow as tf' in order to get multiple workers to work for the small model. After commenting it out, the x-large model works with a single worker. So hopefully, if I add more GPUs to my system, it should only require a simple tweak in my code now. I commented these out:

#config = tf.ConfigProto()
#config.gpu_options.allow_growth = True
#session = tf.Session(config=config)

And now it works again for a single worker. I will leave this here in case it is of use to anyone else using this code for sequential generative models. Thanks for the excellent code.

A further tip for others: place 'import tensorflow as tf' inside the definition of your NN model, to ensure each worker has a clean, clearly assigned canvas, so to speak. This means a long load time for your first predictor call while TF loads your model, so if you are using the generator to speed up shot-to-shot predictions, set a long timeout (~100 s) to make sure the first prediction call doesn't time out your worker.
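
A sketch of that pattern: keep the TensorFlow import and model construction inside the predictor, so each worker process builds its own graph/session on first use (load_gpt2 and MODEL_DIR are hypothetical placeholders, not part of any library):

MODEL_DIR = "models/1558M"  # illustrative path
_model = None

def predict_batch(batch):
    global _model
    if _model is None:
        import tensorflow as tf             # imported lazily, inside the worker process
        _model = load_gpt2(tf, MODEL_DIR)   # hypothetical loader; slow on the first call only
    return [_model.generate(text) for text in batch]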

m4gr4th34 commented Aug 10, 2020

In case this is of use to anyone else: multiprocessing in Python through apache-mod_wsgi on Windows seems impossible to achieve. I therefore created a dual boot into Debian to explore API capabilities using nginx/uwsgi with service-streamer. As others noted, it doesn't seem to work with 'spawn'; however, changing 'spawn' to 'fork' in the two call instances inside service-streamer lets you run multiprocessing on a production server in Linux. This could theoretically also work in WSL2 if one wishes to stay on Windows, but at present WSL2's CUDA support is highly inefficient, causing a 5-10x slowdown (I personally verified this on my code). I have not yet tried RedisStreamer, but I'm confident it should work without issue as and when required, and it's nice to know I could have multi-GPU multiprocessing support if I do go that route.
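
A self-contained illustration of the start-method difference mentioned above: 'fork' is Linux-only and inherits the parent's state, while 'spawn' re-imports the module in each worker process (which is what trips up some WSGI setups):

import multiprocessing

def _worker(q):
    q.put("hello from a forked worker")

if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")  # "fork" is Linux-only; "spawn" also works on Windows
    q = ctx.Queue()
    p = ctx.Process(target=_worker, args=(q,))
    p.start()
    print(q.get())
    p.join()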

Note: just make sure 'master = false' is set in the [uwsgi] *.ini file, otherwise predictions will hang and your workers will time out. Also, waking after suspend causes predictions to hang, so it appears the GPU needs to stay online, i.e. I cannot configure 'wake-on' calls at the moment.

Meteorix (Contributor) commented

@m4gr4th34 thanks for your interest. The main difference between Streamer and ThreadedStreamer is that Streamer uses multiple processes, so it will occupy double the GPU memory when there are 2 workers.

Another tip: apache-mod_wsgi is outdated for a Python server. If you must use Windows (which is not recommended) to deploy your server, just use gevent.wsgi.
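
Regarding the GPU memory point: the service-streamer README also documents a cuda_devices argument for pinning workers to specific GPUs. A hedged sketch, reusing the hypothetical ManagedGPT2 class from the earlier snippet:

from service_streamer import Streamer

# one worker per GPU, so each worker loads its own copy of the model on its own device
streamer = Streamer(ManagedGPT2, batch_size=1, max_latency=5,
                    worker_num=2, cuda_devices=(0, 1))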

m4gr4th34 (Author) commented

Thanks, that explains why apache-mod_wsgi is so poorly documented. I am working in Linux now; I like the new Debian features (well, it seems most things in CS are poorly documented, tbh). A question about multiprocessing: my GPT-2 model saturates my single GPU. If I install a second GPU, is it possible to use Streamer to put worker1 on gpu1 and worker2 on gpu2?

Meteorix (Contributor) commented
