
--googlelifesciences segmentation fault #1444

Closed
cademirch opened this issue Feb 28, 2022 · 52 comments

Labels
bug Something isn't working

@cademirch
Contributor
cademirch commented Feb 28, 2022

Snakemake version

Tested on 7.0.0, 6.15.5 and 6.15.0
Describe the bug

Segmentation fault (core dumped) when executing with --google-lifesciences.
Logs

Minimal example

Snakefile:

rule all:
    input: expand("done{i}.txt", i=range(100))

rule test:
    output: "done{i}.txt"
    shell: "echo hi > {output}"

Command line:
snakemake --google-lifesciences --default-remote-prefix snake-test -j10
Additional context

Tried -j10 and -j100 to no effect; the error is the same either way.

cademirch added the bug label on Feb 28, 2022
@CowanCS1
Contributor

CowanCS1 commented Mar 1, 2022

I've been seeing this issue in my own workflows, and I can confirm that I reproduce it with this example.

@johanneskoester
Contributor

@moschetti can you please have a look at this? Seems like an error in one of the google life science API packages, because Snakemake itself is pure Python and cannot really generate segfaults.

@johanneskoester
Contributor

@cademirch if possible, to speed things up on the Google side, it would be nice if you could narrow it down to a particular API call. The life science backend of Snakemake is here. If you add some print statements there, you can probably discover the line where the segfault happens.

@johanneskoester
Contributor

@vsoch in case you are interested in this as well, any help would be appreciated, but it is also fine if you leave this to us and Google ;-).

@johanneskoester johanneskoester self-assigned this Mar 1, 2022
@vsoch
Contributor

vsoch commented Mar 1, 2022

I could probably help over a weekend if I can still access the snakemake-testing GCP project?

@johanneskoester
Contributor

I could probably help over a weekend if I can still access the snakemake-testing GCP project?

I think that still should work.

@johanneskoester
Contributor

Thanks a lot, that is really appreciated!

@CowanCS1
Contributor

CowanCS1 commented Mar 1, 2022

Following the suggestion of @johanneskoester I placed print commands in most of the methods of the GoogleLifeSciencesExecutor class to write out the method being called.

Here's the segmentation fault:

generate_job_action
_generate_environment
_generate_pipeline_labels
_get_jobname
_generate_log_action
_generate_pipeline_labels
_get_jobname
_generate_environment
_retry_request
Segmentation fault (core dumped)

I was also seeing other errors at the same position, such as

double free or corruption (out)
Aborted (core dumped)
double free or corruption (!prev)
Aborted (core dumped)

and

free(): corrupted unsorted chunks
Aborted (core dumped)

To make sure the error was actually in _retry_request, I put a print("Finished _retry_request") immediately after capturing the result of request.execute(), which resulted in this:

generate_job_action
_generate_environment
_generate_pipeline_labels
_get_jobname
_generate_log_action
_generate_pipeline_labels
_get_jobname
_generate_environment
Started _retry_request
corrupted size vs. prev_size
Aborted (core dumped)

So I think that narrows it down to the request.execute() call in _retry_request.

In addition, I was seeing the occasional MissingOutputException which seemed to be legitimate: files would sometimes not appear in the bucket even with a long delay. Here is the output running with the -k option:

Select jobs to execute...
_retry_request
_job_was_successful
_retry_request
Waiting at most 5 seconds for missing files.
_job_was_successful
MissingOutputException in line 4 of /home/cameron.cowan/data/snakemake_test/Snakefile:
Job Missing files after 5 seconds:
cowan_workflow/snakemake_test/done95.txt
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 96 completed successfully, but some output files are missing. 96
Job failed, going on with independent jobs.
_retry_request
run

This issue seemed to be more frequent when using a multi-region bucket, compared to a single region, but I was able to reproduce it in both. This is definitely a minor inconvenience compared to the original post, but if you start testing this in earnest you'll probably hit it.
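For the latency side issue, the mitigation is the one the error message itself suggests; a rerun along these lines (same bucket prefix as the original report; --latency-wait and --keep-going are standard Snakemake flags) raises the grace period and keeps independent jobs going:

```shell
# --latency-wait raises the wait for output files to appear (default 5 s);
# --keep-going (-k) continues with independent jobs after a failure.
snakemake --google-lifesciences \
    --default-remote-prefix snake-test \
    --latency-wait 60 \
    --keep-going \
    -j10
```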

I should note that I was twice able to run all 100 jobs without an error. They were back to back executions, and qualitatively appeared to have relatively fewer calls to _retry_request compared to the other runs. So reproducing this issue may be somewhat dependent on the underlying GCP state in the region being used (europe-west1 for my tests).

Hope that helps!

@cademirch
Contributor Author

cademirch commented Mar 1, 2022

Adding to @CowanCS1's findings. I ran the test with faulthandler enabled and got the following traceback:

Fatal Python error: Segmentation fault

Current thread 0x00007fbbacfa2700 (most recent call first):
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/http/client.py", line 237 in parse_headers
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/http/client.py", line 339 in begin
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/http/client.py", line 1377 in getresponse
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/httplib2/__init__.py", line 1373 in _conn_request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/httplib2/__init__.py", line 1421 in _request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/httplib2/__init__.py", line 1701 in request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/oauth2client/transport.py", line 280 in request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/oauth2client/transport.py", line 173 in new_request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/googleapiclient/http.py", line 190 in _retry_request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/googleapiclient/http.py", line 922 in execute
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/googleapiclient/_helpers.py", line 131 in positional_wrapper
  File "/home/ubuntu/snakemake/snakemake/executors/google_lifesciences.py", line 903 in _retry_request
  File "/home/ubuntu/snakemake/snakemake/executors/google_lifesciences.py", line 951 in _wait_for_jobs
  File "/home/ubuntu/snakemake/snakemake/executors/__init__.py", line 756 in _wait_thread
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/threading.py", line 910 in run
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007fbbb2873740 (most recent call first):
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/ssl.py", line 1099 in read
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/ssl.py", line 1241 in recv_into
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/socket.py", line 704 in readinto
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/http/client.py", line 281 in _read_status
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/http/client.py", line 320 in begin
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/http/client.py", line 1377 in getresponse
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/httplib2/__init__.py", line 1373 in _conn_request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/httplib2/__init__.py", line 1421 in _request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/httplib2/__init__.py", line 1701 in request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/oauth2client/transport.py", line 280 in request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/oauth2client/transport.py", line 173 in new_request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/googleapiclient/http.py", line 190 in _retry_request
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/googleapiclient/http.py", line 922 in execute
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/lib/python3.9/site-packages/googleapiclient/_helpers.py", line 131 in positional_wrapper
  File "/home/ubuntu/snakemake/snakemake/executors/google_lifesciences.py", line 903 in _retry_request
  File "/home/ubuntu/snakemake/snakemake/executors/google_lifesciences.py", line 832 in run
  File "/home/ubuntu/snakemake/snakemake/executors/__init__.py", line 153 in run_jobs
  File "/home/ubuntu/snakemake/snakemake/scheduler.py", line 588 in run
  File "/home/ubuntu/snakemake/snakemake/scheduler.py", line 540 in schedule
  File "/home/ubuntu/snakemake/snakemake/workflow.py", line 1097 in execute
  File "/home/ubuntu/snakemake/snakemake/__init__.py", line 714 in snakemake
  File "/home/ubuntu/snakemake/snakemake/__init__.py", line 2798 in main
  File "/home/ubuntu/miniconda3/envs/snakemake-testing/bin/snakemake", line 33 in <module>
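For reference, the tracebacks above come from Python's stdlib faulthandler module, which can be switched on without touching Snakemake's code:

```python
import faulthandler
import sys

# Dump Python-level tracebacks for every thread when a fatal signal
# (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL) kills the process.
faulthandler.enable(file=sys.stderr, all_threads=True)

print(faulthandler.is_enabled())  # prints: True
```

Equivalently, run the interpreter with python -X faulthandler or set PYTHONFAULTHANDLER=1 in the environment.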

@vsoch
Contributor

vsoch commented Mar 1, 2022

I may not need to help if you have the full traceback! Here is where that seems to be in the client code: https://github.com/googleapis/google-api-python-client/blob/8e9750ee4fde0a531d1b7085d660fbe04d8f1e4e/googleapiclient/_helpers.py#L111-L131. I see it happens right after ssl.py, so it might be related to that?

@CowanCS1 are you able to change the entrypoint to run the same thing with gdb so we can maybe see more of what is going on? E.g., https://wiki.python.org/moin/DebuggingWithGdb. I'm guessing you are already building the container yourself, so you'd want to make sure it's installed.
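For anyone reproducing this, the attach sequence looks roughly like the following (PID is the running snakemake process; the ptrace_scope relaxation is only needed on kernels with Yama hardening, and py-bt requires gdb's Python extensions to be loaded):

```
# allow attaching to non-child processes (Yama ptrace hardening)
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

gdb python PID    # attach to the live interpreter
(gdb) c           # continue until the SIGSEGV is raised
(gdb) py-bt       # Python-level backtrace of the crashing thread
(gdb) bt          # native C backtrace
```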

@cademirch
Contributor Author

cademirch commented Mar 2, 2022

Could #983 and/or #1006 be related to this? Seems like _retry_request is a common problem spot in all three.

@vsoch
Contributor

vsoch commented Mar 2, 2022

I think we can’t rule them out yet - are you able to run with gdb to get more detail?

@cademirch
Contributor Author

@vsoch I gave it a shot earlier today by trying to get gdb to attach to the running snakemake process, but had no success unfortunately. I'll look into it more in the morning, though.

@CowanCS1
Contributor

CowanCS1 commented Mar 2, 2022

@vsoch @cademirch I launched snakemake and attached gdb to the process using gdb python PID. You may need to modify ptrace_scope as in the second answer at https://stackoverflow.com/questions/19215177/how-to-solve-ptrace-operation-not-permitted-when-trying-to-attach-gdb-to-a-pro. I didn't install gdb into the snakemake container, since the error seemed to be on the local side, and it seems like it went okay; let me know if I missed something and I can do so.

gdb interactive output, attaching:

Reading symbols from python...
Attaching to program: /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/bin/python, process 8315
[New LWP 8318]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f16facfbaff in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) c
Continuing.

SegFault error

Thread 2 "snakemake" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f16f5ae4700 (LWP 8318)]
0x00007f16fac81104 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

py-bt

(gdb) py-bt
Traceback (most recent call first):
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/http/client.py", line 235, in parse_headers
    hstring = b''.join(headers).decode('iso-8859-1')
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/http/client.py", line 337, in begin
    self.headers = self.msg = parse_headers(self.fp)
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/httplib2/__init__.py", line 1373, in _conn_request
    response = conn.getresponse()
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/httplib2/__init__.py", line 1421, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/httplib2/__init__.py", line 1701, in request
    (response, content) = self._request(
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/oauth2client/transport.py", line 280, in request
    return http_callable(uri, method=method, body=body, headers=headers,
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/oauth2client/transport.py", line 173, in new_request
    resp, content = request(orig_request_method, uri, method, body,
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/googleapiclient/http.py", line 190, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/googleapiclient/http.py", line 922, in execute
    resp, content = _retry_request(
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 131, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/snakemake-7.0.1+0.gccddf985.dirty-py3.10.egg/snakemake/executors/google_lifesciences.py", line 928, in _retry_request
    result = request.execute()
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/snakemake-7.0.1+0.gccddf985.dirty-py3.10.egg/snakemake/executors/google_lifesciences.py", line 979, in _wait_for_jobs
    status = self._retry_request(request)
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/snakemake-7.0.1+0.gccddf985.dirty-py3.10.egg/snakemake/executors/__init__.py", line 756, in _wait_thread
    self._wait_for_jobs()
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/threading.py", line 966, in _bootstrap
    self._bootstrap_inner()

bt

(gdb) bt
#0  0x00007f16fac81104 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f16fac83419 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x000055e01475492e in PyMem_RawMalloc (size=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/obmalloc.c:1949
#3  _PyObject_Malloc (ctx=<optimized out>, nbytes=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/obmalloc.c:1956
#4  _PyObject_Malloc (nbytes=<optimized out>, ctx=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/obmalloc.c:1949
#5  PyMem_Malloc (size=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/obmalloc.c:605
#6  0x000055e0147cd40b in stringlib_bytes_join (sep=b'', iterable=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/stringlib/join.h:46
#7  0x000055e0147c6804 in method_vectorcall_O (func=<method_descriptor at remote 0x7f16fa7f1170>, args=0x7f16f6de7b28, 
    nargsf=<optimized out>, kwnames=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/descrobject.c:460
#8  0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6de7b28, 
    callable=<method_descriptor at remote 0x7f16fa7f1170>, tstate=0x55e016b111c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#9  PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6de7b28, 
    callable=<method_descriptor at remote 0x7f16fa7f1170>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#10 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae1410, 
    tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#11 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#12 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, 
    f=Frame 0x7f16f6de79a0, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/http/client.py, line 235, in parse_headers (fp=<_io.BufferedReader at remote 0x7f16f6150720>, _class=<type at remote 0x55e015937350>, headers=[b'Content-Type: application/json; charset=UTF-8\r\n', b'Vary: Origin\r\n', b'Vary: X-Origin\r\n', b'Vary: Referer\r\n', b'Content-Encoding: gzip\r\n', b'Date: Wed, 02 Mar 2022 08:21:37 GMT\r\n', b'Server: ESF\r\n', b'Cache-Control: private\r\n', b'X-XSS-Protection: 0\r\n', b'X-Frame-Options: SAMEORIGIN\r\n', b'X-Content-Type-Options: nosniff\r\n', b'Transfer-Encoding: chunked\r\n', b'\r\n']), tstate=0x55e016b111c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#13 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, 
    con=<optimized out>, tstate=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#14 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#15 0x000055e01480da24 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f61227f0, 
    callable=<function at remote 0x7f16fa0e6560>, tstate=0x55e016b111c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#16 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f61227f0, 
    callable=<function at remote 0x7f16fa0e6560>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#17 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae1660, 
    tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#18 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4213
#19 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, 
    f=Frame 0x7f16f6122640, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/http/client.py, line 337, in begin (self=<HTTPResponse(fp=<_io.BufferedReader at remote 0x7f16f6150720>, debuglevel=0, _method='POST', headers=None, msg=None, version=11, status=200, reason='OK', chunked='UNKNOWN', chunk_left='UNKNOWN', length='UNKNOWN', will_close='UNKNOWN', code=200) at remote 0x7f16f6662fe0>, version='HTTP/1.1', status=200, reason='OK\r\n'), tstate=0x55e016b111c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#20 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, 
    con=<optimized out>, tstate=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#21 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#22 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6dd7d88, 
    callable=<function at remote 0x7f16fa0e67a0>, tstate=0x55e016b111c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#23 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6dd7d88, 
    callable=<function at remote 0x7f16fa0e67a0>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#24 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae18b0, 
    tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#25 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#26 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, 
    f=Frame 0x7f16f6dd7c10, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/http/client.py, line 1374, in getresponse (self=<HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bcb640>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0x7f16f6110d60>, server_side=False, server_hostname='lifesciences.googleapis.com', do_handshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0x7f16f6e1c520>, _buffer=[], _HTTPConnection__response=None, _HTTPConnection__state='Request-sent', _method='POST', _tunnel_host=None, _tunnel_port=None, _tunnel_headers={}, host='lifesciences.googleapis.com', port=443, _create_connection=<function at remote 0x7f16fa528b80>, key_file=None, cert_file=None, _context=<...>, key_password...(truncated), tstate=0x55e016b111c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#27 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, 
    con=<optimized out>, tstate=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#28 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#29 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f00d6278, 
    callable=<function at remote 0x7f16fa114430>, tstate=0x55e016b111c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#30 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f00d6278, 
    callable=<function at remote 0x7f16fa114430>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#31 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae1b00, 
    tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#32 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#33 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, 
    f=Frame 0x7f16f00d60b0, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/httplib2/__init__.py, line 1373, in _conn_request (self=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bc8340>, _session=None, _sslobj=<_ssl._SSLSocket at remote--Type <RET> for more, q to quit, c to continue without paging--c
 0x7f16f5b7e340>, server_side=False, server_hostname='oauth2.googleapis.com', do_handshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0x7f16f5b861a0>, _buffer=[], _HTTPConnection__response=<HTTPResponse(fp=None, debuglevel=0, _method='POST', head...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#34 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#35 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#36 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f009f9c0, callable=<function at remote 0x7f16f69220e0>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#37 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f009f9c0, callable=<function at remote 0x7f16f69220e0>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#38 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae1d50, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#39 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#40 0x000055e0147f08e5 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f009f790, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/httplib2/__init__.py, line 1421, in _request (self=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bc8340>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0x7f16f5b7e340>, server_side=False, server_hostname='oauth2.googleapis.com', do_handshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0x7f16f5b861a0>, _buffer=[], _HTTPConnection__response=<HTTPResponse(fp=None, debuglevel=0, _method='POST', headers=<...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#41 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#42 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#43 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f0024510, callable=<function at remote 0x7f16f6922170>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#44 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f0024510, callable=<function at remote 0x7f16f6922170>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#45 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae1fa0, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#46 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#47 0x000055e0147f1335 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f00242a0, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/httplib2/__init__.py, line 1701, in request (self=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bc8340>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0x7f16f5b7e340>, server_side=False, server_hostname='oauth2.googleapis.com', do_handshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0x7f16f5b861a0>, _buffer=[], _HTTPConnection__response=<HTTPResponse(fp=None, debuglevel=0, _method='POST', headers=<H...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#48 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x7f16f61223e0, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#49 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x7f16f61223e0, func=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#50 _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=0x7f16f61223e0, callable=<optimized out>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#51 method_vectorcall (method=<optimized out>, args=0x7f16f61223e8, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/classobject.c:53
#52 0x000055e01480e977 in _PyObject_VectorcallTstate (kwnames=('method', 'body', 'headers', 'redirections', 'connection_type'), nargsf=<optimized out>, args=<optimized out>, callable=<method at remote 0x7f16f5bc02c0>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#53 PyObject_Vectorcall (kwnames=('method', 'body', 'headers', 'redirections', 'connection_type'), nargsf=<optimized out>, args=<optimized out>, callable=<method at remote 0x7f16f5bc02c0>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#54 call_function (kwnames=('method', 'body', 'headers', 'redirections', 'connection_type'), oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae2240, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#55 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4231
#56 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f6122240, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/oauth2client/transport.py, line 280, in request (http=<method at remote 0x7f16f5bc02c0>, uri='https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', method='GET', body=None, headers={b'accept': b'application/json', b'accept-encoding': b'gzip, deflate', b'user-agent': b'(gzip)', b'x-goog-api-client': b'gdcl/2.38.0 gl-python/3.10.2', b'content-length': b'0', b'Authorization': b'Bearer ya29.c.b0AXv0zTNXW9WRdOqeGFYLkXo9hMf0jsGvgXF2uM3cUCpiGIO3Iae1tiagdzhi6WDBIg8nwq-tTEBVoLTN1wiShPm8Je7IUriMRkGHw5kGgUuxmDaws6oepazKrS1xnNABzP19GfDddD1jRskLGdyV9h7cDTDRdDSZ15v8zY5vpEuuwBlrnrYM1F_meDrIFAD3aod50UNPsfcjBba1UIQk_HRc7B2-2wY........................................................................................................................................................................................................................(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#57 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#58 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#59 0x000055e01480da24 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f00233e8, callable=<function at remote 0x7f16f6931f30>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#60 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f00233e8, callable=<function at remote 0x7f16f6931f30>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#61 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae2490, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#62 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4213
#63 0x000055e0147f08e5 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f0023210, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/oauth2client/transport.py, line 173, in new_request (uri='https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', method='GET', headers={'accept': 'application/json', 'accept-encoding': 'gzip, deflate', 'user-agent': '(gzip)', 'x-goog-api-client': 'gdcl/2.38.0 gl-python/3.10.2', 'content-length': '0', 'Authorization': 'Bearer ya29.c.b0AXv0zTNXW9WRdOqeGFYLkXo9hMf0jsGvgXF2uM3cUCpiGIO3Iae1tiagdzhi6WDBIg8nwq-tTEBVoLTN1wiShPm8Je7IUriMRkGHw5kGgUuxmDaws6oepazKrS1xnNABzP19GfDddD1jRskLGdyV9h7cDTDRdDSZ15v8zY5vpEuuwBlrnrYM1F_meDrIFAD3aod50UNPsfcjBba1UIQk_HRc7B2-2wY...................................................................................................................................................................................................................................................................................(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#64 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x7f16f69598f0, locals=0x0, con=0x7f16f0023220, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#65 _PyFunction_Vectorcall (func=func@entry=<function at remote 0x7f16f5b8de10>, stack=stack@entry=0x7f16f6129d48, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#66 0x000055e014770db8 in PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=<function at remote 0x7f16f5b8de10>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:267
#67 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=<function at remote 0x7f16f5b8de10>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:290
#68 PyObject_Call (callable=<function at remote 0x7f16f5b8de10>, args=<optimized out>, kwargs=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:317
#69 0x000055e0148103ce in do_call_core (kwdict={'body': None, 'headers': {'accept': 'application/json', 'accept-encoding': 'gzip, deflate', 'user-agent': '(gzip)', 'x-goog-api-client': 'gdcl/2.38.0 gl-python/3.10.2', 'content-length': '0'}}, callargs=('https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', 'GET'), func=<function at remote 0x7f16f5b8de10>, trace_info=0x7f16f5ae2740, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5919
#70 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4277
#71 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f00279e0, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/googleapiclient/http.py, line 190, in _retry_request (http=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bc8340>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0x7f16f5b7e340>, server_side=False, server_hostname='oauth2.googleapis.com', do_handshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0x7f16f5b861a0>, _buffer=[], _HTTPConnection__response=<HTTPResponse(fp=None, debuglevel=0, _method='POST', h...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#72 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#73 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#74 0x000055e01480e977 in _PyObject_VectorcallTstate (kwnames=('method', 'body', 'headers'), nargsf=<optimized out>, args=<optimized out>, callable=<function at remote 0x7f16f662aef0>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#75 PyObject_Vectorcall (kwnames=('method', 'body', 'headers'), nargsf=<optimized out>, args=<optimized out>, callable=<function at remote 0x7f16f662aef0>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#76 call_function (kwnames=('method', 'body', 'headers'), oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae2990, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#77 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4231
#78 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f0024590, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/googleapiclient/http.py, line 922, in execute (self=<HttpRequest(uri='https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', method='GET', body=None, headers={'accept': 'application/json', 'accept-encoding': 'gzip, deflate', 'user-agent': '(gzip)', 'x-goog-api-client': 'gdcl/2.38.0 gl-python/3.10.2', 'content-length': '0'}, methodId='lifesciences.projects.locations.operations.get', http=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=N...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#79 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#80 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#81 0x000055e014770eaa in PyVectorcall_Call (kwargs=<optimized out>, tuple=(<HttpRequest(uri='https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', method='GET', body=None, headers={'accept': 'application/json', 'accept-encoding': 'gzip, deflate', 'user-agent': '(gzip)', 'x-goog-api-client': 'gdcl/2.38.0 gl-python/3.10.2', 'content-length': '0'}, methodId='lifesciences.projects.locations.operations.get', http=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bc8340>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0x7f16f5b7e340>, server_side=False,...(truncated), callable=<function at remote 0x7f16f664c820>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:255
#82 _PyObject_Call (kwargs=<optimized out>, args=(<HttpRequest(uri='https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', method='GET', body=None, headers={'accept': 'application/json', 'accept-encoding': 'gzip, deflate', 'user-agent': '(gzip)', 'x-goog-api-client': 'gdcl/2.38.0 gl-python/3.10.2', 'content-length': '0'}, methodId='lifesciences.projects.locations.operations.get', http=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bc8340>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0x7f16f5b7e340>, server_side=False,...(truncated), callable=<function at remote 0x7f16f664c820>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:290
#83 PyObject_Call (callable=<function at remote 0x7f16f664c820>, args=(<HttpRequest(uri='https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', method='GET', body=None, headers={'accept': 'application/json', 'accept-encoding': 'gzip, deflate', 'user-agent': '(gzip)', 'x-goog-api-client': 'gdcl/2.38.0 gl-python/3.10.2', 'content-length': '0'}, methodId='lifesciences.projects.locations.operations.get', http=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bc8340>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0x7f16f5b7e340>, server_side=False,...(truncated), kwargs=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:317
#84 0x000055e0148103ce in do_call_core (kwdict={}, callargs=(<HttpRequest(uri='https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', method='GET', body=None, headers={'accept': 'application/json', 'accept-encoding': 'gzip, deflate', 'user-agent': '(gzip)', 'x-goog-api-client': 'gdcl/2.38.0 gl-python/3.10.2', 'content-length': '0'}, methodId='lifesciences.projects.locations.operations.get', http=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, source_address=None, blocksize=8192, sock=<SSLSocket(_context=<SSLContext at remote 0x7f16f5bc8340>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0x7f16f5b7e340>, server_side=False,...(truncated), func=<function at remote 0x7f16f664c820>, trace_info=0x7f16f5ae2c40, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5919
#85 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4277
#86 0x000055e0147f08e5 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f61707c0, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/googleapiclient/_helpers.py, line 131, in positional_wrapper (args=(<HttpRequest(uri='https://lifesciences.googleapis.com/v2beta/projects/273727082746/locations/europe-west2/operations/266139020635888359?alt=json', method='GET', body=None, headers={'accept': 'application/json', 'accept-encoding': 'gzip, deflate', 'user-agent': '(gzip)', 'x-goog-api-client': 'gdcl/2.38.0 gl-python/3.10.2', 'content-length': '0'}, methodId='lifesciences.projects.locations.operations.get', http=<Http(proxy_info=<function at remote 0x7f16f69201f0>, ca_certs=None, disable_ssl_certificate_validation=False, tls_maximum_version=None, tls_minimum_version=None, connections={'https:oauth2.googleapis.com': <HTTPSConnectionWithTimeout(disable_ssl_certificate_validation=False, ca_certs='/home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/certifi/cacert.pem', proxy_info=None, timeout=60, ...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#87 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#88 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#89 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f61221e0, callable=<function at remote 0x7f16f664c8b0>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#90 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f61221e0, callable=<function at remote 0x7f16f664c8b0>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#91 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae2e90, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#92 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#93 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f6122040, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/snakemake-7.0.1+0.gccddf985.dirty-py3.10.egg/snakemake/executors/google_lifesciences.py, line 928, in _retry_request (self=<GoogleLifeSciencesExecutor(workflow=<Workflow(global_resources={'_cores': None, '_nodes': 20}, _rules={'all': <Rule(name='all', workflow=<...>, docstring=None, message=None, _input=<InputFiles(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5b1f0>, sort=<functools.partial at remote 0x7f16f6d5b880>) at remote 0x7f16f719ecf0>, _output=<OutputFiles(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5b920>, sort=<functools.partial at remote 0x7f16f6d5b9c0>) at remote 0x7f16f6d5b8d0>, _params=<Params(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5ba10>, sort=<functools.partial at remote 0x7f16f6d5bab0>) at remote 0x7f16f6d5ba60>, _wildcard_cons...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#94 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#95 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#96 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f00011a0, callable=<function at remote 0x7f16f9089a20>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#97 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f00011a0, callable=<function at remote 0x7f16f9089a20>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#98 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae30e0, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#99 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#100 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f0000fe0, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/snakemake-7.0.1+0.gccddf985.dirty-py3.10.egg/snakemake/executors/google_lifesciences.py, line 979, in _wait_for_jobs (self=<GoogleLifeSciencesExecutor(workflow=<Workflow(global_resources={'_cores': None, '_nodes': 20}, _rules={'all': <Rule(name='all', workflow=<...>, docstring=None, message=None, _input=<InputFiles(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5b1f0>, sort=<functools.partial at remote 0x7f16f6d5b880>) at remote 0x7f16f719ecf0>, _output=<OutputFiles(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5b920>, sort=<functools.partial at remote 0x7f16f6d5b9c0>) at remote 0x7f16f6d5b8d0>, _params=<Params(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5ba10>, sort=<functools.partial at remote 0x7f16f6d5bab0>) at remote 0x7f16f6d5ba60>, _wildcard_cons...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#101 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#102 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#103 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6080758, callable=<function at remote 0x7f16f9089ab0>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#104 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6080758, callable=<function at remote 0x7f16f9089ab0>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#105 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae3330, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#106 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#107 0x000055e0147ef139 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f60805e0, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/site-packages/snakemake-7.0.1+0.gccddf985.dirty-py3.10.egg/snakemake/executors/__init__.py, line 756, in _wait_thread (self=<GoogleLifeSciencesExecutor(workflow=<Workflow(global_resources={'_cores': None, '_nodes': 20}, _rules={'all': <Rule(name='all', workflow=<...>, docstring=None, message=None, _input=<InputFiles(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5b1f0>, sort=<functools.partial at remote 0x7f16f6d5b880>) at remote 0x7f16f719ecf0>, _output=<OutputFiles(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5b920>, sort=<functools.partial at remote 0x7f16f6d5b9c0>) at remote 0x7f16f6d5b8d0>, _params=<Params(_names={}, _allowed_overrides=['index', 'sort'], index=<functools.partial at remote 0x7f16f6d5ba10>, sort=<functools.partial at remote 0x7f16f6d5bab0>) at remote 0x7f16f6d5ba60>, _wildcard_constraints={}, d...(truncated), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#108 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#109 0x000055e0147f1e44 in _PyFunction_Vectorcall (kwnames=0x0, nargsf=1, stack=0x7f16f5ae3570, func=<function at remote 0x7f16f907ce50>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#110 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7f16f5ae3570, callable=<function at remote 0x7f16f907ce50>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#111 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/classobject.c:61
#112 0x000055e014770eaa in PyVectorcall_Call (kwargs=<optimized out>, tuple=(), callable=<method at remote 0x7f16f6652040>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:255
#113 _PyObject_Call (kwargs=<optimized out>, args=(), callable=<method at remote 0x7f16f6652040>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:290
#114 PyObject_Call (callable=<method at remote 0x7f16f6652040>, args=(), kwargs=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:317
#115 0x000055e0148103ce in do_call_core (kwdict={}, callargs=(), func=<method at remote 0x7f16f6652040>, trace_info=0x7f16f5ae3700, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5919
#116 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4277
#117 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f5bd9540, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/threading.py, line 946, in run (self=<Thread(_target=<method at remote 0x7f16f6652040>, _name='Thread-1 (_wait_thread)', _args=(), _kwargs={}, _daemonic=True, _ident=139736587847424, _native_id=8318, _tstate_lock=<_thread.lock at remote 0x7f16f5bcf900>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f16f6651f80>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f16f6651f80>, release=<built-in method release of _thread.lock object at remote 0x7f16f6651f80>, _waiters=<collections.deque at remote 0x7f16f5b86680>) at remote 0x7f16f6662f50>, _flag=True) at remote 0x7f16f6663280>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7f16fa871150>, _invoke_excepthook=<function at remote 0x7f16f5b8e680>) at remote 0x7f16f6662e30>), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#118 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#119 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#120 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6df9ec0, callable=<function at remote 0x7f16fa4c4940>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#121 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6df9ec0, callable=<function at remote 0x7f16fa4c4940>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#122 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae3950, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#123 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#124 0x000055e0147f013d in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f6df9d50, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/threading.py, line 1009, in _bootstrap_inner (self=<Thread(_target=<method at remote 0x7f16f6652040>, _name='Thread-1 (_wait_thread)', _args=(), _kwargs={}, _daemonic=True, _ident=139736587847424, _native_id=8318, _tstate_lock=<_thread.lock at remote 0x7f16f5bcf900>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f16f6651f80>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f16f6651f80>, release=<built-in method release of _thread.lock object at remote 0x7f16f6651f80>, _waiters=<collections.deque at remote 0x7f16f5b86680>) at remote 0x7f16f6662f50>, _flag=True) at remote 0x7f16f6663280>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7f16fa871150>, _invoke_excepthook=<function at remote 0x7f16f5b8e680>) at remote 0x7f16f6662e30>), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#125 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=<optimized out>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#126 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#127 0x000055e01480dd2b in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6084360, callable=<function at remote 0x7f16fa4c4c10>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#128 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f16f6084360, callable=<function at remote 0x7f16fa4c4c10>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:123
#129 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7f16f5ae3ba0, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5867
#130 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:4198
#131 0x000055e0147ef139 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7f16f60841f0, for file /home/cameron.cowan/programs/anaconda3/envs/snakemake_test/lib/python3.10/threading.py, line 966, in _bootstrap (self=<Thread(_target=<method at remote 0x7f16f6652040>, _name='Thread-1 (_wait_thread)', _args=(), _kwargs={}, _daemonic=True, _ident=139736587847424, _native_id=8318, _tstate_lock=<_thread.lock at remote 0x7f16f5bcf900>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f16f6651f80>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f16f6651f80>, release=<built-in method release of _thread.lock object at remote 0x7f16f6651f80>, _waiters=<collections.deque at remote 0x7f16f5b86680>) at remote 0x7f16f6662f50>, _flag=True) at remote 0x7f16f6663280>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7f16fa871150>, _invoke_excepthook=<function at remote 0x7f16f5b8e680>) at remote 0x7f16f6662e30>), tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/internal/pycore_ceval.h:46
#132 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/ceval.c:5065
#133 0x000055e0147f1e44 in _PyFunction_Vectorcall (kwnames=0x0, nargsf=1, stack=0x7f16f5ae3de0, func=<function at remote 0x7f16fa4c49d0>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:342
#134 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7f16f5ae3de0, callable=<function at remote 0x7f16fa4c49d0>, tstate=0x55e016b111c0) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Include/cpython/abstract.h:114
#135 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/classobject.c:61
#136 0x000055e014770eaa in PyVectorcall_Call (kwargs=<optimized out>, tuple=(), callable=<method at remote 0x7f16f6651e40>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:255
#137 _PyObject_Call (kwargs=<optimized out>, args=(), callable=<method at remote 0x7f16f6651e40>, tstate=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:290
#138 PyObject_Call (callable=<method at remote 0x7f16f6651e40>, args=(), kwargs=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Objects/call.c:317
#139 0x000055e0148a7695 in thread_run (boot_raw=0x7f16f6662130) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Modules/_threadmodule.c:1090
#140 0x000055e014843a34 in pythread_wrapper (arg=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1643743073903/work/Python/thread_pthread.h:248
#141 0x00007f16faf48609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#142 0x00007f16fad08293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

New to gdb so let me know if other outputs would be helpful.

@vsoch
Contributor

vsoch commented Mar 2, 2022

So it looks like it's happening when we start with the GLS run() function and create the operation data structure, and then send that to retry request, which is basically trying to submit that to the GLS API. Question - does this still trigger if you don't include the flag for more than one job / control the threading? I'm wondering if the issue is just that we shouldn't be trying to run the local commands this many times at once, and instead try to add the scaling to a single / smaller number of API calls? There do seem to be a number of issues with controlling ssl connections via multiple threads that seem similar.

@CowanCS1
Contributor

CowanCS1 commented Mar 2, 2022

I agree; regardless of the exact error, it always happens when the job is sent to the GLS API.

In my experience, it only errors when running multiple jobs. I ran a limited test of that with 100 jobs using -j 1, and didn't get any errors. A bit biased since I've run so many more jobs with j>1, but will try to reproduce it a few more times. How is it for you @cademirch ?

@cademirch
Contributor Author

cademirch commented Mar 2, 2022

@CowanCS1 Running the MRE with -j 1 right now and it has not failed yet. Thanks for running the gdb btw.

@cademirch
Contributor Author

So it looks like it's happening when we start with the GLS run() function and create the operation data structure, and then send that to retry request, which is basically trying to submit that to the GLS API. Question - does this still trigger if you don't include the flag for more than one job / control the threading? I'm wondering if the issue is just that we shouldn't be trying to run the local commands this many times at once, and instead try to add the scaling to a single / smaller number of API calls? There do seem to be a number of issues with controlling ssl connections via multiple threads that seem similar.

@vsoch Where could I make this change you're suggesting? I'm happy to work on that today and test it out.

@CowanCS1
Contributor

CowanCS1 commented Mar 2, 2022

I put in some additional logging, and am seeing more than a few generic exceptions coming out of _retry_request which are from a pretty wide variety of errors such as ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2536) and ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2536). Most of the time these are handled safely, and segmentation faults can be with them or without.

@vsoch could this be a thread safety issue? I saw httplib2 isn't thread safe and read google's recommended approach for resolving that. I see you're using some locking, but I don't know the broader executor classes well enough to see exactly how it is all put together. Given the issues, I thought it is still worth considering.

Regarding your idea to do fewer API calls, also see this idea for maintaining a pool of httplib2 objects. Might be easier to incorporate with the current code than waiting to bundle a series of requests.
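Google's httplib2 guidance boils down to never sharing one Http object across threads. A minimal stdlib sketch of that pattern (the class and method names are hypothetical, not from Snakemake or the Google client):

```python
import threading

class PerThreadClient:
    """Lazily create one client per thread, since httplib2.Http
    objects are not safe to share across threads."""

    def __init__(self, factory):
        self._factory = factory          # e.g. lambda: httplib2.Http()
        self._local = threading.local()  # one attribute slot per thread

    def get(self):
        # The first call in a given thread builds that thread's client;
        # later calls in the same thread reuse it.
        if not hasattr(self._local, "client"):
            self._local.client = self._factory()
        return self._local.client
```

In the executor the factory would presumably wrap the credentials (something like `lambda: AuthorizedHttp(credentials, http=httplib2.Http())`), with each request executed against `pool.get()` instead of a shared connection.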

@vsoch
Contributor

vsoch commented Mar 2, 2022

Sorry, you beat me to responding! So I was just looking at similar - and I'd say first try those approaches that refactor to enable thread safety. Other methods would change the way that the client works - and if you need more than that approach, you could also try setting some custom values for GLS:

  • max_jobs_per_second (int): maximal number of cluster/drmaa jobs per second, None to impose no limit (default None)
  • there indeed is a bit of messing with max jobs, here
    if args.cores is not None:
        args.cores = parse_cores(args.cores)
        if local_exec:
            # avoid people accidentally setting jobs as well
            args.jobs = None
    else:
        if no_exec:
            args.cores = 1
        elif local_exec:
            if args.jobs is not None:
                args.cores = parse_cores(args.jobs)
                args.jobs = None
            elif args.dryrun:
                # dryrun with single core if nothing specified
                args.cores = 1
            else:
                print(
                    "Error: you need to specify the maximum number of CPU cores to "
                    "be used at the same time. If you want to use N cores, say --cores N or "
                    "-cN. For all cores on your system (be sure that this is appropriate) "
                    "use --cores all. For no parallelization use --cores 1 or -c1.",
                    file=sys.stderr,
                )
                sys.exit(1)
    if non_local_exec:
        if args.jobs is None:
            print(
                "Error: you need to specify the maximum number of jobs to "
                "be queued or executed at the same time with --jobs or -j.",
                file=sys.stderr,
            )
            sys.exit(1)
        else:
            try:
                args.jobs = int(args.jobs)
            except ValueError:
                print(
                    "Error parsing number of jobs (--jobs, -j): must be integer.",
                    file=sys.stderr,
                )
  • this function
    def run_jobs(self, jobs, callback=None, submit_callback=None, error_callback=None):
        """Run a list of jobs that is ready at a given point in time.

        By default, this method just runs each job individually.
        This method can be overwritten to submit many jobs in a more efficient way than one-by-one.
        Note that in any case, for each job, the callback functions have to be called individually!
        """
        for job in jobs:
            self.run(
                job,
                callback=callback,
                submit_callback=submit_callback,
                error_callback=error_callback,
            )
    is running jobs so you could put a wait or other control there
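If thread-safety fixes alone don't settle it, the run_jobs hook above is also a natural place to pace submissions. A minimal sleep-based sketch (the mixin name and the 5-per-second default are my assumptions, not anything Snakemake ships):

```python
import time

class ThrottledSubmitMixin:
    """Pace job submissions so at most `max_per_second` API calls
    are issued per second (simple sleep-based limiter)."""

    max_per_second = 5.0

    def run_jobs(self, jobs, callback=None, submit_callback=None, error_callback=None):
        interval = 1.0 / self.max_per_second
        for job in jobs:
            start = time.monotonic()
            self.run(
                job,
                callback=callback,
                submit_callback=submit_callback,
                error_callback=error_callback,
            )
            # Sleep off whatever remains of this job's time slot.
            elapsed = time.monotonic() - start
            if elapsed < interval:
                time.sleep(interval - elapsed)
```

A real implementation would probably hang this off the existing max_jobs_per_second option rather than a class attribute.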

it's been a while since I've traced the execution but I think we have a few good ideas to start! Thanks for offering to take a shot at fixing this @CowanCS1 !

@cademirch
Contributor Author

So I tried a rather naive solution based on the Google docs @CowanCS1 linked above. With this, I have been able to run the MRE with -j10 successfully a few times. Currently testing more. I'm sure this is not an optimal solution and, to be honest, I don't have the greatest understanding of why this works, so please let me know what should be changed. I know that generating the credentials should probably be moved out of this function, but beyond that I'm unsure of where to improve. See #1451
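Generically, that idea can be sketched as a retry helper that builds a brand-new connection for every attempt, so no connection object survives an SSL-level failure (the names and backoff policy here are assumptions for illustration, not the actual PR code):

```python
import ssl
import time

def retry_request(make_request, build_http, attempts=3, backoff=1.0):
    """Execute an API request, building a *fresh* HTTP connection for
    every attempt, so a connection is never reused after an SSL-level
    failure and never shared between callers."""
    for attempt in range(attempts):
        http = build_http()  # e.g. a new AuthorizedHttp over httplib2.Http()
        try:
            return make_request(http)
        except ssl.SSLError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff before rebuilding and retrying.
            time.sleep(backoff * (2 ** attempt))
```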

@vsoch
Contributor

vsoch commented Mar 2, 2022

@cademirch this is a great step in the right direction! I suspect there are best practices, and I've pinged @moschetti to advise us. This has been a problem for years and we haven't fixed it, so even a hacky solution I'm pretty pumped about <3

@CowanCS1
Contributor

CowanCS1 commented Mar 3, 2022

Thanks @cademirch :)

I've tested this version, and can confirm that it eliminates the SIGSEGV, all of the exotic SIGABRT errors, and all of the SSL exceptions. Nice!

Unexpectedly, it is also eliminating another issue I was seeing with this test script where a subset of output files (5-10%) were either not present in cloud storage or present but not recognized by the job. I saw the latter type of MissingOutputExceptions frequently in my own pipeline. With this version, all of the jobs are present every time.

That's actually all of the issues I was tracking, so hurrah! 👍

@vsoch Thanks for reaching out to get advice - this version is probably creating 10-20 connections per each of these quick jobs, and that would increase linearly with time due to the status checks. I hesitate to go fully into implementing a pool of connections, since I could imagine some edge cases like stale connections which we'd have to handle for minimal benefit compared to simpler solutions. My currently favored compromise is to create a single http connection for each call to _run and then maintain one for _wait_jobs, which would reduce the initial connection count 5-10x and eliminate the scaling with time. Since this version is working fine, I'll wait to implement anything until you get feedback.

Cheers all
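For comparison with the single-connection-per-phase compromise above, a bounded pool is only a few lines with the stdlib. This is a sketch under assumed names and sizes, and it deliberately omits the stale-connection handling that makes real pools fiddly:

```python
import queue

class HttpPool:
    """Bounded pool of reusable connection objects: check one out per
    request and return it afterwards, so concurrent requests never
    share a connection."""

    def __init__(self, factory, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        # Blocks when all connections are checked out.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)
```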

@vsoch
Contributor

vsoch commented Mar 3, 2022

woohoo! So my best guess is that when we create the discovery clients the request objects aren't getting / inheriting the correct scopes, or if they are, the scopes are perhaps not consistent to authenticate everything (e.g., sending to storage?) I think I'd consider this a googleapiclient bug - if I create a discovery service to use an API my expectation would be that it generates whatever http is needed, especially given the google default application credentials are the auth attached (which I believe to be the same generated by google.auth.default). ping @moschetti in case he can take this to the developers and get some insight!

@cademirch
Contributor Author

@CowanCS1 @vsoch Glad I can help! Thank you both for your expertise and help with this.

I agree that while this version works, it is not optimal since it creates so many http objects. I do like @CowanCS1 suggestions of creating the http connection in _run and _wait_jobs, I'll look into implementing and testing that today.

@vsoch
Contributor

vsoch commented Mar 3, 2022

Sounds good! And check out the service objects created by the discovery API, one for each request, to see if you can grab an already created http object / credentials from there. I will say again it's strange that the library doesn't carry those forward to the request.

@cademirch
Contributor Author

Yes, it is indeed strange. I'm curious if building the API as suggested here may be helpful.

@vsoch
Contributor

vsoch commented Mar 3, 2022

Yes! I was thinking we could try something like that - perhaps creating one build request for all the discovery services?

@cademirch
Contributor Author

Sounds good, I'll give it a go. Should I make a new branch for this? Or continue on my one with the PR (sorry am new to contributing).

@vsoch
Contributor

vsoch commented Mar 3, 2022

Please continue updating the PR! When you push new commits the CI will run again, and you can do that generally until you are content. Some communities are very strict about squash and merge, and require you to rebase or do other kind of git fu, but here we are good with squash and merge so contributing is easy.

To take a quote from the movie "Signs" and mangle it a bit... push away Cade, push awaaay! :D

@CowanCS1
Contributor

CowanCS1 commented Mar 3, 2022

@cademirch Have you tried the updated code with a more complex pipeline yet? I moved my main project over to the same virtual environment that is working perfectly for the test example and am seeing a very frequent MissingOutputException on most if not all jobs. The files appear to be produced and saved in GS, just not detected and therefore they're marked as incomplete by snakemake. Before the fix I would occasionally get this behavior in the main pipeline, but only on maybe 10% of jobs.

Given that I observed a reduction in this type of error for the test example provided above, it isn't unthinkable that the authentication to cloud storage is somehow negatively impacted when the jobs are longer or more complex. Tried to reproduce it by adding some delays to the minimal example, but no dice.

@vsoch
Contributor

vsoch commented Mar 3, 2022

Does one of the services looking for them have scope for storage?

@cademirch
Contributor Author

@CowanCS1 I have not. I think you are right that it has to do with the authentication - I'm curious if it has to do with the fact that new credentials are being generated in every call of _retry_request (a lazy implementation on my part).

This MissingOutput behavior is frustrating. I've seen it at about the same frequency as you in my --kubernetes executions.

@cademirch
Contributor Author

I'm working on a different solution and will update my PR. Will also test on a more complex workflow and report back.

@cademirch
Contributor Author

cademirch commented Mar 3, 2022

Does one of the services looking for them have scope for storage?

I'm confused about this since storage seems to be handled separately from the other services?

self._bucket_service = storage.Client()

Edit:
Unless google.cloud.storage is using the generated credentials?

@vsoch
Copy link
Contributor

vsoch commented Mar 3, 2022

That is my doing - the only logic is that the storage client directly worked really well (vs. the other APIs, which didn't have established reliable clients). I think the scopes are here: https://github.com/googleapis/python-storage/blob/c6bf78d608832d69031cccb7e4f252dd6241ade7/google/cloud/storage/client.py#L100-L103 and, akin to the others, the credentials are found in the environment, mentioned here https://github.com/googleapis/python-storage/blob/c6bf78d608832d69031cccb7e4f252dd6241ade7/google/cloud/storage/client.py#L76-L77 and under the hood (the base class of storage.Client) it's doing the same thing https://github.com/googleapis/python-cloud-core/blob/24b7de49943a49e8235f9dbee6b32693deed8c1f/google/cloud/client/__init__.py#L178

@CowanCS1
Contributor

CowanCS1 commented Mar 3, 2022

I agree, very bizarre. Very confused as to why I can't replicate it in the simple case.

@vsoch Not certain. There is an old pull request in python's cloud storage api where they celebrate taking httplib2 out and replacing it with requests which would make any cross-talk even less likely. I'm also not seeing any of the SSL errors or signal events like we were in that case, so it seems less likely to be the kind of nasty overlapping scope thread safety issue we just fixed. Maybe the checks are just destructively interfering, somehow? Will have to refresh my memory of how these checks are being made.

To completely rule it out, I guess we can try the same approach of creating and providing an independent http connection for the cloud storage api. It's provided as a requests.Session object as _http which you can see here.

Bedtime on this side of the atlantic :)

@cademirch
Contributor Author

That is my doing - the only logic is that the storage client directly worked really well (vs the other apis didn't have established relible clients). I think the scopes are here: https://github.com/googleapis/python-storage/blob/c6bf78d608832d69031cccb7e4f252dd6241ade7/google/cloud/storage/client.py#L100-L103 and akin to the others, the credentials are found in the environment, mentioned here https://github.com/googleapis/python-storage/blob/c6bf78d608832d69031cccb7e4f252dd6241ade7/google/cloud/storage/client.py#L76-L77 and under the hood (the base class of storage.Client) it's doing the same thing https://github.com/googleapis/python-cloud-core/blob/24b7de49943a49e8235f9dbee6b32693deed8c1f/google/cloud/client/__init__.py#L178

Yea it totally makes sense to use the storage client. So it does seem that the storage client generates its own credentials from the environment. I guess it could be crosstalk? But seems unlikely as @CowanCS1 mentions. Gonna push some commits and test on a larger workflow and see if I can replicate.

@vsoch
Contributor

vsoch commented Mar 3, 2022

haha yes good observations indeed! I absolutely love using Google Cloud but the APIs are constantly moving targets and there are many ways to do the same thing. I try to choose and make the best decision for the time, but I suspect this also changes over time.

@CowanCS1
Contributor

CowanCS1 commented Mar 3, 2022

My previous report was based on continuing a long (~1000 jobs) ongoing run, which had been failing consistently prior to the fixes here (~400 jobs in and was still slowly making progress on it with the old version). I made a duplicate of the data set and cleaned out all of the in-progress outputs, and now it appears to be working without any MissingOutputExceptions. I also changed to a single region bucket from a multiregion, but I have tested it both ways with the minimal example without any issues.

So my working theory is that all of the errors from the previous version somehow left it in a weird state, which was somehow causing issues with the new version. I'll let it run overnight, but wanted to let you know that the existing cloud storage API may be OK.

@cademirch
Contributor Author

@CowanCS1 Interesting. Thank you for noting this. I've had issues with MissingOutputExceptions despite the files being present, see #1405.

Interestingly when I run with -T 3, on every reattempt of the same job it will also fail with MissingOutput over and over until the retry times are exceeded. I'm not sure how related this is to this issue though.

@cademirch
Contributor Author

@CowanCS1 Also unrelated, but could you comment on how long it takes for the DAG to be built for your larger, more complex workflows when using --default-remote-prefix? For me it takes about 15-20 minutes which feels longer than it should be. I've reasoned to myself that snakemake probably takes a while to check the presence/absence of all the input/output files in the bucket but I'm not sure if that is why it takes so long for DAG building.

@CowanCS1
Contributor

CowanCS1 commented Mar 4, 2022

The pipeline I described in my last post is still running: over 200 jobs with no crashes, and the only missing-file exceptions had a known cause.

@CowanCS1 Interesting. Thank you for noting this. I've had issues with MissingOutputExceptions despite the files being present, see #1405.
Interestingly when I run with -T 3, on every reattempt of the same job it will also fail with MissingOutput over and over until the retry times are exceeded. I'm not sure how related this is to this issue though.

That closely matches what I was seeing earlier. Rather than -T I was using -k --rerun-incomplete and it appeared to be in an infinite loop, but I did try a version using --preemptible-default 3 which reproduced what you describe exactly. I would suspect the common thread is GS.py.

I do have a few observations on things that are going wrong with GS, but I haven't verified these in isolation through minimal examples

  • When an output file is already present in GS but Snakemake doesn't expect it to be (it never attempted a run of that output), it first doesn't recognize that the file exists, runs the corresponding rule seemingly without issue, then believes it has failed to upload the file and attempts to delete it; this deletion also fails, we hit a MissingOutputException, and the file is still present. Upon a subsequent run it sees the file but believes it is from an incomplete run, and restarts the above process. Since the file is still there, the failure to upload/download repeats.
  • If you have a dirty shutdown the cloud jobs will persist, creating files which are uploaded to cloud storage. These then get you into the issue in the first point. Dirty shutdowns can be hit, for example, by a SIGSEGV like this thread is based on or by accidentally hitting ctrl+c more than once such that you abort the normal shutdown process.
  • The chaos I was seeing on my main pipeline with the new version was due to the segmentation faults creating a lot of files in this manner, and then me using --preemption-default 5 such that any MissingOutputException would immediately attempt a re-launch of the same process.
  • If you manually delete the files, it will get you out of this loop and everything will run without issue. If Snakemake creates and uploads the files as it runs the rule, everything goes smoothly.
  • The above doesn't explain why fixing the httplib2 thread safety reduced the number of MissingOutputExceptions I was seeing in the test case for this issue, because those were always run in a bucket empty of output files.

@vsoch @johanneskoester if you want, I can try to create a minimal example of the above and put it in a new issue. Given that I can circumvent the problem by just cleaning the output directories and re-running, and that the fix for the present issue has removed the primary entry point to this little cascade of events, tracking down the underlying bug isn't high on my priority list.

@CowanCS1
Contributor

CowanCS1 commented Mar 4, 2022

@CowanCS1 Also unrelated, but could you comment on how long it takes for the DAG to be built for your larger, more complex workflows when using --default-remote-prefix? For me it takes about 15-20 minutes which feels longer than it should be. I've reasoned to myself that snakemake probably takes a while to check the presence/absence of all the input/output files in the bucket but I'm not sure if that is why it takes so long for DAG building.

~5 minutes. I've seen what you're describing on a previous version of our pipeline, it was 15-20 minutes for about 600 jobs on 120 inputs. I've built my own file inventories from cloud storage for a superset of what I'm analyzing here, and the inventory was done within seconds. So I suspect it is the calculation of the DAG itself which is time consuming. Are you using wildcards heavily? Our speed-up was part of a larger re-factoring, so I can't 100% attribute it to specific changes, but replacing wildcards with python scripts that read from tsv files is one change I suspect was helpful.
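The sample-sheet approach described above can be sketched like this; the sheet layout and output naming are hypothetical, the point being that the DAG is driven by a small local file rather than wildcard globbing against remote storage:

```python
import csv
import io

def load_samples(tsv_text):
    """Read a tab-separated sample sheet and return explicit output
    paths, so target files are enumerated from a local file instead
    of inferred via wildcards over a remote bucket."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return ["results/{}.done.txt".format(row["sample"]) for row in reader]
```

In a Snakefile this would feed something like `rule all: input: load_samples(open("samples.tsv").read())`.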

@vsoch
Contributor

vsoch commented Mar 4, 2022

Oh yuck. So some thoughts:

When an output file is already present in GS, but snakemake doesn't expect it to be present (never attempted a run of that output) it first doesn't recognize that the file exists, runs the corresponding rule seemingly without issue, then believes it has failed to upload the file and attempts to delete it, this deletion also fails to occur, we hit a missing output exception, and the file is still present. Upon a subsequent run it sees the file, but believes it is from an incomplete run and restarts the above process. Since the file is still there, the failure to upload/download repeats.

This makes sense in a way because when you look at the GS upload function:

@retry.Retry(predicate=google_cloud_retry_predicate)
def _upload(self):
    try:
        if not self.bucket.exists():
            self.bucket.create()
            self.update_blob()
        # Distinguish between single file, and folder
        f = self.local_file()
        if os.path.isdir(f):
            # Ensure the "directory" exists
            self.blob.upload_from_string(
                "", content_type="application/x-www-form-urlencoded;charset=UTF-8"
            )
            for root, _, files in os.walk(f):
                for filename in files:
                    filename = os.path.join(root, filename)
                    bucket_path = filename.lstrip(self.bucket.name).lstrip("/")
                    blob = self.bucket.blob(bucket_path)
                    blob.upload_from_filename(filename)
        else:
            self.blob.upload_from_filename(f)
    except google.cloud.exceptions.Forbidden as e:
        raise WorkflowError(
            e,
            "When running locally, make sure that you are authenticated "
            "via gcloud (see Snakemake documentation). When running in a "
            "kubernetes cluster, make sure that storage-rw is added to "
            "--scopes (see Snakemake documentation).",
        )
We don't make any effort to check for existence first. I think we probably do this at the onset when we create the cache, but that's mostly to determine what needs to be run, and it probably does not account for intermediate files. I wonder if the fix is as simple as either checking for existence first and then reacting (deleting and then uploading), or catching some kind of "object exists" exception in the retry wrapper and having the correct response be to stop retrying (and maybe ultimately fail).
def google_cloud_retry_predicate(ex):
    """Given an exception from Google Cloud, determine if it's one in the
    listing of transient errors (determined by function
    google.api_core.retry.if_transient_error(exception)) or determine if
    triggered by a hash mismatch due to a bad download. This function will
    return a boolean to indicate if retry should be done, and is typically
    used with the google.api_core.retry.Retry as a decorator (predicate).

    Arguments:
      ex (Exception) : the exception passed from the decorated function
    Returns: boolean to indicate doing retry (True) or not (False)
    """
    from requests.exceptions import ReadTimeout

    # Most likely case is Google API transient error.
    if retry.if_transient_error(ex):
        return True
    # Timeouts should be considered for retry as well.
    if isinstance(ex, ReadTimeout):
        return True
    # Could also be checksum mismatch of download.
    if isinstance(ex, CheckSumMismatchException):
        return True
    return False
But TL;DR: we need to better capture the case of an output already existing.
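One hedged sketch of "better capturing this case": decide explicitly what to do with a pre-existing blob before uploading. The checksum comparison and helper names are my assumptions, written against a minimal blob-like interface rather than the real google.cloud.storage API:

```python
def upload_with_overwrite(blob, local_md5, upload):
    """If the destination blob already exists, delete it first so the
    subsequent upload starts from a clean slate; skip the upload
    entirely when the remote content already matches the local checksum."""
    if blob.exists():
        if blob.md5_hash == local_md5:
            return "kept"        # identical content: nothing to do
        blob.delete()            # stale/partial object: remove, then re-upload
    upload()
    return "uploaded"
```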

If you have a dirty shutdown the cloud jobs will persist, creating files which are uploaded to cloud storage. These then get you into the issue in the first point. Dirty shutdowns can be hit, for example, by a SIGSEGV like this thread is based on or by accidentally hitting ctrl+c more than once such that you abort the normal shutdown process.

I suspect the call to cancel a job isn't like an immediate kill, but a request (that maybe isn't listened to) so it makes sense they keep on running and create outputs.

request = operations.cancel(name=job.jobname)
If you are reporting that a complete wipe of outputs leads to the best re-run, perhaps we need some final step that waits for everything to shut down and then goes back and deletes the entire storage root?

The chaos I was seeing on my main pipeline with the new version was due to the segmentation faults creating a lot of files in this manner, and then me using --preemption-default 5 such that any MissingOutputException would immediately attempt a re-launch of the same process.

Hmm is this something we shouldn't be allowing users to set?

virtual_machine = {
    "machineType": smallest,
    "labels": {"app": "snakemake"},
    "bootDiskSizeGb": disk_gb + 10,
    "preemptible": preemptible,
}
Or do we need better control, so we can disallow it given some particular error (do we have that level of control?)
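That level of control could be expressed as a small predicate consulted before re-launching. The failure classification below is entirely assumed; the real GLS operation metadata would need to be inspected to distinguish a preemption from other failures:

```python
def should_relaunch(failure_kind, preemption_retries_left):
    """Re-launch a job only when the VM was actually preempted; any
    other failure (e.g. a missing output) falls through to normal
    error handling instead of silently burning preemption retries."""
    return failure_kind == "preempted" and preemption_retries_left > 0
```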

If you manually delete the files, it will get you out of this loop and everything will run without issue. If Snakemake creates and uploads the files as it runs the rule, everything goes smoothly.

Yeah this is making me think maybe we should clear the namespace before a run like this and not try to re-create, although my understanding is that snakemake is supposed to be able to deal with existing files (re the first part of this discussion).

@vsoch @johanneskoester if you want I can try to create a minimal example of the above and put it in a new issue. Given that I can circumvent the problem by just cleaning the output directories and re-running, and the present issued being fixed has removed the primary entry point to this little cascade of events, tracking down the underlying bug isn't high on my priority list.

If you are pooped out on debugging (and we are really appreciative of the help!), I think this minimal example would be great for us and others, to keep the remaining issue in mind and have a reproducer when we're able to test. Thank you @CowanCS1!

@CowanCS1
Contributor

CowanCS1 commented Mar 4, 2022

we don't make any effort to check for existence first. I think we'd probably do this at the onset when we create the cache, but that's mostly to determine what needs to be run and probably is not accounting for intermediate files. I wonder if the fix is as simple as to either check for existence first and then react (either deleting and then uploading) or trying to catch some kind of "object exists" exception in the retry wrapper and then having whatever correct response is to not continue retrying (and then maybe ultimately fail).

Right, those files not being included in the cache caught me off guard a bit - as you say, it was my understanding that this was the expected behavior. But I hadn't purposefully provided intermediate files, so maybe this was just my misunderstanding. I think the original rationale was: if we tried to run this rule and it failed, consider any outputs of that rule incomplete; and if we haven't run the rule, don't expect the files. That is a conservative but understandable approach.

I think the actual "error" that we're hitting is that when a rule attempts to upload an output file to a blob that already exists, that blob is somehow failing to be recognized by Snakemake in the same way it would be if the blob had not existed prior to the upload. Without unusual settings like retention policies on the bucket, the expected behavior of blob.upload_from_filename(filename) is simply to overwrite the blob in place. Maybe it's as simple as needing to repeat the call to update_blob()? Similar to what you're doing after creating the bucket, maybe something about the blob's identity is altered such that the subsequent call to exists(), which I'm assuming must happen, is referencing an old location.
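To make the reload idea concrete, here is a minimal sketch (the helper name is mine, and the blob is only assumed to behave like google.cloud.storage.Blob — this is not Snakemake's actual code):

```python
def upload_and_refresh(blob, filename):
    """Overwrite (or create) a blob, then re-fetch its metadata so any
    later exists()/generation check sees the freshly written object.

    `blob` is expected to behave like google.cloud.storage.Blob.
    """
    blob.upload_from_filename(filename)  # overwrites in place if the blob exists
    blob.reload()  # refresh cached metadata (generation, updated time) from the server
    return blob
```

If stale cached metadata is indeed the culprit, a reload (or re-fetching the blob via bucket.get_blob) right after the upload would be the smallest possible fix.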

Likewise, that could explain why, when we hit the MissingOutputException, the existing outputs are supposed to be deleted but in fact are not.

if you are reporting that a complete wipe of outputs leads to the best re-run, perhaps we need some final step that waits for everything to shut down and then goes back and just deletes the entire storage root?

Sounds dangerous, with some wild failure modes where lots of files get irretrievably deleted. Better to just fix it so overwritten files are correctly recognized.

Or do we need to have better control to not allow it given some particular error (do we have that level of control?)

This was exactly my thought - in principle the expected behavior would be to repeat on a preemption event, but not on any other type of failed job, as we already have a dedicated command line option for that. Not sure how clearly that is signaled by GLS, though. Right now this flag also doubles as a switch to activate preemptible machines, so in principle you could either:

  • Keep the existing functionality, but only repeat jobs that fail due to a pre-emption event
  • Retain this option as a flag to launch preemptible VMs, but remove the failure functionality and just recommend people use -T
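A rough illustration of the first option (the function name, parameters, and the failure-reason string are all assumptions for the sketch, not the actual GLS API):

```python
def should_requeue(failure_reason: str, was_preemptible: bool,
                   attempts: int, max_attempts: int) -> bool:
    """Requeue a failed job only when it ran on a preemptible VM and the
    failure looks like a preemption; ordinary failures are left to the
    existing -T/--retries mechanism."""
    if attempts >= max_attempts:
        return False
    return was_preemptible and "preempt" in failure_reason.lower()
```

The hard part, as noted above, is whether GLS reports the failure reason distinctly enough for a check like this to be reliable.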

If you are pooped out on debugging (and we are really appreciative of the help!), I think this minimal example would be great for us and others, to keep the remaining issue in mind and have a reproducer if we're able to test. Thank you @CowanCS1!

Sure thing! I just know I have to debug downstream steps that are still on my critical path, so wanted to try and avoid leaving anyone hanging. Will start with a simple check of the blob theory I described above...

@CowanCS1
Contributor

CowanCS1 commented Mar 4, 2022

Made some decent progress on this. Looks like I have a minimal example, and learned some conditions under which this does and does not occur. Will test a bit more and create a new issue when I next get a chance.

@cademirch
Contributor Author

@CowanCS1 Awesome! Thank you for looking into this. Curious to see what you have found - this issue has plagued my workflows when running on Kubernetes.

@cademirch
Contributor Author

@CowanCS1 Also unrelated, but could you comment on how long it takes for the DAG to be built for your larger, more complex workflows when using --default-remote-prefix? For me it takes about 15-20 minutes, which feels longer than it should. I've reasoned to myself that Snakemake probably takes a while to check the presence/absence of all the input/output files in the bucket, but I'm not sure if that is really why DAG building takes so long.

~5 minutes. I've seen what you're describing on a previous version of our pipeline, it was 15-20 minutes for about 600 jobs on 120 inputs. I've built my own file inventories from cloud storage for a superset of what I'm analyzing here, and the inventory was done within seconds. So I suspect it is the calculation of the DAG itself which is time consuming. Are you using wildcards heavily? Our speed-up was part of a larger re-factoring, so I can't 100% attribute it to specific changes, but replacing wildcards with python scripts that read from tsv files is one change I suspect was helpful.

Okay this makes sense to me. My workflow does use quite a few wildcards so I can see how that blows up when computing the DAG. I'm curious about your method of using python scripts to read TSV files for inputs... could you share an example by chance?
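For illustration, a minimal sketch of the TSV-driven pattern described above (the column names `sample` and `path` and the helper name are assumptions, not the actual snippet being shared by email):

```python
import csv

def load_samples(tsv_path: str) -> dict:
    """Read a sample sheet once at parse time, so rule inputs can be
    looked up directly instead of being discovered through heavy
    wildcard expansion."""
    with open(tsv_path, newline="") as fh:
        return {row["sample"]: row["path"]
                for row in csv.DictReader(fh, delimiter="\t")}

# In a Snakefile, an input function could then be as simple as:
# SAMPLES = load_samples("samples.tsv")
# rule align:
#     input: lambda wc: SAMPLES[wc.sample]
```

The lookup is O(1) per job, and the expensive file discovery happens exactly once when the sheet is built.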

@CowanCS1
Contributor

CowanCS1 commented Mar 7, 2022

I've provided a partial reproduction of the GS bug in #1460. I think it's the same underlying issue that I've seen in more severe cases, just exacerbated by the more complex pipelines and multiple failed states from this issue resulting in a weird pipeline state.

@cademirch Happy to share. It's in a private repo but I can send a relevant snippet to your university email - is the lab listed in your github profile current?

@cademirch
Contributor Author

@CowanCS1 I'll add my observations to that issue. And yes, you can email me at cade@ucsc.edu - thank you for that!

@cademirch
Contributor Author

Gonna close this since #1451 was merged.
