Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluation expressions that timeout can raise JSONDecodeError #1999

Open
brettcs opened this issue May 13, 2024 · 0 comments
Open

Evaluation expressions that timeout can raise JSONDecodeError #1999

brettcs opened this issue May 13, 2024 · 0 comments

Comments

@brettcs
Copy link
Contributor

brettcs commented May 13, 2024

Expected Behavior

When an evaluation expression exceeds the timeout, the resulting error should be clear that is the cause.

Actual Behavior

At least some of the time, this situation raises a JSONDecodeError exception. If the user doesn't realize the timeout might be the issue, this can cause them to lose a lot of time trying to debug an issue in the expression itself where none exists.

This is sort of a sequel issue to #1680.

Workflow Code

eval-timeout.zip

Full Traceback

This issue was noticed with a "real" expression that similarly built a large array of an array of files, and hit the default evaluation timeout. However, you can reproduce the issue using the attached minimal reproduction and a small evaluation timeout:

% venv/bin/cwltool --debug --eval-timeout=.1 wf.cwl array-in.yml
INFO venv/bin/cwltool 3.1.20240404144621
INFO Resolved 'wf.cwl' to 'file:///home/brett/Scratch/wf.cwl'
DEBUG [workflow ] initialized from file:///home/brett/Scratch/wf.cwl
INFO [workflow ] start
DEBUG [workflow ] inputs {}
INFO [workflow ] starting step build_array
DEBUG [step build_array] job input {}
DEBUG [step build_array] evaluated job input to {}
INFO [step build_array] start
Expecting value: line 1 column 1 (char 0)
script was:
01 "use strict";
02 var inputs = {};
03 var self = null;
04 var runtime = {
05     "cores": 1,
06     "ram": 256,
07     "tmpdirSize": 1024,
08     "outdirSize": 1024,
09     "tmpdir": null,
10     "outdir": null
11 };
12 (function(){
13  var result = [];
14  for (var i=0; i<100000; i++) {
15    var arr = [];
16    result.push(arr);
17    for (var j=0; j<100000; j++) {
18      var num = i * 100000 + j;
19      arr.push({
20        class: "File",
21        location: "file:///tmp/array.cwl/file" + num.toString() + ".fq",
22        format: "http://edamontology.org/format_1930"
23      });
24    }
25  }
26  return {"out_arr": result};
27 })()
stdout was: ''
stderr was: ''
Traceback (most recent call last):
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/sandboxjs.py", line 512, in eval
    return cast(CWLOutputType, json.loads(stdout))
                               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

[Note: Everything from here on is basically propagation of the same root cause exception, but here it is for completeness:]

Traceback (most recent call last):
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/expression.py", line 315, in do_eval
    return interpolate(
           ^^^^^^^^^^^^
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/expression.py", line 215, in interpolate
    e = evaluator(
        ^^^^^^^^^^
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/expression.py", line 161, in evaluator
    return cast(CWLOutputType, js_engine.eval(ex, jslib, **kwargs))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brett/Scratch/venv/lib/python3.12/site-packages/cwl_utils/sandboxjs.py", line 514, in eval
    raise JavascriptException(
cwl_utils.errors.JavascriptException: Expecting value: line 1 column 1 (char 0)
script was:
01 "use strict";
02 var inputs = {};
03 var self = null;
04 var runtime = {
05     "cores": 1,
06     "ram": 256,
07     "tmpdirSize": 1024,
08     "outdirSize": 1024,
09     "tmpdir": null,
10     "outdir": null
11 };
12 (function(){
13  var result = [];
14  for (var i=0; i<100000; i++) {
15    var arr = [];
16    result.push(arr);
17    for (var j=0; j<100000; j++) {
18      var num = i * 100000 + j;
19      arr.push({
20        class: "File",
21        location: "file:///tmp/array.cwl/file" + num.toString() + ".fq",
22        format: "http://edamontology.org/format_1930"
23      });
24    }
25  }
26  return {"out_arr": result};
27 })()
stdout was: ''
stderr was: ''

ERROR Got workflow error
[error messages and tracebacks repeat from here]

Your Environment

  • cwltool version: 3.1.20240404144621

Diagnosis

This occurs because of a few related issues in the way that the expression evaluation timeout is implemented:

In general, there's a lot of code in exec_js_process that seems like it could be handled by the subprocess module directly, following the example code documented with Popen.communicate. A rough sketch would be something like:

def exec_js_process(
    self,
    js_text: str,
    timeout: float = default_timeout,
    ...,
) -> Tuple[int, str, str]:
    ...
    with subprocess.Popen(
            ["nodejs", "--eval", js_text],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            encoding='utf-8',
    ) as nodejs:
        try:
            stdoutdata, stderrdata = nodejs.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            nodejs.kill()
            stdoutdata, stderrdata = nodejs.communicate()
            returncode = -1
        else:
            returncode = nodejs.returncode
    return returncode, stdoutdata, stderrdata

This would still need code at the top to find the name of the Node command, handle other arguments, etc. But I think this could replace all the process and timeout management code that currently lives in the method, while addressing the bugs above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant